Why should you test your DR plan?
I'm regularly surprised by the number of customers I talk to who don't routinely test their recovery plans. Regular testing goes a long way toward reducing the risk of problems during a real recovery, and it increases comfort with, and confidence in, the plan. For details about how the recovery plan testing process works with VMware Site Recovery for VMware Cloud on AWS, see this blog post.
Isolation is necessary
It is critical that test VMs are isolated from production VMs whenever a recovery plan test is run. Test VMs are exact copies of the production VMs, and having duplicate VMs in the same environment can cause all kinds of issues. VMware Site Recovery ensures that storage is isolated and that replication keeps going during testing. It does this by taking a snapshot of the replicated VM and using that snapshot for all testing, so there is no impact on RPO.
Options for isolation
There are a couple of options for keeping test VMs isolated on the network from their production originals. The first is that VMware Site Recovery can automatically create an isolated network on each of the hosts at the VMware Cloud on AWS recovery site. The automatic creation of networks doesn't require any additional configuration.
However, VMs on different subnets or VMs recovered on different hosts will not be able to communicate with each other. The lack of VM to VM communication can limit the usefulness of this option, especially if there is a desire to test an application. The other option is that the customer can create routed test networks.
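To make the limitation of the auto-created networks concrete, here is a minimal Python sketch that models the rule stated above: two test VMs can talk only if they were recovered on the same host and sit on the same subnet. The `RecoveredVM` type, the host labels, and the addresses are all hypothetical and exist only for illustration; this is a model of the behavior, not a call into any VMware API.

```python
from dataclasses import dataclass
import ipaddress


@dataclass(frozen=True)
class RecoveredVM:
    name: str
    host: str  # host the test VM was recovered on (hypothetical label)
    ip: str    # address with prefix length, e.g. "10.0.1.10/24"


def can_communicate(a: RecoveredVM, b: RecoveredVM) -> bool:
    """Model the auto-created networks: one isolated, unrouted network per host."""
    same_host = a.host == b.host
    same_subnet = (
        ipaddress.ip_interface(a.ip).network
        == ipaddress.ip_interface(b.ip).network
    )
    return same_host and same_subnet


web = RecoveredVM("web01", "host-1", "10.0.1.10/24")
app = RecoveredVM("app01", "host-1", "10.0.1.20/24")
db = RecoveredVM("db01", "host-2", "10.0.1.30/24")      # different host
cache = RecoveredVM("cache01", "host-1", "10.0.2.10/24")  # different subnet

print(can_communicate(web, app))    # True
print(can_communicate(web, db))     # False
print(can_communicate(web, cache))  # False
```

A multi-tier application whose tiers land on different hosts or subnets would fail this check, which is why the auto-created networks are often not enough for application-level testing.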
For customers that have decided to utilize routed test networks, as opposed to the auto-generated networks, this post will help them understand their options and choices for building networks that allow for testing the functionality of recovered applications in addition to testing recovery plans.
The challenge and solution
The challenge with routed test networking in VMware Cloud on AWS is that the compute gateway doesn't currently support multiple, separately routed networks. Additionally, for now, it isn't possible to deploy multiple compute gateways. We therefore need alternative ways of routing traffic between isolated networks in VMware Cloud on AWS. The supported options are:
- Create L2 connectivity between the on-premises environment and the VMware Cloud on AWS SDDC using either NSX-T or HCX L2 network extensions and utilize the on-premises router to route traffic between them
- Create separate networks within VMware Cloud on AWS and use a router VM to connect them
Let's explore these options in more detail.
Layer 2 Extensions
There are currently two ways that customers can extend their on-premises L2 networks into VMware Cloud on AWS: VMware HCX Network Extension and NSX Data Center L2 VPN. Either option will work for creating a routed, isolated recovery plan test environment within VMware Cloud on AWS. The important thing is that the L2 networks are stretched and that they are routed at the protected/on-premises site.
The idea is that in normal operations, production VMs in the on-premises site use either local or stretched L2 networks. When a recovery plan test is run, we need a routed and isolated test network that is as close as possible in function to the production network. By creating those test networks in the protected/on-premises site, including routing as needed to duplicate the production networks, and extending them to the VMware Cloud on AWS SDDC, we get our isolated and routed test network. Even though the routing happens at the on-premises site, this test network still makes it possible to test application functionality with the VMs running in VMware Cloud on AWS.
In the event of an actual failover, the recovered VMs come up on the production networks within the VMware Cloud on AWS SDDC and there is no dependence on the on-premises site.
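The planning step behind the Layer 2 extension approach can be sketched in a few lines of Python. The sketch assumes that each test network reuses the addressing of the production network it duplicates, so recovered test VMs can keep their IP settings; that is an assumption made for illustration, not a stated requirement. The `Network` type, the `-test` suffix, and the example subnets are hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Network:
    name: str
    cidr: str
    gateway: str


def mirror_for_test(prod: Network, suffix: str = "-test") -> Network:
    """Derive a test network with the same addressing under a different name."""
    net = ipaddress.ip_network(prod.cidr)
    if ipaddress.ip_address(prod.gateway) not in net:
        raise ValueError(f"gateway {prod.gateway} is outside {prod.cidr}")
    return Network(prod.name + suffix, prod.cidr, prod.gateway)


# Hypothetical production networks at the on-premises site.
production = [
    Network("web", "10.0.1.0/24", "10.0.1.1"),
    Network("app", "10.0.2.0/24", "10.0.2.1"),
]

# Test networks to create and route on-premises, then extend to the SDDC.
test_plan = [mirror_for_test(n) for n in production]
for n in test_plan:
    print(n.name, n.cidr, n.gateway)
```

The resulting list is only a plan. Creating the networks, routing them on-premises, and stretching them with HCX Network Extension or L2 VPN is still done with the respective tools.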
Router VM
The second option for testing recovery while keeping test network traffic isolated is to combine a router VM with isolated networks. There are many options when it comes to routing appliances. Everything from purpose-built appliances to standard Windows and Linux installations can support routing. The choice of tool depends on the requirements.
The router VM can either be a VM running in VMware Cloud on AWS or be part of the recovery plan. It is used only during testing. The isolated networks serve as the test networks, and the router VM is connected to all of them and routes traffic between them (see diagram).
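As a rough illustration of what the router VM does, the following Python sketch models a router with one interface on each isolated test network and no default route. Given a destination address, it returns the test network the packet would be forwarded to, or `None` if the destination is outside the test environment. The network names and subnets are hypothetical, and the sketch only models the forwarding decision; it does not configure a real appliance.

```python
import ipaddress
from typing import Optional

# Hypothetical isolated test networks; the router VM has one interface on each.
ROUTER_INTERFACES = {
    "test-web": ipaddress.ip_interface("10.0.1.1/24"),
    "test-app": ipaddress.ip_interface("10.0.2.1/24"),
    "test-db": ipaddress.ip_interface("10.0.3.1/24"),
}


def egress_network(destination: str) -> Optional[str]:
    """Return the test network the router VM forwards to, or None if unreachable."""
    dest = ipaddress.ip_address(destination)
    for name, iface in ROUTER_INTERFACES.items():
        if dest in iface.network:
            return name
    # No default route in this model: traffic cannot leave the test environment.
    return None


print(egress_network("10.0.3.25"))    # test-db
print(egress_network("192.168.1.5"))  # None
```

Leaving out a default route in this model reflects the goal of the test setup: recovered VMs can reach each other across test networks, but nothing is routed toward production.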
Testing is an incredibly important part of any disaster recovery plan. There are now multiple supported options for customers to use when they want to test their application functionality as part of testing their recovery plans in VMware Cloud on AWS. I look forward to hearing what customers do with these new capabilities!