
How VMware IT Achieves Nondisruptive Disaster Recovery

by: VMware IT DR Senior Manager Lalit Parashari and VMware Lead Cloud Infrastructure Administrator Mohammed Kajamoideen

Many enterprises struggle to meet disaster recovery (DR) requirements because of the cost, effort and downtime that implementation and testing require. The primary challenge is scheduling downtime to run DR tests on business-critical applications. To meet this challenge, VMware IT developed a comprehensive approach that allows nondisruptive DR testing of entire production applications, including end-to-end application validation in an isolated environment, all without downtime.

Network communication during test recovery 

VMware IT uses the automation and orchestration tool VMware Site Recovery Manager™ to automate DR workflows. Site Recovery Manager includes the option to perform a dry run (test recovery) of recovery plans to validate that all infrastructure components operate as expected.  

Instead of powering off the virtual machines (VMs) in the protected site, the test recovery option in Site Recovery Manager creates a copy of each VM in the DR site. During test recovery, Site Recovery Manager creates a port group (the bubble network) with no network uplink on each VMware ESXi host in the DR site. The failed-over VMs in the DR site connect to this bubble network port group, so during test recovery their traffic never leaves the ESXi host and cannot conflict with the actual production VMs.

While we can validate the infrastructure components with the default Site Recovery Manager test network, we cannot validate application functionality, because our multitier applications are deployed across multiple VMware vSphere® clusters and VMware vCenter® servers. The default bubble network does not allow VMs on different ESXi hosts and clusters to communicate with each other.
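To make the limitation concrete, here is a minimal sketch that models the default bubble network as described above: because the port group has no uplink and exists separately on each host, two recovered VMs can reach each other only when they run on the same ESXi host. The class, function and host names are hypothetical and exist only for illustration; this is a reasoning aid, not a Site Recovery Manager API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveredVM:
    """A VM powered on during test recovery (hypothetical model)."""
    name: str
    vcenter: str
    cluster: str
    esxi_host: str


def can_communicate_on_default_bubble(a: RecoveredVM, b: RecoveredVM) -> bool:
    # The default test network is a per-host port group with no uplink,
    # so frames never leave the ESXi host: only co-located VMs can talk.
    return a.esxi_host == b.esxi_host


# Hypothetical three-tier application split across clusters and vCenter servers.
web = RecoveredVM("web01", "vc-dr-01", "cluster-a", "esxi-a-01")
app = RecoveredVM("app01", "vc-dr-01", "cluster-a", "esxi-a-01")
db = RecoveredVM("db01", "vc-dr-02", "cluster-b", "esxi-b-07")

print(can_communicate_on_default_bubble(web, app))  # True: same host
print(can_communicate_on_default_bubble(web, db))   # False: different hosts
```

Any tier that lands on another host, cluster or vCenter server is unreachable, which is why end-to-end application validation fails on the default test network.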

Instead, we use VMware NSX-T Data Center to enable seamless network communication between the tiers of applications deployed across vSphere clusters. We create NSX-T logical segments that replicate the production VLANs and specify those segments as the Site Recovery Manager test recovery networks in place of the default test network.
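The substitution can be pictured as a lookup table from each production network to the test network used during test recovery. The sketch below assumes hypothetical port group and segment names, and assumes that any network without an explicit test mapping falls back to the auto-created isolated bubble network; it models the idea only and is not Site Recovery Manager configuration syntax.

```python
# Hypothetical names: production port groups at the protected site mapped to
# the NSX-T segments created at the DR site to mirror the production VLANs.
TEST_NETWORK_MAPPINGS = {
    "pg-prod-vlan110-web": "seg-bubble-vlan110-web",
    "pg-prod-vlan120-app": "seg-bubble-vlan120-app",
    "pg-prod-vlan130-db": "seg-bubble-vlan130-db",
}

# Assumed fallback when no test network is specified for a mapping.
DEFAULT_TEST_NETWORK = "isolated-bubble (auto-created, no uplink)"


def resolve_test_network(production_network: str) -> str:
    """Return the network a recovered VM attaches to during test recovery."""
    return TEST_NETWORK_MAPPINGS.get(production_network, DEFAULT_TEST_NETWORK)


print(resolve_test_network("pg-prod-vlan120-app"))   # seg-bubble-vlan120-app
print(resolve_test_network("pg-prod-vlan999-misc"))  # falls back to the bubble
```

Keeping one segment per production VLAN means recovered VMs keep their production IP addressing inside the bubble, so application configuration does not need to change for the test.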

The logical segments are uplinked to a T1 logical router that routes traffic between the segments but has no external connectivity to the corporate network. Traffic from the VMs is confined to the T1 logical router, which routes it between segments, so VMs failed over during test recovery can communicate with each other. In this way, NSX-T Data Center enables network communication between VMs across multiple vSphere clusters without disturbing the actual production VMs.
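The routing behavior described above can be sketched as a router that knows only its directly connected segments and has no default route. In this minimal model, the class name, segment names and subnets are hypothetical: traffic between bubble segments is routed, while anything addressed outside the bubble, such as the corporate network, is dropped because the router has no uplink.

```python
from __future__ import annotations

import ipaddress


class IsolatedTier1Router:
    """Hypothetical model of a T1 router with segment downlinks and no uplink."""

    def __init__(self, segments: dict[str, str]) -> None:
        # Segment name -> gateway interface CIDR, e.g. "10.10.110.1/24".
        self.segments = {
            name: ipaddress.ip_interface(cidr).network
            for name, cidr in segments.items()
        }

    def segment_for(self, ip: str) -> str | None:
        """Return the connected segment that contains ip, if any."""
        addr = ipaddress.ip_address(ip)
        for name, network in self.segments.items():
            if addr in network:
                return name
        return None

    def route(self, src_ip: str, dst_ip: str) -> str:
        src = self.segment_for(src_ip)
        dst = self.segment_for(dst_ip)
        if src is None:
            return "drop: source not on a connected segment"
        if dst is None:
            # No uplink toward the corporate network, hence no default route.
            return "drop: no route (destination outside the bubble)"
        if src == dst:
            return f"switched within {src}"
        return f"routed {src} -> {dst}"


t1 = IsolatedTier1Router({
    "seg-web": "10.10.110.1/24",
    "seg-app": "10.10.120.1/24",
    "seg-db": "10.10.130.1/24",
})
print(t1.route("10.10.110.21", "10.10.130.5"))  # routed seg-web -> seg-db
print(t1.route("10.10.110.21", "172.16.0.10"))  # dropped: outside the bubble
```

The absence of a default route is what keeps the bubble safe. Because the segments reuse production subnets, a leak toward the corporate network would create duplicate IP conflicts with the live production VMs.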

Application validation in an isolated environment 

A disaster recovery test is not complete until we validate application functionality. During test recovery, VMs are connected to the isolated bubble environment, and NSX-T Data Center enables network communication between them.

Since the bubble environment is completely isolated from the corporate network, VMware IT uses VMware Horizon® and VMware Horizon Unified Access Gateway to provide seamless user access to validate the applications during test recovery. The Horizon Unified Access Gateway functions as a proxy between the isolated environment and corporate network. VMware Horizon servers and desktop pools are deployed inside the bubble network. While logged into the corporate network, users type the Horizon URL into a browser and log in with their Active Directory credentials. VMware Horizon Unified Access Gateway routes the user request to a VMware Horizon desktop allocated by the VMware Horizon server. 

Figure 1 

Since the VMware Horizon desktops are deployed inside the bubble network, users have access to the entire bubble network via logical segments, allowing them to perform end-to-end validation of the applications within the isolated environment. 

Figure 2 
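The user access path can be summarized as corporate browser, then Unified Access Gateway, then the Horizon server, then a desktop inside the bubble. The sketch below is a simplified, hypothetical model of that flow. Active Directory authentication is reduced to a boolean, and the desktop pool hands out the next free desktop and reuses an existing session. These are illustrative assumptions, not Horizon's actual brokering logic or API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class HorizonPool:
    """Hypothetical desktop pool deployed inside the bubble network."""
    free_desktops: list[str]
    sessions: dict[str, str] = field(default_factory=dict)

    def allocate(self, user: str) -> str | None:
        # Reuse the user's existing session, else hand out the next free desktop.
        if user in self.sessions:
            return self.sessions[user]
        if not self.free_desktops:
            return None
        desktop = self.free_desktops.pop(0)
        self.sessions[user] = desktop
        return desktop


def uag_connect(user: str, ad_authenticated: bool, pool: HorizonPool) -> str:
    """Model of the path: corporate browser -> UAG -> Horizon server -> desktop."""
    if not ad_authenticated:
        return "rejected: Active Directory authentication failed"
    desktop = pool.allocate(user)
    if desktop is None:
        return "rejected: no desktop available in the bubble pool"
    return f"proxied {user} to {desktop}"


pool = HorizonPool(["bubble-vdi-01", "bubble-vdi-02"])
print(uag_connect("alice", True, pool))  # proxied alice to bubble-vdi-01
print(uag_connect("bob", True, pool))    # proxied bob to bubble-vdi-02
```

In this model the gateway is the only component the corporate network talks to. Once a user lands on a bubble desktop, reaching the application tiers is just in-bubble traffic through the T1 logical router.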

Benefits of nondisruptive DR  

  • Flexible and nonintrusive: Perform bubble DR anytime without downtime. 
  • End-to-end validation: Complete end-to-end application and business validation without disturbing production VMs. 
  • Environment upgrade validation: Use the isolated bubble environment to validate an upgrade to a business-critical system before rolling it out to production.
  • Increased confidence of readiness: Run nondisruptive DR tests to validate environments after applying significant infrastructure updates, increasing confidence in our ability to respond to disasters.

VMware IT achieved nondisruptive DR test failover of numerous production applications deployed across multiple VMware vCenter servers using VMware Site Recovery Manager, VMware NSX-T Data Center and VMware Horizon. Together, these products make end-to-end validation possible, so our DR tests run without impacting production applications.

VMware on VMware blogs are written by IT subject matter experts sharing stories about our digital transformation using VMware products and services in a global production environment.