In this post, I’ll briefly expand on the benefits of using NSX as part of a disaster recovery (DR) solution. For additional information, check out my prior multi-site and disaster recovery with NSX posts on the VMware Network Virtualization blog. I also recently presented on multi-site and disaster recovery solutions at VMworld 2016 US and VMworld 2016 Europe; the recorded sessions can be viewed here: US VMworld, Europe VMworld.
Prior NSX Multi-site and Disaster Recovery Posts:
- Cross-VC NSX for Multi-site Solutions
- Enhanced Disaster Recovery with Cross-VC NSX and SRM
- NSX-V: Multi-site Options and Cross-VC NSX Design Guide
- Cross-VC NSX: Multi-site Deployments with Ease and Flexibility
- Multi-site with Cross-VC NSX: Consistent Security and Micro-segmentation Across Sites
- Multi-site with Cross-VC NSX and Palo Alto Networks Security
Disaster recovery generally presents two challenges:
- Recovering the application with the same IP address at the recovery site; this is important because other components typically depend on this IP address, such as security policies, load balancer configs, DNS, and other applications.
- Ensuring security for the application is in place upon disaster recovery; traditional solutions rely on manually updating or syncing security policies across the protected and recovery sites, which is cumbersome, time-consuming, and error-prone.
If we look at traditional DR solutions, we’ll find they’re not holistic. For example, to maintain an application’s IP address and to support scenarios such as partial application failure, L2 extension is desired.
To achieve L2 extension, traditional networking approaches such as simple L2 over dark fiber, VPLS over an MPLS backbone, or hardware-based solutions such as OTV are leveraged. However, these approaches do not address the security or automation aspects: they focus only on the network and on per-device configuration, and they lack the automation and flexibility needed. As a result, these traditional solutions are also complex and operationally challenging.
Leveraging NSX for multi-site solutions and for specific use cases like disaster avoidance and disaster recovery provides the following general benefits:
- decoupling from physical infrastructure
- ease of deployment and use
- high degree of automation
- rapid application deployment/recovery and productivity
- extensive partner ecosystem for services
- integration with other DR & SDDC components (Site Recovery Manager (SRM), vSphere hypervisor, vRealize Suite, etc.)
More specifically, as shown in the figure below, when leveraging the Cross-VC NSX feature for multi-site and disaster recovery, we can easily have L2 extension over an L3 underlay across vCenter domains. These vCenter domains may also span multiple sites.
DR orchestration tools such as SRM require two vCenters (one for each site) for additional segmentation. As such, Cross-VC NSX, which allows for consistent logical networking and security across multiple vCenter domains/sites, is a perfect fit for the DR use case. It allows a DR orchestration tool such as SRM to recover the workloads at the recovery site while maintaining their IP addresses.
Further, Cross-VC NSX allows for central management of security policies at the protected site and ensures consistent security policies across the protected and recovery sites. In effect, we are also achieving an enhanced security model with micro-segmentation across sites. See the following prior post on the VMware Network Virtualization blog site: Multi-site with Cross-VC NSX: Consistent Security and Micro-segmentation Across Sites
In addition, SRM integrates with NSX: if Storage Policy Protection Groups (SPPGs) are used in SRM, networks at the protected site and recovery site can be mapped automatically, as shown below. SPPGs make protecting workloads easier and more automated: simply select a storage policy when deploying the respective workload, and the workload will automatically be protected and deployed on a protected datastore.
vSphere Replication can also be used instead of array-based replication. It allows for selectively replicating on a per-VM basis and does not require storage from the same hardware vendor at both the protected and recovery sites. A diagram of an example DR deployment leveraging Cross-VC NSX, 3rd party security services for advanced security (Palo Alto Networks in this example), vSphere Replication, and SRM is shown below.
One of the key benefits of the NSX + SRM DR solution is how easily you can set up test networks and test recovery plans with no disruption to the production environment. NSX can be used to easily create test networks which SRM can utilize when testing recovery plans.
SRM can use the same logical networks to fail over applications upon an actual disaster; however, for testing of recovery plans, SRM can be configured to use the test networks created by NSX. Using a dedicated test network allows recovery plans to be tested in an isolated environment while maintaining the same application IP addresses and security policies at the recovery site, all with no disruption to the production environment.
The two figures below show test universal logical switches and a test universal distributed logical router (DLR) created with NSX, which will be used by SRM when testing recovery plans. Since separate logical switches and a separate DLR are used for testing, IP addresses that overlap with applications on the production network can exist. East-West connectivity and traffic flow can easily be tested here via the test DLR. If North/South testing is also required, you will want to create a duplicate upstream network as well, so as not to disrupt production traffic.
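To illustrate why overlapping IPs are acceptable here, the following is a minimal sketch in plain Python; it does not call any NSX API. It models each segment as a subnet attached to a router and treats two segments as conflicting only when they overlap within the same routing domain. The router names (`prod-udlr`, `test-udlr`) and the subnets are hypothetical and are not taken from the deployment in this post.

```python
import ipaddress

# Hypothetical segments: the router names and subnets are illustrative only.
production_web = {"router": "prod-udlr", "subnet": "172.16.10.0/24"}
test_web = {"router": "test-udlr", "subnet": "172.16.10.0/24"}


def subnets_overlap(subnet_a: str, subnet_b: str) -> bool:
    """Return True if the two CIDR ranges share any addresses."""
    return ipaddress.ip_network(subnet_a).overlaps(ipaddress.ip_network(subnet_b))


def conflicts(segment_a: dict, segment_b: dict) -> bool:
    """Overlapping subnets only conflict when they sit behind the same router."""
    same_routing_domain = segment_a["router"] == segment_b["router"]
    return same_routing_domain and subnets_overlap(
        segment_a["subnet"], segment_b["subnet"]
    )


if __name__ == "__main__":
    # Same subnet, different routers: isolated, so no conflict.
    print(conflicts(production_web, test_web))
    # Overlapping subnet behind the same production router: conflict.
    print(conflicts(production_web, {"router": "prod-udlr", "subnet": "172.16.10.128/25"}))
```

The point of the model is that isolation comes from the separate test DLR, not from readdressing the application.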
The figure below shows the configuration within SRM: the same production logical networks will be used at the recovery site upon an actual disaster recovery, while the test logical networks will be used when testing recovery plans.
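Conceptually, this configuration is a lookup from each protected-site network to two targets: one for real recovery and one for tests. The sketch below models that idea in plain Python; it is not SRM's data model or API, and the network names (`Universal_Web`, `Test_Universal_Web`, and so on) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NetworkMapping:
    recovery_network: str  # used on an actual failover
    test_network: str      # used when a recovery plan is run in test mode


# Hypothetical mappings keyed by the protected-site network name.
# With universal logical switches, the recovery network carries the same name.
MAPPINGS = {
    "Universal_Web": NetworkMapping("Universal_Web", "Test_Universal_Web"),
    "Universal_App": NetworkMapping("Universal_App", "Test_Universal_App"),
    "Universal_DB": NetworkMapping("Universal_DB", "Test_Universal_DB"),
}


def target_network(protected_network: str, is_test: bool) -> str:
    """Pick the network a recovered VM should attach to."""
    mapping = MAPPINGS[protected_network]
    return mapping.test_network if is_test else mapping.recovery_network


if __name__ == "__main__":
    print(target_network("Universal_Web", is_test=False))
    print(target_network("Universal_Web", is_test=True))
```

Because only the target network changes between the two modes, the recovered VMs keep the same IP addresses in both cases.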
Now when we run the test of the recovery plan within SRM, the NSX test logical networks will be utilized without any disruption to our production environment as shown in the figure below. We can quickly create as many test networks as needed with NSX for different application/tenant environments and easily test numerous recovery plans.
Accomplishing this without network virtualization would be time-consuming and error-prone: it would involve creating new VLANs on the physical network, using VRFs for overlapping IPs or remapping networks, updating routing, and ensuring the correct security policies are in place at the recovery site. With NSX, the entire process is greatly simplified and automated.
Additionally, within SRM, priorities and dependencies can be used as shown below to orchestrate the order in which workloads/apps are recovered. For example, in the figure below, priorities are used to ensure the DB tier is recovered first, then the App tier, and finally the Web tier; this ensures that when the Web application is recovered, the required backend services are already in place, so the application works correctly and does not throw errors.
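The ordering logic can be sketched in a few lines of plain Python. This is only an illustration of priority-based sequencing, not SRM's implementation; the VM names and priority numbers are hypothetical, with a lower number meaning the workload is recovered earlier.

```python
from itertools import groupby

# Hypothetical workloads as (vm_name, priority); lower priority recovers first.
WORKLOADS = [
    ("web-01", 3),
    ("db-01", 1),
    ("app-01", 2),
    ("web-02", 3),
]


def recovery_waves(workloads):
    """Group workloads into waves, one wave per priority, in ascending order."""
    ordered = sorted(workloads, key=lambda w: w[1])  # stable sort keeps input order within a tier
    return [
        [name for name, _ in group]
        for _, group in groupby(ordered, key=lambda w: w[1])
    ]


if __name__ == "__main__":
    for number, wave in enumerate(recovery_waves(WORKLOADS), start=1):
        print(f"wave {number}: {wave}")
```

Each wave would need to finish before the next one starts, which is what guarantees the DB tier is available before the App and Web tiers come up.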
For a quick overview with a walkthrough and demo of the NSX + SRM Disaster Recovery solution, please make sure to check out the video at the top of this post. Additional information and a hands-on lab on multi-site and disaster recovery with NSX can be found at the links below.
- Prior multi-site and disaster recovery posts on the VMware Network Virtualization blog
- NSX-V Multi-site Options and Cross-VC NSX Design Guide
- Disaster Recovery with NSX and SRM
- HOL-1725-USE-2 – VMware NSX Multi-Site DR with SRM