By Dharma Rajan
VMware Network Virtualization Platform (NSX) is the network virtualization platform for the software-defined datacenter (SDDC). Network virtualization using VMware NSX enables virtual networks to be created as software entities, saved and restored, and deleted on demand without requiring any reconfiguration of the physical network. Logical network entities like logical switch, logical routers, security objects, logical load balancers, distributed firewall rules and service composer rules are created as part of virtualizing the network.
To provide continuity of service from disaster recovery (DR), datacenters are built with capabilities for replicating and recovering workloads between protected and recovery sites. VMware Site Recovery Manager (SRM) helps to fully automate the recovery process.
From a DR point the recovery site has to be in synch with the protected site at all times from a compute, storage and networking point of view to enable seamless fast recovery when the protected site fails due to a disaster. When using SRM today for DR there are a couple of challenges customers face. From a compute perspective one needs to prepare the host at the recovery site, pre-allocate compute capacity for placeholder virtual machines and create placeholder virtual machines themselves.
From a storage point, the storage for protected applications/virtual machines needs to be replicated and kept in synch. Both of these steps are easy and has been handled by SRM-, vSphere- and/or Array-based replication. The challenge today is the networking piece of the puzzle. As illustrated below, depending on the type of networking established between protected and recovery site, various networking changes (carve out Layer-2, Layer-3, Firewall, Load balancer policy in recovery site, re-map of network if IP address space overlap, recreate policies, etc.) may have to be manually done to ensure smooth recovery. This adds a lot of time, subject to human error in making the changes, inability to meet internal and external SLA. The result of this is the network is the bottleneck that prevents seamless disaster recovery. From a business perspective this can easily translate into millions of dollars in business loss based on criticality of workloads/services impacted.
Why Are We Running into the Networking Challenge?
The traditional DR solution is tied tightly to physical infrastructure (physical routers, switches, firewalls, load balancers). The security domains of the protected and recovery sites are completely separate. As networking changes, be it new adds, delete, updates are made (say IP address, Layer-2 extension changes, subnets, etc.) at the protected site, no corresponding automated synchronization happens at the recovery site. Thus one may have to do Layer-2 extension to preserve the changes, create and maintain special scripts, manage the tools, and perform manual DR setup and recovery steps across different infrastructure layers and vendors (physical and virtual). From a process point it requires coordination across various teams within your company, good bookkeeping and periodic validation, so you are always ready to address a DR scenario as quickly as you can.
What is the Solution?
VMware NSX from release 6.2 offers a solution that enables customers to address the above-cited networking challenges. NSX is the network virtualization platform for the SDDC. NSX provides the basic foundation to virtualize networking components in the form of logical switching, distributed logical router, distributed logical firewall, logical load balancer, and logical edge gateways. For a deeper understanding of NSX see more at: http://www.vmware.com/products/nsx
NSX 6.2 release has been integrated with SRM 6.1 to enable automated replication of networking entities between protected and recovery sites.
How Does the Solution Work?
NSX 6.2 supports a couple of key concepts that will intelligently understand that it is logically the same network on both sites. These concepts include:
- a) “Universal Logical Switches” (ULS) – This allows for the creation of Layer-2 networks that span vCenter boundaries. This means that when utilizing ULS with NSX there will be a virtual port group at both the protected and recovery site that connect to the same Layer-2 network. When virtual machines are connected to port groups that are backed by ULS, SRM implicitly creates a network mapping, without requiring the admin to configure it. Providing seamless network services portability and synchronization automatically reconnects virtual machines connected to a ULS to the same logical switch on the other vCenter.
NSX 6.2 ULS Integration with SRM 6.1 Automatic Network Mapping
- b) Cross vCenter Networking and Security enables key use cases such as:
- Resource pooling, virtual machine mobility, multi-site and disaster recovery
- Cross-vCenter NSX eliminates the need for guest customization of IP addresses
and management of portgroup mappings, two large SRM pain points today
- Centralized management of universal objects, reducing administration effort
- Increased mobility of workloads; virtual machines can be “vMotioned” across vCenter Servers without having to reconfigure the virtual machine or making changes to firewall rules
The deployment process would ideally be to:
- Configure Master NSX Manager at primary site and Secondary NSX Manager at recovery site
- Configure Universal Distributed Logical Router between primary and secondary site
- Deploy Universal Logical Switch between primary and recovery site and connect it to Universal Distributed Logical Router
- Deploy the VRO plugin for automation and monitoring
- Finally map SRM network resources between primary and recovery sites
Supported Use Cases and Deployment Architectures
The primary use cases are full site disaster recovery scenarios or unplanned outage where the primary site can go down due to a disaster and secondary site takes immediate control and enables business continuity. The other key use case is planned datacenter migration scenarios where one could migrate workloads from one site to another maintaining the underlying networking and security profiles. The main difference between the two use cases is the frequency of the synchronization runs. In a datacenter migration use case you can take one datacenter running NSX and reproduce the entire networking configuration on the DR side in a single run of the synchronization workflow or run it once initially and then a second time to incrementally update the NSX objects before cutover.
Other supported use cases include partial site outages, preventive failover, or when you anticipate a potential datacenter outage, for example, impending events like hurricanes, floods, forced evacuation, etc.
The standard 1:1 deployment model with one site as primary and another as secondary is the most common deployed model. In a shared recovery site configuration, like for branch offices, you install one SRM server instance and NSX on each protected site. On the recovery site, you install multiple SRM Server instances to pair with each SRM server instance on the protected sites. All of the SRM server instances on the shared recovery site connect to the same vCenter server and NSX instance. You can consider the owner of an SRM server pair to be a customer of the shared recovery site. You can use either array-based replication or vSphere replication or a combination of both when you configure an SRM server to use a shared recovery site.
Logical Disaster Recovery Architecture Using NSX Universal Objects
What Deployment Architecture Will the Solution Support?
This solution applies to all Greenfield and Brownfield environments. The solution will need the infrastructure to be base-lined to vCenter 6.0 or later, ESXi 6.0 or later, vSphere Distributed switch, SRM 6.0 or later with NSX 6.2 or later.
SRM can be used for different failover scenarios. It could be Active-Active, Active-Passive, Bidirectional, and Shared Recovery.
Integrated Solution Advantages
The ability to automate the disaster recovery planning, maintenance and testing process becomes much simpler, with automation enabling significant operational efficiencies.
- The ability to create a network that spans vCenter boundaries creates a cross-site Layer-2 network, which means that after failover, it is no longer necessary to re-configure IP addresses. Not having to re-IP recovered virtual machines can further reduce recovery time by up to 40 percent.
- There is more automation with networking and security objects. Logical switching, logical routing, security policies (such as security groups), firewall settings and edge configurations are also preserved on recovered virtual machines, further decreasing the need for manual configurations post-recovery.
- Making an isolated test network with all the same capabilities identical to a production environment becomes much easier.
In conclusion, the integration of NSX and SRM greatly simplifies operations, lowers operational expenses, increases testing capabilities and reduces recovery times.
For more information on NSX visit: http://www.vmware.com/products/nsx/
For more information on SRM visit: http://www.vmware.com/products/site-recovery-manager/
For more information on VMware Professional Services visit: http://www.vmware.com/consulting/
About the Author:
Dharma Rajan is a Solution Architect in the Professional Services Organization specializing in pre-sales for SDDC and driving NSX technology solutions to the field. His experience spans Enterprise and Carrier Networks. He holds an MS degree in Computer Engineering from NCSU and M.Tech degree in CAD from IIT