Tag Archives: DR

BCDR: Some Things to Consider When Upgrading Your VMware Disaster Recovery Solution

By Julienne Pham

Once upon a time, you protected your VMs with VMware Site Recovery Manager, and now you are wondering how to upgrade your DR solution with minimum impact on the environment. Is it as seamless as you think?

During my days in Global Support and working on customer Business Continuity/Disaster Recovery (BCDR) projects, I found it intriguing how vSphere components can put barriers in an upgrade path. Indeed, one of the first things I learned was that the timing and sequence of updates to my DR infrastructure were crucial to keeping everything running with as little disruption as possible.

If we look more closely, this is a typical VMware Site Recovery Manager setup:

JPham_SRM 6x

And in a pyramid model, we have something like this:

JPham_SRM Pyramid

Example of a protected site

So, where do we start our upgrade?

Upgrade and maintain the foundation

You begin with the hardware, then move to the vSphere version you are upgrading to. You’ll see a lot of new features available, along with bug fixes, so your hardware and firmware might need some adjustments to support them. At a minimum, check that your hardware is compatible with the software you are upgrading to.

In a DR scenario, it is important to check storage replication compliance

This is where you ensure your data replicates according to your RPO.

If you are using vSphere Replication or Storage Array Replication, you should check the upgrade path and the dependency with vSphere and SRM.

  • As an example, vSphere Replication (VR) cannot be upgraded directly from 5.8 to 6.1.
  • You might need to update the Storage Replication Adapter too.
  • You can probably find other examples of things that won’t work, or find work-arounds you’ll need.
  • You can find some useful information in the VMware Compatibility Guide.
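To make the RPO check above concrete, here is a minimal sketch. It assumes you can get the timestamp of the last completed replication for each VM from your replication tooling. The VM names, timestamps and the `rpo_violations` helper are hypothetical and are not part of any VMware API.

```python
from datetime import datetime, timedelta


def rpo_violations(last_sync_by_vm, rpo, now):
    """Return the VMs whose last completed replication is older than the RPO."""
    return sorted(
        vm for vm, last_sync in last_sync_by_vm.items()
        if now - last_sync > rpo
    )


# Hypothetical data: in practice these timestamps would come from your
# replication solution's own reporting, not from this script.
now = datetime(2016, 1, 1, 12, 0)
last_sync_by_vm = {
    "app01": now - timedelta(minutes=10),
    "db01": now - timedelta(minutes=45),
}

print(rpo_violations(last_sync_by_vm, rpo=timedelta(minutes=15), now=now))
```

Running a check like this before and after each upgrade stage gives you a quick signal that replication is still keeping up.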

Architecture change

If you are looking to upgrade from vSphere 5.5 to 6.0, for example, you should check whether you need to migrate from a simple SSO install to an external one for more flexibility, as you might not be able to change this later. Because VMware SRM depends on the health of vCenter, you might be better off upgrading vCenter first as a prerequisite.

Before you start you might want to check out the informative blog, “vSphere Datacenter Design – vCenter Architecture Changes in vSphere 6.0 – Part 1.”

The sites are interdependent

Once the foundation path is planned out, you have to think about how to minimize business impact.

Remember that if your protected site workload goes down, you can always trigger a DR scenario, so it is in your best interest to keep the secondary site management layer fully functional and to upgrade VMware SRM and vCenter there last.

VMware upgrade path compatibility

Some might assume that you can upgrade from one version to another without compatibility issues coming up. To avoid surprises, I recommend consulting our compatibility matrix and validating the upgrade paths for the different product versions.

For example, the upgrade from SRM 5.8 to 6.1 is not supported. So what are the implications for vCenter and SRM compatibility during the upgrade?

JPham_Upgrade Path Sequence
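One way to reason about upgrade paths is to treat the supported direct upgrades as a graph and search it for a chain of hops. The sketch below does that with a breadth-first search. The table of direct upgrades is illustrative only: the intermediate 6.0 hop is an assumption made for this example, and the real paths must be confirmed in the VMware interoperability matrices.

```python
from collections import deque

# Illustrative only: each version maps to the versions it may be upgraded
# to directly. The 5.8 -> 6.0 -> 6.1 hops are an assumption made for this
# sketch; always confirm real paths in the VMware interoperability matrices.
DIRECT_UPGRADES = {
    "5.8": ["6.0"],
    "6.0": ["6.1"],
    "6.1": [],
}


def upgrade_path(start, target, direct=DIRECT_UPGRADES):
    """Breadth-first search for the shortest chain of supported upgrades."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in direct.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no supported path found


print(upgrade_path("5.8", "6.1"))
```

A result of `None` means the table holds no supported chain, which is your cue to revisit the plan rather than attempt the upgrade.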

Back up, back up, back up

The standard recommendation is to run backups before every upgrade. A VM snapshot might not be enough in certain situations if you are at different upgrade stages at different sites. You need to carefully plan and synchronise backups of all the database instances for VMware Site Recovery Manager and vCenter at both sites, and eventually the vSphere Replication databases as well.
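A simple way to keep track of this is a checklist of every database at every site. The sketch below is a hypothetical bookkeeping helper, not a backup tool: it only reports which site and database combinations have no recorded backup yet.

```python
from itertools import product

SITES = ["protected", "recovery"]
# Databases named in the text; vSphere Replication applies only if you use it.
DATABASES = ["vCenter", "SRM", "vSphere Replication"]


def missing_backups(completed):
    """Return (site, database) pairs that have no recorded backup yet."""
    required = set(product(SITES, DATABASES))
    return sorted(required - set(completed))


# Hypothetical record of backups taken so far.
completed = [
    ("protected", "vCenter"),
    ("protected", "SRM"),
    ("recovery", "vCenter"),
]

for site, database in missing_backups(completed):
    print(f"Still to back up: {database} database at the {site} site")
```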

I hope this addresses some of the common questions and concerns that might come up when you are thinking of upgrading SRM. Planning and timing are key for a successful upgrade. Many components are interdependent, and you need to consider them carefully to avoid an out-of-sync environment with little control over outcomes. Good luck!


Julienne Pham is a Technical Solution Architect for the Professional Services Engineering team. She specialises in SRM and core storage, with a focus on the VIO and BCDR space.

How NSX Simplifies and Enables True Disaster Recovery with Site Recovery Manager

By Dharma Rajan

VMware NSX is the network virtualization platform for the software-defined datacenter (SDDC). Network virtualization using VMware NSX enables virtual networks to be created as software entities, saved and restored, and deleted on demand without requiring any reconfiguration of the physical network. Logical network entities such as logical switches, logical routers, security objects, logical load balancers, distributed firewall rules and service composer rules are created as part of virtualizing the network.

To provide continuity of service in the event of a disaster, datacenters are built with disaster recovery (DR) capabilities for replicating and recovering workloads between protected and recovery sites. VMware Site Recovery Manager (SRM) helps to fully automate the recovery process.

From a DR point of view, the recovery site has to be in sync with the protected site at all times in terms of compute, storage and networking, to enable seamless, fast recovery when the protected site fails due to a disaster. When using SRM today for DR, customers face a couple of challenges. From a compute perspective, one needs to prepare the hosts at the recovery site, pre-allocate compute capacity for placeholder virtual machines and create the placeholder virtual machines themselves.

From a storage point of view, the storage for protected applications/virtual machines needs to be replicated and kept in sync. Both of these steps are easy and have been handled by SRM with vSphere Replication and/or array-based replication. The challenge today is the networking piece of the puzzle. As illustrated below, depending on the type of networking established between the protected and recovery sites, various networking changes (carving out Layer-2, Layer-3, firewall and load balancer policy in the recovery site, re-mapping networks if IP address spaces overlap, recreating policies, etc.) may have to be made manually to ensure smooth recovery. This adds a lot of time, is subject to human error, and can make it impossible to meet internal and external SLAs. As a result, the network becomes the bottleneck that prevents seamless disaster recovery. From a business perspective, this can easily translate into millions of dollars in business loss, depending on the criticality of the workloads and services impacted.

DRajan 1
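One of the manual checks mentioned above, spotting overlapping IP address space between the two sites, is easy to script. Here is a minimal sketch using Python's standard `ipaddress` module. The subnets are hypothetical and would normally come from your own IP address management records.

```python
import ipaddress


def overlapping_subnets(protected, recovery):
    """Return (protected, recovery) subnet pairs whose address ranges overlap."""
    pairs = []
    for p in protected:
        for r in recovery:
            if ipaddress.ip_network(p).overlaps(ipaddress.ip_network(r)):
                pairs.append((p, r))
    return pairs


# Hypothetical subnets for illustration.
protected_site = ["10.10.0.0/16", "192.168.50.0/24"]
recovery_site = ["10.10.20.0/24", "172.16.0.0/24"]

print(overlapping_subnets(protected_site, recovery_site))
```

Any pair this reports is a network that would need re-mapping, or a re-IP of the recovered virtual machines, in a traditional setup.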

Why Are We Running into the Networking Challenge?

The traditional DR solution is tied tightly to physical infrastructure (physical routers, switches, firewalls, load balancers). The security domains of the protected and recovery sites are completely separate. As networking changes (additions, deletions or updates to IP addresses, Layer-2 extensions, subnets, etc.) are made at the protected site, no corresponding automated synchronization happens at the recovery site. Thus one may have to do Layer-2 extension to preserve the changes, create and maintain special scripts, manage the tools, and perform manual DR setup and recovery steps across different infrastructure layers and vendors (physical and virtual). From a process point of view, this requires coordination across various teams within your company, good bookkeeping and periodic validation, so that you are always ready to address a DR scenario as quickly as you can.

What is the Solution?

VMware NSX from release 6.2 offers a solution that enables customers to address the above-cited networking challenges. NSX is the network virtualization platform for the SDDC. NSX provides the basic foundation to virtualize networking components in the form of logical switching, distributed logical router, distributed logical firewall, logical load balancer, and logical edge gateways. For a deeper understanding of NSX see more at: http://www.vmware.com/products/nsx

The NSX 6.2 release has been integrated with SRM 6.1 to enable automated replication of networking entities between protected and recovery sites.

DRajan 2

How Does the Solution Work?

NSX 6.2 supports a couple of key concepts that allow it to recognize that a network is logically the same on both sites. These concepts include:

  a) “Universal Logical Switches” (ULS) – This allows for the creation of Layer-2 networks that span vCenter boundaries. This means that when utilizing ULS with NSX there will be a virtual port group at both the protected and recovery site that connects to the same Layer-2 network. When virtual machines are connected to port groups that are backed by ULS, SRM implicitly creates a network mapping, without requiring the admin to configure it. This provides seamless portability and synchronization of network services: virtual machines connected to a ULS are automatically reconnected to the same logical switch on the other vCenter.

DRajan 3

NSX 6.2 ULS Integration with SRM 6.1 Automatic Network Mapping

  b) Cross vCenter Networking and Security enables key use cases such as:
  • Resource pooling, virtual machine mobility, multi-site and disaster recovery
  • Cross-vCenter NSX eliminates the need for guest customization of IP addresses and management of portgroup mappings, two large SRM pain points today
  • Centralized management of universal objects, reducing administration effort
  • Increased mobility of workloads; virtual machines can be “vMotioned” across vCenter Servers without having to reconfigure the virtual machine or make changes to firewall rules
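The implicit network mapping described for Universal Logical Switches can be pictured as matching port groups on both sites by the switch that backs them. The sketch below is a conceptual model only. The `PortGroup` class, its `backing_switch_id` field and all names are hypothetical, and it does not represent how SRM or NSX implement the mapping internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortGroup:
    site: str
    name: str
    backing_switch_id: str  # hypothetical ID of the logical switch behind it


def implicit_mappings(protected, recovery):
    """Pair port groups across sites that share the same backing switch ID."""
    by_switch = {pg.backing_switch_id: pg for pg in recovery}
    return {
        pg.name: by_switch[pg.backing_switch_id].name
        for pg in protected
        if pg.backing_switch_id in by_switch
    }


# Hypothetical inventory: "uls-web" spans both sites, the others are local.
protected = [PortGroup("protected", "pg-web-a", "uls-web"),
             PortGroup("protected", "pg-local-a", "ls-a")]
recovery = [PortGroup("recovery", "pg-web-b", "uls-web"),
            PortGroup("recovery", "pg-local-b", "ls-b")]

print(implicit_mappings(protected, recovery))
```

Port groups backed by site-local switches get no entry, which mirrors the point that only ULS-backed networks are mapped without admin intervention.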

The deployment process would ideally be to:

  • Configure the primary NSX Manager at the primary site and the secondary NSX Manager at the recovery site
  • Configure a Universal Distributed Logical Router between the primary and recovery sites
  • Deploy a Universal Logical Switch between the primary and recovery sites and connect it to the Universal Distributed Logical Router
  • Deploy the vRealize Orchestrator (vRO) plug-in for automation and monitoring
  • Finally, map SRM network resources between the primary and recovery sites

Supported Use Cases and Deployment Architectures

The primary use cases are full-site disaster recovery or unplanned outages, where the primary site goes down due to a disaster and the secondary site takes immediate control and enables business continuity. The other key use case is planned datacenter migration, where one can migrate workloads from one site to another while maintaining the underlying networking and security profiles. The main difference between the two use cases is the frequency of the synchronization runs. In a datacenter migration use case, you can take one datacenter running NSX and reproduce the entire networking configuration on the DR side in a single run of the synchronization workflow, or run it once initially and then a second time to incrementally update the NSX objects before cutover.
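The difference between a full initial run and a later incremental run can be sketched as a simple diff between two object inventories. This is a conceptual illustration under stated assumptions: the object names, the `vni` and `action` fields and the `plan_sync` helper are all hypothetical and do not correspond to the actual synchronization workflow or any NSX API.

```python
def plan_sync(source, target):
    """Compare object dicts (name -> config) and return what a run must change."""
    return {
        "create": sorted(source.keys() - target.keys()),
        "update": sorted(k for k in source.keys() & target.keys()
                         if source[k] != target[k]),
        "delete": sorted(target.keys() - source.keys()),
    }


# Hypothetical object inventories, keyed by name.
primary = {"ls-web": {"vni": 5001}, "fw-rule-1": {"action": "allow"}}
dr_site = {}

first_run = plan_sync(primary, dr_site)      # initial full reproduction
dr_site = dict(primary)                      # pretend the run was applied

primary["fw-rule-1"] = {"action": "block"}   # change made before cutover
second_run = plan_sync(primary, dr_site)     # incremental update only

print(first_run)
print(second_run)
```

The first run creates everything, while the second touches only the object that changed, which is why a second pass shortly before cutover is cheap.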

DRajan 4

Other supported use cases include partial site outages and preventive failover, for example when you anticipate a potential datacenter outage from impending events like hurricanes, floods or forced evacuation.

The standard 1:1 deployment model, with one site as primary and another as secondary, is the most commonly deployed model. In a shared recovery site configuration, such as for branch offices, you install one SRM Server instance and NSX on each protected site. On the recovery site, you install multiple SRM Server instances to pair with each SRM Server instance on the protected sites. All of the SRM Server instances on the shared recovery site connect to the same vCenter Server and NSX instance. You can consider the owner of an SRM Server pair to be a customer of the shared recovery site. You can use array-based replication, vSphere Replication or a combination of both when you configure an SRM Server to use a shared recovery site.
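The shared recovery site layout can be written down as a small data model, which also makes the pairing rule easy to check. All class and instance names below are hypothetical and serve only to illustrate the topology described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SrmPair:
    customer: str        # owner of the pair, i.e. a "customer" of the shared site
    protected_srm: str   # SRM instance at the protected site
    recovery_srm: str    # paired SRM instance at the shared recovery site


@dataclass(frozen=True)
class SharedRecoverySite:
    vcenter: str         # single vCenter shared by all recovery-side SRM instances
    nsx: str             # single NSX instance shared the same way
    pairs: tuple

    def is_consistent(self):
        """Each protected SRM must pair with its own recovery-side SRM."""
        protected = {p.protected_srm for p in self.pairs}
        recovery = {p.recovery_srm for p in self.pairs}
        return len(protected) == len(recovery) == len(self.pairs)


# Hypothetical names for two branch offices sharing one recovery site.
site = SharedRecoverySite(
    vcenter="vc-recovery",
    nsx="nsx-recovery",
    pairs=(
        SrmPair("branch-a", "srm-a", "srm-rec-a"),
        SrmPair("branch-b", "srm-b", "srm-rec-b"),
    ),
)
print(site.is_consistent())
```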

DRajan 5

Logical Disaster Recovery Architecture Using NSX Universal Objects

What Deployment Architecture Will the Solution Support?

This solution applies to all greenfield and brownfield environments. The solution requires the infrastructure to be base-lined to vCenter 6.0 or later, ESXi 6.0 or later, vSphere Distributed Switch, and SRM 6.0 or later with NSX 6.2 or later.
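The version baseline above lends itself to a quick pre-flight check. The sketch below compares installed versions against the minimums stated in the text. The `installed` inventory is hypothetical, and collecting real version numbers from your environment is left out.

```python
# Minimum versions as stated in the text above.
MINIMUMS = {"vCenter": (6, 0), "ESXi": (6, 0), "SRM": (6, 0), "NSX": (6, 2)}


def parse_version(text):
    """Turn '6.2.1' into (6, 2, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def below_baseline(installed):
    """Return components that are missing or older than the stated minimum."""
    return sorted(
        name for name, minimum in MINIMUMS.items()
        if name not in installed or parse_version(installed[name]) < minimum
    )


# Hypothetical inventory for illustration.
installed = {"vCenter": "6.0", "ESXi": "5.5", "SRM": "6.1", "NSX": "6.2.1"}
print(below_baseline(installed))
```

Comparing tuples rather than strings avoids the classic trap where "10.0" sorts before "6.0".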

SRM can be used for different failover scenarios: Active-Active, Active-Passive, Bidirectional, and Shared Recovery.

Integrated Solution Advantages

Disaster recovery planning, maintenance and testing become much simpler, with automation enabling significant operational efficiencies.

  • The ability to create a network that spans vCenter boundaries creates a cross-site Layer-2 network, which means that after failover, it is no longer necessary to re-configure IP addresses. Not having to re-IP recovered virtual machines can further reduce recovery time by up to 40 percent.
  • There is more automation with networking and security objects. Logical switching, logical routing, security policies (such as security groups), firewall settings and edge configurations are also preserved on recovered virtual machines, further decreasing the need for manual configurations post-recovery.
  • Creating an isolated test network with the same capabilities as the production environment becomes much easier.

In conclusion, the integration of NSX and SRM greatly simplifies operations, lowers operational expenses, increases testing capabilities and reduces recovery times.

For more information on NSX visit: http://www.vmware.com/products/nsx/

For more information on SRM visit: http://www.vmware.com/products/site-recovery-manager/

For more information on VMware Professional Services visit: http://www.vmware.com/consulting/

 


About the Author:

Dharma Rajan is a Solution Architect in the Professional Services Organization specializing in pre-sales for SDDC and driving NSX technology solutions to the field. His experience spans Enterprise and Carrier Networks. He holds an MS degree in Computer Engineering from NCSU and an M.Tech degree in CAD from IIT