
vVols Replication Test-Failover with PowerCLI 6.5.1

I recently had the opportunity to work with one of our valued vVols Partners, HPE, on automating vVols replication workflows.  HPE is a vVols design partner and one of the few partners to deliver Day 1 support for VASA 3, including array-based replication.

Today I’m very pleased to offer a guest article from Eric Siebert, HPE Solutions Marketing Manager and our very dear colleague on the vVols partnership.

————-

VMware introduced support for array-based replication of Virtual Volumes (vVols) as part of vSphere 6.5. This new ability provides users with array-based BC/DR capabilities that can be managed via VMware’s Storage Policy Based Management (SPBM) system that is built into vSphere.

Today, vVols replication has no native integration with automation products such as Site Recovery Manager (SRM) and vRealize Orchestrator. Instead, VMware developed PowerCLI cmdlets as part of the vSphere 6.5 release so that common BC/DR operations such as failover and recovery can be scripted as needed. However, the initial release of PowerCLI with vSphere 6.5 lacked a key capability: performing a test failover of vVols-based VMs.

Customers familiar with traditional array-based datastore replication are also aware that most array vendors support SRM’s ability to perform a test-failover operation to verify integrity of replicated virtual machines.  This is a critical capability that provides confidence in the ability to recover if or when a true disaster happens. Users of vVols will be happy to learn that the ability to perform a test failover is now available for vVols thanks to a recent update in PowerCLI 6.5.1.  Two new PowerCLI cmdlets were introduced to support this capability:

  • Start-SpbmReplicationTestFailover
  • Stop-SpbmReplicationTestFailover

I had a chance to test this feature on HPE’s 3PAR arrays, and here are some pointers on setting things up in your environment and some of the required steps to perform a successful test failover.  Additionally, HPE has provided some example disaster recovery scripts for use in your environment as you see fit.

As an aside, while I was working with 3PAR, I was able to familiarize myself with a few of its replication features. Here are some that stood out.

  • The vSphere administrator is able to perform per-VM (actually per-VVol) replication. Each VM can have its own replication group, or share an existing one.  The VMware administrator can create new groups by placing the VM into the “Automatic” replication group.
  • The vSphere administrator can select array and datastore placement and drive tier capabilities at the remote site for the source VM. 3PAR uses a type of template called a CPG (common provisioning group) for drive tier and RAID settings.  Here’s an example replication storage policy for 3PAR.

Types of Disaster Recovery

Before jumping into our example of test-failover, I thought I’d give a quick review of the types of disaster recovery scenarios.

Planned Failover: Used in a controlled environment, when both the primary and recovery sites are fully functional. Typically used to migrate VMs from one site to the other, perhaps for load balancing or data center relocation. Typically only select VMs are migrated. Once failover is complete, replication groups are typically reversed, so that the original primary site becomes a recovery site. For an example, see a previous blog post, Automating VVol DR with Power CLI 6.5.
Unplanned Failover: Used when the primary datacenter has gone offline, either because of a minor interruption, such as a network outage or power failure, or a more severe disaster, including actual loss of equipment, such as in a fire. Typically all the VMs associated with a lost primary site are failed over to the recovery site. After a minor interruption, replication groups are eventually reversed, making the original primary site a recovery site. In a true disaster, however, the storage administrator is typically required to help clean up broken replication groups and establish connectivity with a new recovery site.
Test Failover: The simplest of the failover scenarios. It does not result in an actual failover and does not interrupt services running at the primary site. However, a test failover provides confidence that an actual failover would succeed if needed, by exposing a copy of the replicated VMs to datastores and ESXi hosts at the recovery site for integrity validation and testing.

Finally, on to our test failover example.

In our scenario, we have two data centers.  In the demo video, we call them “San Jose” and “Fremont”.  Each datacenter has its own instance of vSphere and storage array.  And we have a VM or two replicating from our primary (source) site to a recovery (target) site.  We’ll assume Fremont is the primary site.  And let’s assume the VM’s name is “VMFremont”.  We also assume that source and target vVols storage containers are mounted on ESXi hosts at each site.

We’ll assume that we have only a primary-to-recovery site relationship for this example. However, there’s nothing preventing you from having VMs replicating in both directions. In fact, since vVols replication is more granular than traditional array-based replication, with 3PAR you can have the same vVols storage container (datastore) act as both a source and a target of replication simultaneously. But only the source VMs in those containers will be visible to vSphere. Replica VMs will appear only once a failover completes.

To initiate a test failover, we need a number of inputs to the Start-SpbmReplicationTestFailover cmdlet:

  • The target replication group
  • The source VVolID-to-ContainerId mapping
  • A PowerCLI connection to the vCenters at the source and target

The bulk of work is to discover the correct target replication group that you wish to fail over.  It’s associated with the source replication group at a source fault domain (with HPE 3PAR, the fault domain is equivalent to the array itself).  A typical process to discover the target replication group is as follows:

  • Connect to both vCenters, primary and recovery sites.
  • Identify the source fault domain (array), associated with the primary vCenter, where the source VM is stored.
  • Identify which source replication group is responsible for replicating that VM.
  • Identify the paired target replication group.

Use the following code as an example of how to identify which source replication group is responsible for replicating your VM, as well as discovering the target group.
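A minimal sketch of that lookup might look like the following. It assumes the PowerCLI 6.5.1 storage cmdlets are loaded and that Connect-VIServer has already been run against both vCenters. The function name Get-TestFailoverTargetGroup is hypothetical, and the parameter usage should be checked against your PowerCLI version.

```powershell
# Sketch only: assumes PowerCLI 6.5.1 is loaded and both vCenters are connected.
# Get-TestFailoverTargetGroup is a hypothetical helper name.
function Get-TestFailoverTargetGroup {
    param(
        [Parameter(Mandatory)] [string] $VMName,   # source VM name, e.g. 'VMFremont'
        [Parameter(Mandatory)] $SourceServer       # connection to the primary-site vCenter
    )

    # Look up the source VM at the primary site.
    $vm = Get-VM -Name $VMName -Server $SourceServer

    # Find the replication group that currently protects that VM.
    $srcGroup = Get-SpbmReplicationGroup -VM $vm -Server $SourceServer
    if (-not $srcGroup) {
        throw "VM '$VMName' is not a member of any replication group."
    }

    # The replication pair links the source group to its target at the recovery site.
    $pair = Get-SpbmReplicationPair -Source $srcGroup
    if (-not $pair) {
        throw "No replication pair found for the source group of '$VMName'."
    }

    return $pair.Target
}
```

With connections stored in a variable such as $srcVc, the call would be along the lines of `$tgtGroup = Get-TestFailoverTargetGroup -VMName 'VMFremont' -SourceServer $srcVc`.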

Use the following code to generate a Source VVol-to-Container map.

WARNING:  In our previous blog post, Automating vVols DR with Power CLI 6.5, we used a different process to discover the source vVols for our map.  That code had a limitation, in that it could not find snapshot vVols, as well as other types of vVols associated with the VM.  And failure to provide all vVols associated with the group will cause VM re-construction failures at the recovery site.  With PowerCLI 6.5.1, the above method can be used, which captures all the vVols associated with the group.  However, it does assume that all vVols in the group are associated with the same Storage Container.  That’s a safe assumption with HPE 3PAR’s implementation.
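However the vVol IDs are discovered, the map itself is a plain hashtable keyed by source vVol ID, with the storage container ID as the value. The sketch below shows only that shape, under the single-container assumption described above. The function name New-SourceVvolIdMap and the sample IDs are hypothetical, and the discovery of the IDs is not covered here.

```powershell
# Sketch only: builds the vVolID-to-ContainerId hashtable from IDs gathered elsewhere.
# Assumes every vVol in the group lives in the same storage container.
function New-SourceVvolIdMap {
    param(
        [Parameter(Mandatory)] [string[]] $VvolId,       # all vVol IDs in the replication group
        [Parameter(Mandatory)] [string]   $ContainerId   # ID of the source storage container
    )

    $map = @{}
    foreach ($id in $VvolId) {
        # Indexed assignment tolerates duplicate IDs, unlike .Add().
        $map[$id] = $ContainerId
    }
    return $map
}

# Hypothetical placeholder IDs, for illustration only.
$srcVvolMap = New-SourceVvolIdMap -VvolId 'vvol-config-1', 'vvol-data-1', 'vvol-snap-1' -ContainerId 'container-1'
```

Including the snapshot and other non-data vVols in the list is the point of the exercise: a missing entry is what leads to the reconstruction failures mentioned above.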

So now we’re ready to actually perform the test failover operation.
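As a rough sketch, the call could be wrapped as follows. It assumes a target replication group and a source vVol map are already in hand, and that the -ReplicationGroup and -SourceVvolIdMap parameters behave as in PowerCLI 6.5.1. The wrapper name Invoke-VvolTestFailover is hypothetical.

```powershell
# Sketch only: Invoke-VvolTestFailover is a hypothetical wrapper name.
function Invoke-VvolTestFailover {
    param(
        [Parameter(Mandatory)] $TargetGroup,                 # target replication group at the recovery site
        [Parameter(Mandatory)] [hashtable] $SourceVvolIdMap  # source vVolID-to-ContainerId map
    )

    # Returns the .vmx path(s) of the VMs exposed at the recovery site.
    $vmxFile = Start-SpbmReplicationTestFailover -ReplicationGroup $TargetGroup -SourceVvolIdMap $SourceVvolIdMap

    if (-not $vmxFile) {
        Write-Warning 'Test failover returned no .vmx paths.'
    }
    return $vmxFile
}
```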

If no vmxFile is returned, make sure you’ve mounted the target vVols storage container as a datastore on at least one of your ESXi hosts at the target.

At this point, we’re ready to register the VMs at the recovery site. However, it’s a good idea to save [not shown here] the $vmxFile list returned by Start-SpbmReplicationTestFailover. This VM file path list is critical to the registration process, and good to have in case a retry is required due to a registration or other failure. The sample scripts provided do that for us.

One thing that may not be obvious at this point is that even though we intended to fail over only one VM, the group may contain multiple VMs. You may want to register all of them, or just the one you want to test. $vmxFile can be a file path to a single VM’s configuration file, or an array of paths to different VMs. To keep our example simple, we’ll assume just one .vmx path was returned.
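If several paths do come back, one way to pick out a single VM is to filter on the file name. The sketch below assumes each path ends in "/<VMName>.vmx", which may not hold in every environment. The function name Select-VmxPath and the sample paths are hypothetical.

```powershell
# Sketch only: assumes each path ends with "/<VMName>.vmx".
function Select-VmxPath {
    param(
        [Parameter(Mandatory)] [string[]] $VmxFile,  # one path or an array of paths
        [Parameter(Mandatory)] [string]   $VMName    # VM to pick out of the group
    )

    $wanted = '/' + $VMName + '.vmx'
    # Case-insensitive match on the end of each path.
    $VmxFile | Where-Object { $_.EndsWith($wanted, [System.StringComparison]::OrdinalIgnoreCase) }
}

# Hypothetical paths, for illustration only.
$allPaths = '[tgtDS] folder1/VMFremont.vmx', '[tgtDS] folder2/OtherVM.vmx'
$onePath  = @(Select-VmxPath -VmxFile $allPaths -VMName 'VMFremont')
```

Wrapping the call in @() keeps the result an array whether zero, one, or several paths match.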

To register the VM, we need to identify which datastore has surfaced the VM, as well as which ESXi host can run that VM (is attached to that datastore).  Fortunately, the vmxFile path contains the name of the target datastore.

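# List the datastores known to the target vCenter.
$tgtDatastores = Get-Datastore -Server <targetVcenterIPorName>
$newVmDataStore = @()
foreach ($datastore in $tgtDatastores) {
    # A .vmx path starts with the datastore name in brackets, e.g. "[name] ".
    $tmpName = "[$($datastore.Name)] "
    if ($vmxFile.StartsWith($tmpName)) {
        $newVmDataStore = $datastore
        break
    }
}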

And finally, before we can register, we need to identify the ESXi host that can run this VM. In our example, we simply take the first host from the list returned. But you may want to use a more sophisticated method, for example one based on load balancing or other logic.
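A sketch of the simple "first host" approach follows. It assumes the target datastore object has already been found. The filter on connection state is an addition that is not part of the original example, and the function name Get-FirstHostForDatastore is hypothetical.

```powershell
# Sketch only: Get-FirstHostForDatastore is a hypothetical helper name.
function Get-FirstHostForDatastore {
    param(
        [Parameter(Mandatory)] $Datastore   # target datastore that surfaced the VM
    )

    # Hosts that have the datastore mounted; skip hosts that are not connected (an added safeguard).
    $candidates = Get-VMHost -Datastore $Datastore |
        Where-Object { $_.ConnectionState -eq 'Connected' }

    $first = $candidates | Select-Object -First 1
    if (-not $first) {
        throw 'No connected ESXi host has the target datastore mounted.'
    }
    return $first
}
```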

Finally, we can register the VM.
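Registration can be sketched with New-VM and its -VMFilePath parameter, assuming a single .vmx path and a host that has the target datastore mounted. The wrapper name Register-TestFailoverVm is hypothetical.

```powershell
# Sketch only: Register-TestFailoverVm is a hypothetical wrapper name.
function Register-TestFailoverVm {
    param(
        [Parameter(Mandatory)] [string] $VmxFile,  # single .vmx path returned by the test failover
        [Parameter(Mandatory)] $VMHost             # ESXi host with the target datastore mounted
    )

    # Registers the existing VM from its configuration file; no new disks are created.
    New-VM -VMFilePath $VmxFile -VMHost $VMHost
}
```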

Optionally, but recommended so that we can verify that reversing the group would work in a true failover, we should assign a storage profile that is compatible with the VM’s current replication group. For our example, we’ll use a hard-coded profile name.
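One possible way to do this is sketched below. It assumes the policy already exists at the recovery site and that Set-SpbmEntityConfiguration accepts -ReplicationGroup as in PowerCLI 6.5 and later. The function name Set-TestFailoverVmPolicy is hypothetical, and the policy name is whatever you pass in.

```powershell
# Sketch only: Set-TestFailoverVmPolicy is a hypothetical helper name.
function Set-TestFailoverVmPolicy {
    param(
        [Parameter(Mandatory)] $VM,                   # VM registered at the recovery site
        [Parameter(Mandatory)] [string] $PolicyName,  # name of an existing storage policy
        [Parameter(Mandatory)] $TargetGroup,          # target replication group
        [Parameter(Mandatory)] $TargetServer          # connection to the recovery-site vCenter
    )

    $policy = Get-SpbmStoragePolicy -Name $PolicyName -Server $TargetServer
    $disks  = Get-HardDisk -VM $VM

    # Collect the SPBM configuration of the VM home object and of each hard disk.
    $config = @(Get-SpbmEntityConfiguration -VM $VM) +
              @(Get-SpbmEntityConfiguration -HardDisk $disks)

    # Apply the policy and replication group to the VM and its disks together.
    Set-SpbmEntityConfiguration -Configuration $config -StoragePolicy $policy -ReplicationGroup $TargetGroup
}
```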

At this point, the VM can be powered up and tested for validity.

Finally, we’re ready to clean up the test failover event.  While it’s optional to remove (from disk) the VM after the test has completed, at a very minimum, the VM must be powered down and unregistered.  Once powered down and unregistered, the test can be stopped.
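The cleanup order can be sketched as follows, assuming the test VM object and the target replication group are still available. Stop-VM performs a hard power-off, so shut the guest down first if that matters for your test. The function name Stop-VvolTestFailover and the -DeleteFromDisk switch are hypothetical.

```powershell
# Sketch only: Stop-VvolTestFailover is a hypothetical helper name.
function Stop-VvolTestFailover {
    param(
        [Parameter(Mandatory)] $VM,            # test VM registered at the recovery site
        [Parameter(Mandatory)] $TargetGroup,   # target replication group under test
        [switch] $DeleteFromDisk               # also remove the test VM's files (optional)
    )

    # 1. Power the VM down (hard power-off).
    if ($VM.PowerState -eq 'PoweredOn') {
        Stop-VM -VM $VM -Confirm:$false | Out-Null
    }

    # 2. Unregister the VM; without -DeletePermanently the files stay on disk.
    Remove-VM -VM $VM -DeletePermanently:$DeleteFromDisk -Confirm:$false

    # 3. End the test failover on the target replication group.
    Stop-SpbmReplicationTestFailover -ReplicationGroup $TargetGroup
}
```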

Demo of Test Failover on HPE 3PAR

Here’s a demo of a test failover operation on HPE 3PAR, using the scripts provided.

This zip file contains the sample PowerShell scripts to support vVols replication. Be aware that all scripts are unsupported and are run at your own risk. While every script has been written with the intention of minimizing the potential for unintended consequences, we cannot be held responsible for any misuse or script problems.

If you are attending VMworld 2017 US and want to hear more about vVols replication first hand, be sure to sign up for the HPE session: STO3305BUS – Replicating VMware vVols: A technical deep dive into vVols array based replication in vSphere 6.5