Role of vVol Replication in Failovers

With the adoption of vVols continuing to grow, it has become necessary to understand vVol replication in more detail. This blog post dives into the concepts of vVol replication, its API calls and its operations.

vVol Replication:

vVol replication is array-based replication that is controlled through VASA APIs issued from vCenter / servers to the VASA providers.

Because the replication is array-based, vVol replication is policy driven. Configuring vVol replication requires replication capabilities and replication groups, both of which are delivered from the array by the VASA provider. They become visible in vCenter when a policy is being assigned to the vVols.

To support vVol replication, we need at least two vCenters (VCs), which are treated as sites, say site A (source) and site B (target). Each site has its own VASA provider, and the two VASA providers need to be paired as peers of each other.

vVol Replication Terms:

Storage Policy: Storage policies are sets of rules configured in vCenter. They involve storage capabilities such as retention period, replication interval, etc.

Storage policies are required to find the compatible datastores that support vVol replication according to the rules configured in the policy.

Fault Domain: Fault domains are available at both the source and target VASA providers. Each fault domain can be considered an array or a unit of an array. It can contain one or more replication groups.

Fault domains are paired with remote fault domains, and failover of resources happens between the fault domains of the two sites. The resources (vVol devices) always fail over as a unit on the storage array.

Source Replication Group: All replicated vVols are placed in (attached to) the source replication group. A source replication group can contain anywhere from 1 to N VMs. Individual VM failover is not possible: all VMs that belong to a replication group are moved to the other site together when the replication group fails over.

Target Replication Group: The target replication group holds all failed-over groups, test failovers and the replicated VMs at the target site. Failover is executed against the target replication group, and Reverse Replicate (Reprotect) is likewise always executed against the target replication group.
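The point that a replication group, not a single VM, is the unit of failover can be illustrated with a small sketch. This is only a model of the rule described above, not vendor code; the function name and the VM and group names are hypothetical.

```python
def vms_failing_over_with(vm, vm_to_rg):
    """Return every VM that fails over when `vm` is failed over.

    `vm_to_rg` maps a VM name to the name of its replication group
    (hypothetical names). Because failover happens per replication
    group, all VMs sharing the group of `vm` move together.
    """
    rg = vm_to_rg[vm]
    return sorted(name for name, group in vm_to_rg.items() if group == rg)


# Hypothetical placement of three VMs in two replication groups.
placement = {"vm-a": "rg-1", "vm-b": "rg-1", "vm-c": "rg-2"}
```

With this placement, asking for `vm-a` also pulls in `vm-b`, since both sit in `rg-1`, while `vm-c` stays where it is.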

Understanding the Replication Workflow:

To support vVol replication, there are two workflows that need to be addressed:

Provisioning Workflows
Failover Workflows

Figure: Overview of two sites and representation of fault domains (FDs) and replication groups (RGs).

For provisioning workflows, there are 2 major considerations:

Assignment of replication groups
Assignment of storage policy to VM vVols

For Failover Workflows, there are 3 major considerations:

Planned Failover: All VMs are powered off, unregistered and prepared for failover from one site to the other.
Forced Failover: All VMs are failed over from one site to the other without any prior notice.
TestFailover: All VMs are failed over temporarily from one site to the other; after the testFailover is stopped, the VMs are back at the source site.

There are currently 5 workflows that need to be considered for correcting the metadata associated with a vVol-backed VM upon a site failover:

Resolving duplicate namespace identities associated with a target datastore.
Updating the paths embedded in VM files on a target datastore.
Fixing up VMDK metadata and detecting linked clone promotion cases.
Updating the object IDs embedded in VM files on a target datastore.
Resolving inconsistent metadata associated with VVols on a target datastore.

vVol Replication APIs and Behaviour:

vVol replication has a total of 11 core replication APIs. Using them, a user can replicate, operate on and fetch details such as replication groups and the rest of the internal topology of the setup.

failoverReplicationGroup_Task():

This API is always called at the target location. The vVols are recovered in the target replication group from the latest snapshot (or from a point-in-time replica, if one is specified), and the state of the target replication group moves from TARGET to FAILEDOVER. Once the failover task has completed, ReverseReplicationGroup_Task is issued.

The API does not power off or unregister VMs at the source vCenter, nor does it register the recovered VMs at the recovery site. The API also does not destroy the source vVols or the source replication group when the vVols are failed over.

prepareFailoverReplicationGroup_Task():

It is always called at the source location. Depending on whether the replicator is persistent or non-persistent, VMs may need to be powered off and unregistered before calling this API. Optionally, after prepareFailoverReplicationGroup is called, the vVols are placed in read-only mode, which prevents binding to them; in that case the vVols are assumed to be capable of accepting writes again only after reverseReplicateGroup is called.

promoteReplicationGroup_Task():

It is always called at the target location. This API moves a replication group from the INTEST state to the FAILEDOVER state. After it has run, any attempt to stop the test failover will fail. When the replication group is promoted to FAILEDOVER, the source vVols must become read-only.

queryFaultDomain_Task():

It can be called at both the source and target locations. The API returns the fault domains that are exposed by an array. If the argument passed is NULL, the API returns all the fault domains of the site it is called on; if the argument passed is a list of fault domains, it returns the complete details of those fault domains.
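The NULL-versus-list behaviour can be sketched as follows. This is one reading of the description above, not the VASA implementation: it assumes that a NULL query yields identifiers only, and the function name, inventory layout and field names are all hypothetical.

```python
def query_fault_domain(inventory, fault_domain_ids=None):
    """Model of the two query modes described in the text.

    `inventory` maps a fault domain ID to its details (hypothetical data).
    With no argument, all known IDs are listed (assumption: IDs only);
    with a list of IDs, the full details of exactly those domains are returned.
    """
    if fault_domain_ids is None:
        return sorted(inventory)
    return {fd_id: inventory[fd_id] for fd_id in fault_domain_ids}


# Hypothetical fault domains on one site.
site_a = {
    "fd-1": {"name": "array-1", "replication_groups": ["rg-1", "rg-2"]},
    "fd-2": {"name": "array-2", "replication_groups": ["rg-3"]},
}
```

The same two-mode pattern is described for queryReplicationGroup_Task() further down.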

queryPointInTimeReplica_Task():

It is always executed at the target location. The API returns the list of all Point in Time Replicas (PITs) if the argument provided is NULL; the details of the PITs are returned if a replication group is provided as the parameter.

The API can return a short response or a long response. The short response is returned when there are many PITs to return, and the response is set to a summary result. The long response is returned when the details of the PITs passed as the argument are requested, and the response is set to a success result.

queryReplicationGroup_Task():

It can be called at both the source and target locations. The API returns the replication groups that are exposed by an array. If the argument passed is NULL, the API returns all the replication groups of the site it is called on; if the argument passed is a list of replication groups, it returns the complete details of those replication groups.

queryReplicationPeer_Task():

This API is used to discover the replication topology at the fault domain level, and it is available at both the source and target locations. The source should know about all its targets, and the targets need to know about their source.

reverseReplicationGroup_Task():

This API performs replication in the reverse direction, from target to source. It is always executed at the target location, on a replication group that is in the FAILEDOVER state.

When the replication group is in the FAILEDOVER state, reverseReplicationGroup changes its state to SOURCE at the target site, and the paired group at the source site moves from SOURCE to TARGET. Failback is done to the latest snapshot (PIT) of the replication group.

syncReplicationGroup_Task():

It always executes at the target location. The purpose of the API is to sync the latest changes to the vVols from source to target, and it is called before initiating a failover.

Point in Time Replicas (PITs) are created at the time of syncing from source to target; these PITs can be considered snapshots of the given replication group.

Once a sync is initiated, its progress can be tracked from the vSphere task manager and the sync state is set to syncOnGoing. The VASA provider allows only one sync at a time for a given replication group. If two sync operations are issued, the second one is rejected with the error syncOnGoing.
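The one-sync-at-a-time rule can be modelled with a few lines of Python. This is a minimal sketch of the behaviour described above, not VASA provider code; the class names and the PIT naming are hypothetical.

```python
class SyncOngoingError(Exception):
    """Hypothetical stand-in for the syncOnGoing error."""


class ReplicationGroupSync:
    """Tracks sync activity for a single replication group."""

    def __init__(self):
        self._in_progress = False
        self.pit_replicas = []  # PITs created by completed syncs

    def start(self):
        # Only one sync may run at a time for a given replication group.
        if self._in_progress:
            raise SyncOngoingError("syncOnGoing")
        self._in_progress = True

    def finish(self, pit_name):
        # A completed sync leaves behind a point-in-time replica.
        if not self._in_progress:
            raise RuntimeError("no sync in progress")
        self.pit_replicas.append(pit_name)
        self._in_progress = False
```

Calling `start()` twice without an intervening `finish()` raises `SyncOngoingError`, which mirrors the second sync being rejected.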

testFailoverReplicationGroupStart_Task():

This function is called at the target location. Once the API is invoked on a given replication group, the state of that group moves from TARGET to INTEST.

It creates temporary vVols at the target location. These vVols are accessible and can be used in the same way as the source vVols. Re-executing the test failover start without an intervening stop should return the same result as the first test failover. Exceptions such as resource not available or invalid policy are possible at this stage.

Test failover will fail if the target policy being passed is not compatible. Test failover should succeed if the profile is compatible with the replica vVols. If the policy cannot be applied to the INTEST vVols after test failover, the compliance status should be reported as non-compliant.

Since these are temporary vVols, there is no relationship between the source and target vVols, and changes to the target vVols are not reflected in the source vVols.

testFailoverReplicationGroupStop_Task():

This function is called at the target location. Once the API is invoked on a given replication group, the state of that group moves from INTEST to TARGET.

When testFailover is stopped for the given replication group, the changes made to the vVols on the target are not carried over to the source vVols, and the temporary vVols are deleted.
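The state changes described for the APIs above can be collected into a small transition table. The sketch below only encodes the transitions named in this post (with the `_Task` suffix dropped from the operation names); the enum, the table and the function are hypothetical helpers, not part of any VMware SDK.

```python
from enum import Enum


class RGState(Enum):
    SOURCE = "SOURCE"
    TARGET = "TARGET"
    INTEST = "INTEST"
    FAILEDOVER = "FAILEDOVER"


# (operation, current state) -> next state, as described in the text.
TRANSITIONS = {
    ("testFailoverReplicationGroupStart", RGState.TARGET): RGState.INTEST,
    # Re-running start without an intervening stop leaves the group in INTEST.
    ("testFailoverReplicationGroupStart", RGState.INTEST): RGState.INTEST,
    ("testFailoverReplicationGroupStop", RGState.INTEST): RGState.TARGET,
    ("promoteReplicationGroup", RGState.INTEST): RGState.FAILEDOVER,
    ("failoverReplicationGroup", RGState.TARGET): RGState.FAILEDOVER,
    ("reverseReplicationGroup", RGState.FAILEDOVER): RGState.SOURCE,
}


def next_state(state, operation):
    """Return the state after `operation`, or raise if it is not allowed."""
    try:
        return TRANSITIONS[(operation, state)]
    except KeyError:
        raise ValueError(
            f"{operation} is not valid in state {state.value}"
        ) from None
```

For example, once a group in INTEST has been promoted to FAILEDOVER, `next_state(..., "testFailoverReplicationGroupStop")` raises, matching the note that stopping a test failover fails after promotion.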

Example of Failover using PowerCLI:

Connect to both the source and target vCenters
  $srcServer = '10.170.106.28'
  $tgtServer = '10.206.107.66'
  Connect-VIServer -Server $srcServer
  Connect-VIServer -Server $tgtServer
Get the VM that has the storage policy and replication group assigned to it
  $vm = Get-VM -Name 'VM-Failover'
Fetch the replication group the VM is assigned to, then get its target pair from the source replication group
  $rg = Get-SpbmReplicationGroup
  $rg = $rg[$n]   # $n = index of the replication group assigned to the VM
  $rgPair = Get-SpbmReplicationPair -Source $rg
Sync the vVols from the source site to the target site
  Sync-SpbmReplicationGroup $rgPair.Target
Power off the VM, unregister it and call prepare failover on the source replication group
  Stop-VM $vm
  Remove-VM $vm
  Start-SpbmReplicationPrepareFailover $rgPair.Source
Fail over from the source replication group to the target replication group
  $vmFilePath = Start-SpbmReplicationFailover $rgPair.Target
Register the failed-over vVols / VM on a target host in vCenter
  $vm = New-VM -VMFilePath $vmFilePath -VMHost '10.199.7.178'
