Something to be aware of with vSphere Replication is the difference between stopping and pausing a replication. With a pause we temporarily stop shipping differential data to the recovery site, but with a stop we completely end the replication, remove it from the UI *and delete the recovery data*.
Let’s take a look at this.
Say we have a VM that’s replicated, in this example “TestSRV1”.
We can see the replication in the “Outgoing Replications” tab, and everything is looking good. We can then take a look at the recovery site datastore to see what files are there.
A bit of explanation of key files:
- TestSRV1.vmdk: The target disk file to which we are ultimately committing the replications.
- hbrdisk.RDID….vmdk: The current redo log that is complete as of the last replication.
- hbrcfg…vmx(f): The shadow VMX files that will be ‘promoted’ to the full config files when we do our restore.
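If you have a listing of the replica directory, the roles above can be sorted out by name alone. What follows is only a minimal sketch: it assumes the prefixes and extensions shown in the list are enough to tell the files apart, and the function name `classify_replica_files` and the sample file names in the usage lines are made up for illustration (real names carry identifiers that are elided above).

```python
from typing import Dict, List


def classify_replica_files(names: List[str], vm_name: str) -> Dict[str, List[str]]:
    """Group file names from a replica directory by the roles described above.

    Assumption: matching on prefix and extension is sufficient. This is an
    illustrative helper, not part of any vSphere Replication tooling.
    """
    groups: Dict[str, List[str]] = {
        "target_disk": [],
        "redo_log": [],
        "shadow_config": [],
        "other": [],
    }
    for name in names:
        if name.startswith("hbrdisk.RDID") and name.endswith(".vmdk"):
            # Redo log, complete as of the last replication.
            groups["redo_log"].append(name)
        elif name.startswith("hbrcfg") and name.endswith((".vmx", ".vmxf")):
            # Shadow config files that get promoted on recovery.
            groups["shadow_config"].append(name)
        elif name == f"{vm_name}.vmdk":
            # Target disk the replications are committed to.
            groups["target_disk"].append(name)
        else:
            groups["other"].append(name)
    return groups


if __name__ == "__main__":
    # Hypothetical placeholder names, not taken from a real datastore.
    sample = [
        "TestSRV1.vmdk",
        "hbrdisk.RDID-example.vmdk",
        "hbrcfg.example.vmx",
        "hbrcfg.example.vmxf",
    ]
    for role, files in classify_replica_files(sample, "TestSRV1").items():
        print(role, files)
```

An empty result for all three roles after a “Stop” would match the wiped directory described further down.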
If we right-click on the replication in the VR service, we have the options to Reconfigure, Pause, Sync, or Stop replication. If we choose to stop, we are prompted to make *sure* we want to stop:
So let’s say we stop… what happens? Well, VR will disable the replication and “Unconfigure” it. We can see that in the “Recent Tasks” section of the interface as we commit this action. The replication then disappears outright from our VR monitor.
Great, so we’ve stopped replication, but what does this mean? Well, “Unconfigure” means completely unconfigure, including wiping out all the files that VR created for the replication in the target directory at the recovery location. Notice that while the directory remains behind, all the files in it are gone:
So be very careful doing a “Stop”. Often a pause will be much handier for normal actions. Even if you wipe out the primary disk file while the replication is paused, the replica stays in place and is ready to be recovered. If the primary gets annihilated, the replication will eventually hit an RPO violation state, but as long as it is still configured and not stopped, you can still recover the VM.
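To make the difference concrete, here is a toy model of the behaviour described in this post. It is purely illustrative: `ToyReplication` and its methods are invented for this sketch and do not correspond to any vSphere Replication API. The only thing it encodes is that a pause keeps the replica files while a stop deletes them.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class ToyReplication:
    """Illustrative model only; not a real vSphere Replication object."""

    vm_name: str
    state: str = "active"  # one of "active", "paused", "stopped"
    replica_files: Set[str] = field(default_factory=set)

    def pause(self) -> None:
        # Pausing stops shipping changes but leaves the replica untouched.
        if self.state == "stopped":
            raise RuntimeError("a stopped replication no longer exists")
        self.state = "paused"

    def stop(self) -> None:
        # Stopping unconfigures the replication and wipes the replica files.
        self.state = "stopped"
        self.replica_files.clear()

    def can_recover(self) -> bool:
        # Recovery needs a configured replication with replica files present.
        return self.state != "stopped" and bool(self.replica_files)


if __name__ == "__main__":
    rep = ToyReplication("TestSRV1", replica_files={"TestSRV1.vmdk"})
    rep.pause()
    print("after pause:", rep.can_recover())
    rep.stop()
    print("after stop:", rep.can_recover())
```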
So my best practice is this: Even if you think you’re going to get rid of a replicated VM, use pause and only “Stop” the replication after you’re completely sure you’re done with the replica, as “Stop” is an unrecoverable action.
There’s always the ability, mind you, to copy the target VMDK to another location, while the replication is paused, and use it as a seed for replication if you think you might need to restart it again at some point.
*** Post Script ***
My esteemed colleague Mr Dilworth pointed out to me that I should clarify how this is handled in SRM vs. using vSphere Replication on its own. With SRM the interface for replication is handled entirely through the SRM plugin in the vSphere client, and it looks a little bit different:
As you can see, within SRM stopping the replication is titled “Remove Replication”, which is probably a touch more informative. The behaviour is the same as outlined above, though: we are completely removing the replication and its replica files, including the config file for the replication itself with the RPO, VSS quiescing policy, etc. Again, this means that recreating the replication will require setting all the policies from scratch and potentially having to re-seed a copy at the recovery location if you want to save on bandwidth and time for the initial full sync.