By now, most of you have heard of, and are perhaps using, vSphere Replication (VR) for virtual machine migrations and disaster recovery. I imagine most of you are also aware of vSphere Data Protection (VDP) Advanced for backup and recovery. These two solutions address different but equally important needs, so there are good reasons to implement both. Having both in place provides the means to recover from a wider variety of issues than either solution could cover on its own.
Some workloads have stringent recovery time objectives (RTOs). A simple example is a web server with an RTO of 10 minutes. VR can typically recover a replicated virtual machine (VM) in a few minutes. Restoring the entire VM with a backup and recovery solution such as VDP Advanced could take 30 minutes or longer, which would violate the RTO.
Conversely, another example is the need to recover a folder inside a VM from 6 months ago. VR only recovers an entire VM and, even though VR now has multiple point-in-time (MPIT) recovery capabilities, those points in time are typically a few hours or days in the past, not weeks or months. VDP Advanced can restore individual files and folders from restore points that are days, weeks, months, and even a few years old.
Clearly, each solution has strengths designed to address different use cases. So the questions are: “Can they be used together? If yes, what do I need to be aware of?” Let’s dig into these questions…
The short answer is yes, they can be used together. The screenshots below show VR 5.5 replicating the same two VMs that are being backed up daily by VDP Advanced 5.5. The recovery point objective (RPO) in VR is set to 15 minutes for both VMs and their replication status is “OK”. During the initial sync performed by VR, I ran the “Linux VMs” backup job in VDP Advanced, which completed successfully. In other words, the VMs were backed up at the same time they were being replicated by VR.
However, there is a corner-case issue you should be aware of. VR uses a mechanism called a lightweight delta (LWD) as part of the replication process. The LWD is a bundle of data, specifically the blocks that have changed since the last replication cycle, to be transferred to the remote site. VR does not use VM snapshots to replicate a VM unless Microsoft Volume Shadow Copy Service (VSS) quiescing is enabled for that VM in VR. If VR VSS quiescing is enabled, a VM snapshot is taken as part of the VSS quiescing process. VDP Advanced and most other backup solutions also use a VM snapshot during an agentless, image-level (entire VM) backup. If VR happens to be creating a VM snapshot at the same moment a backup job tries to create one for the same VM, the VM snapshot for the backup may fail. It is also possible that orphaned snapshots could be generated. In the cases where this has occurred, the issue seems to be more prevalent in versions of these products prior to 5.5.
Here are a few frequently asked questions and answers regarding this scenario:
Q: What if the VM snapshot fails during a VDP Advanced backup job? A: VDP Advanced will not back up the virtual machine and it will be marked as “Out of date” in the VDP Advanced user interface. The backup job will continue to run for the rest of the VMs in the job. If one or more VM backups fail, the number of failures is shown in the Backup Job Details.
Clicking “Show items” next to “Out of date:” reveals the specific list of VMs that were not backed up.
Backup failures are also indicated in the email reports produced by VDP Advanced.
Q: For a VM marked “Out of date”, how do I get a good backup of this VM? A: Perform a manual backup (“Backup now” in the VDP Advanced user interface) and select “Backup only out of date sources” or simply wait until the backup job runs again at its next scheduled start time.
Q: What about application backups using the application agents included with VDP Advanced? A: These are in-guest, agent-based backups that do not utilize VM snapshots, so they should not conflict with the VM snapshots VR creates when VSS quiescing is enabled.
Q: Can I create a vCenter Server alarm to raise an alert when a VM is running from a snapshot? A: Yes, please see this VMware KB article: http://kb.vmware.com/kb/1018029
Q: How can I consolidate VM snapshots? A: This VMware KB article covers VM snapshot consolidation: http://kb.vmware.com/kb/2003638
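If you would rather check for leftover snapshots from a script than wait for an alarm, something along these lines could work. This is only a sketch, not the procedure from either KB article: it assumes VM objects that expose `name`, `snapshot`, and `runtime.consolidationNeeded`, which is how I understand the vSphere API (and pyVmomi) to present them. The function name `snapshot_report` and the VM names are made up for illustration, and the stand-in objects are there only so the example runs without a vCenter Server connection.

```python
from types import SimpleNamespace


def snapshot_report(vms):
    """Return (name, has_snapshot, needs_consolidation) for VMs that need a look.

    Assumption: each VM object exposes .name, .snapshot (None when the VM has
    no snapshots) and .runtime.consolidationNeeded, mirroring the vSphere API.
    """
    report = []
    for vm in vms:
        has_snapshot = vm.snapshot is not None
        needs_consolidation = bool(getattr(vm.runtime, "consolidationNeeded", False))
        if has_snapshot or needs_consolidation:
            report.append((vm.name, has_snapshot, needs_consolidation))
    return report


# Hypothetical stand-ins for real VirtualMachine objects.
demo_vms = [
    SimpleNamespace(name="linux-01", snapshot=None,
                    runtime=SimpleNamespace(consolidationNeeded=False)),
    SimpleNamespace(name="linux-02", snapshot=object(),
                    runtime=SimpleNamespace(consolidationNeeded=False)),
    SimpleNamespace(name="linux-03", snapshot=None,
                    runtime=SimpleNamespace(consolidationNeeded=True)),
]

for name, has_snapshot, needs_consolidation in snapshot_report(demo_vms):
    print(name, "snapshot:", has_snapshot, "consolidation needed:", needs_consolidation)
```

Against a live environment, the same function should accept the VirtualMachine objects returned by a pyVmomi container view in place of `demo_vms`. Keep in mind that a snapshot seen during a backup window may simply belong to a backup job that is still running.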
Q: Are there recommendations that can be leveraged when protecting a VM with both VDP Advanced and VR? A: There are a few best practices that can further reduce the odds of encountering issues:
1. Set the RPO in VR to the longest acceptable value. For example, if the organization has deemed that an RPO policy of 8 hours is acceptable for an application, do not set the VR RPO at 2 hours just because you can. The shorter the RPO setting in VR, the more snapshot operations occur for a VM with VR VSS quiescing enabled. A shorter VR RPO setting typically consumes more network bandwidth, as well.
2. Leave VR VSS quiescing disabled (Quiescing method set to “None”), unless it is really necessary. The majority of applications recover well from crash-consistent replicas or copies. VSS quiescing may also be a challenge in VMs that are extremely I/O intensive. This VMware KB article discusses the issue in more detail: http://kb.vmware.com/kb/2040754. Naturally, this challenge would contribute to any issues with VM snapshots.
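To put rough numbers on the first recommendation, here is a back-of-the-envelope sketch. It assumes about one replication cycle per RPO interval (VR schedules replication itself and may run more often) and a made-up 30 seconds of snapshot activity per quiesced cycle. The function names and the 30-second figure are mine, purely for illustration.

```python
MINUTES_PER_DAY = 24 * 60
SECONDS_PER_DAY = MINUTES_PER_DAY * 60


def replication_cycles_per_day(rpo_minutes):
    """Approximate cycles per day, assuming one cycle per RPO interval."""
    return MINUTES_PER_DAY / rpo_minutes


def snapshot_busy_fraction(rpo_minutes, snapshot_seconds=30):
    """Share of the day a VSS-quiesced VM spends in VR snapshot operations.

    snapshot_seconds is a hypothetical duration, not a measured value.
    """
    busy_seconds = replication_cycles_per_day(rpo_minutes) * snapshot_seconds
    return min(busy_seconds / SECONDS_PER_DAY, 1.0)


for rpo in (480, 120, 15):  # 8 hours, 2 hours, 15 minutes
    cycles = replication_cycles_per_day(rpo)
    busy = snapshot_busy_fraction(rpo)
    print(f"RPO {rpo:>3} min: ~{cycles:.0f} cycles/day, {busy:.2%} of the day in snapshots")
```

Under these assumptions, moving from an 8-hour RPO to a 2-hour RPO quadruples the number of quiesced snapshot operations, and with it the window in which a backup snapshot could collide with one. With quiescing set to “None”, VR takes no snapshots and the estimate does not apply.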
That pretty much covers the corner-case issue you may run into if using VR and a VM snapshot-based backup solution like VDP Advanced. As I mentioned before, the odds are slim and, if you do encounter this issue, the workaround is very simple. Furthermore, any challenges with running these two solutions together are far outweighed by the benefits:
– VDP Advanced and VR are easy to deploy and manage using the vSphere Web Client
– Minimal cost: vSphere Replication is included with vSphere Essentials Plus and higher
– Both solutions combined help remediate a wide variety of downtime scenarios
– Support comes from a single vendor (no “finger pointing” when an issue arises)
– VDP Advanced can replicate backup data, which provides additional protection
One last question to address: What about Site Recovery Manager (SRM) and VDP Advanced? The answer is actually quite simple. If SRM is used with VR as the means for replicating VMs, the issue covered in this article is a possibility, just the same. If SRM is backed by array replication, there is no issue since array replication does not use VM snapshots.
If we step back a bit and take a look at the big picture, we see that utilizing VMware availability solutions together can dramatically improve the uptime of applications and services running in VMware virtual machines:
– vSphere HA, FT, and App HA help ensure quick recoverability from host failures, guest OS issues, and application downtime.
– vSphere Replication and SRM enable easy, automated, rapid disaster recovery.
– VDP Advanced provides reliable data protection for VMware virtual machines and tier-1 applications such as Microsoft SQL Server, Exchange, and SharePoint.
@jhuntervmware