Some Thoughts on Backup Data Storage for vSphere Data Protection

I have received a fair number of questions about where to store backup data with vSphere Data Protection (VDP) and vSphere Data Protection Advanced (VDP Advanced). As with most questions in the IT world, the answer often starts with “It depends…” and this is no exception. However, the intent of this article is to at least provide some higher-level guidance and design ideas. You may have already thought of some (or all) of the items in this article, but I thought it made sense to include even the most obvious recommendations.

Let’s start with backup data placement. It is a well-known best practice to store backup data separately from production data. Otherwise, a storage outage could render both your production virtual machines (VMs) and your backup data useless. Not good. Ideally, your environment would have at least two independent storage platforms. Production workloads would run on one storage platform, while the other might run a few low-priority workloads and store backup data.

As with nearly all vSphere cluster designs, it is important that all hosts in the cluster can “see” the same storage. This is a best practice for a variety of reasons (vSphere HA, vMotion, DRS, etc.), and data protection is another good reason. VDP (and many other backup solutions) can utilize the SCSI HotAdd transport mode to back up virtual machines. This transport mode is more efficient and drastically reduces network bandwidth consumption compared with backing up using NBD, i.e., “over the network”. To utilize SCSI HotAdd, the VDP virtual appliance must have access to the production .vmdk files, hence the need for the same storage configuration on every host in a cluster.
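The "same storage on every host" requirement can be sanity-checked with a few lines of script. The following is only a minimal sketch: it assumes you have already exported, by whatever means you prefer, a mapping of host names to the datastores each host can see. The function name, host names, and datastore names are hypothetical and are not part of VDP or any vSphere API.

```python
def datastores_missing_per_host(host_datastores):
    """Return {host: sorted list of datastores that host cannot see}.

    host_datastores is a hypothetical inventory export: a dict mapping
    each host name to the collection of datastore names visible to it.
    Hosts that see every datastore in the cluster are omitted.
    """
    # The union of everything any host can see is treated as the
    # set of datastores every host is expected to see.
    all_datastores = set().union(*host_datastores.values())

    missing = {}
    for host, visible in host_datastores.items():
        gap = all_datastores - set(visible)
        if gap:
            missing[host] = sorted(gap)
    return missing


if __name__ == "__main__":
    # Made-up example inventory for illustration only.
    inventory = {
        "esx01": {"prod-lun1", "prod-lun2", "backup-lun1"},
        "esx02": {"prod-lun1", "prod-lun2", "backup-lun1"},
        "esx03": {"prod-lun1", "backup-lun1"},
    }
    for host, gap in datastores_missing_per_host(inventory).items():
        print(f"{host} cannot see: {', '.join(gap)}")
```

An empty result suggests the hosts are configured consistently. Any host that is listed would be a candidate for HotAdd problems if the VDP appliance ended up running there.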

I realize not every environment has the luxury of two independent shared storage platforms. You may be asking, “What if I only have one shared storage array?” There are a few possible solutions:

1. Store production and backup data on the same shared storage platform. This introduces considerable risk, as mentioned before: If the storage array goes down, you lose access to both production data and backup data. Not recommended.

2. Use direct-attached storage (DAS), if available, to contain the backup data. In this scenario, production data is on shared storage, which facilitates the use of vSphere HA, DRS, and so on. VDP is configured to store backup data on DAS. If the shared storage is lost, backup data is still available on DAS. Keep in mind that with this configuration, the VDP appliance must stay on the host where the local storage used by the appliance is configured.

3. Leverage replication in VDP and VDP Advanced. If production data and backup data are on the same shared storage platform, you can use the VDP backup data replication feature to move backup data to another environment in the same data center or offsite. Of course, this assumes you have another environment or site to replicate the data to. Another option may be a service provider that has a “backup data replication target as a service” offering. Note that VDP (not the Advanced edition) can replicate only to EMC Avamar. A VDP Advanced virtual appliance can replicate to both EMC Avamar and to another VDP Advanced virtual appliance.

4. Utilize EMC Data Domain as a target for VDP Advanced backup data. Data Domain is purpose-built for backup and recovery, and using this model effectively separates production data from backup data. VDP Advanced contains the Data Domain Boost libraries for increased performance and ease of use.
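As a rough illustration, the options above can be written down as a small decision helper. This is a sketch only: the function, its parameter names, and the order in which options are listed are my own assumptions for illustration, not a VDP feature or an official ranking. The one thing taken directly from the list is that sharing a single array for production and backup data is the not-recommended fallback.

```python
def backup_placement_options(second_shared_platform=False,
                             local_das=False,
                             replication_target=False,
                             data_domain=False):
    """List the backup data placement options that apply to an environment.

    All four flags are hypothetical inputs describing what is available.
    The order of the returned list is illustrative, not a ranking.
    """
    options = []
    if second_shared_platform:
        options.append("Store backup data on the second, independent "
                       "shared storage platform")
    if local_das:
        options.append("Store backup data on DAS (the VDP appliance must "
                       "stay on the host with that local storage)")
    if replication_target:
        options.append("Replicate backup data to another environment "
                       "or site")
    if data_domain:
        options.append("Use EMC Data Domain as the target for VDP "
                       "Advanced backup data")
    if not options:
        # Option 1 in the list: workable, but an array outage takes out
        # production data and backup data together.
        options.append("Store production and backup data on the same "
                       "shared storage platform (not recommended)")
    return options


if __name__ == "__main__":
    for option in backup_placement_options(local_das=True,
                                           replication_target=True):
        print("-", option)
```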

An important item we have not mentioned up to this point is performance. As with any backup solution, VDP and VDP Advanced generate a significant amount of I/O, especially when performing multiple, concurrent backups. The storage platform must be able to handle this I/O. If the storage does not meet the performance requirements, backups may fail and error messages may be generated. VDP 5.5 and VDP Advanced 5.5 include a performance analysis feature. This analysis can be run during VDP virtual appliance deployment or after deployment. I recommend running the performance analysis when the virtual appliance is deployed. Keep in mind this analysis may take a long time, potentially several hours. Once the analysis is complete, the results are shown.

One last item worth discussing is the ability to place the VDP guest OS .vmdk file and the .vmdk files that make up the backup data partition on separate datastores. With earlier versions of VDP, these had to be on the same datastore. With version 5.5 of VDP and VDP Advanced, the .vmdk files can be on the same datastore or on separate datastores.

As a footnote, it is possible to take an existing backup data partition and connect it to a newly deployed appliance. This may be useful if the guest OS partition is lost or damaged and the administrator wishes to salvage the backup data.

This article shows the process in more detail.

Hopefully, the discussion above has provided some food for thought on backup data placement with VDP and VDP Advanced. We would also like to hear where you store your backup data. Every environment is different and it is very interesting to hear how various organizations have tackled this common challenge. Feel free to summarize your story in the comments section and thank you in advance for the feedback.

@jhuntervmware