I wanted to explain in more detail why we chose the type of dedupe that we did. As I had mentioned in my previous post, we chose to implement block based in-line destination deduplication for VMware Data Recovery (VDR). There are a few reasons for this, two of which are due to enhancements in the VMware vSphere 4 platform itself.
1) Change block tracking: Any new VM provisioned on vSphere will use virtual hardware version 7 (you can also upgrade your existing version 4 VMs to version 7). With VM version 7, the vmkernel tracks the changed blocks of the VM’s virtual disks. (By the way, this is the same change block tracking functionality that enhances Storage VMotion in vSphere 4.) So, instead of having to scan the VM’s virtual disks to determine which blocks have changed every time a backup occurs, VDR just makes an API call to the vmkernel and gets this information “for free”.
Thus, VDR is able to dramatically cut down the amount of time and CPU cycles needed to calculate the changed blocks on a virtual disk. In addition, change block tracking also helps on the restore side of the equation. For example, if you want to restore yesterday’s VM image, VDR will make the reverse change block API call and will transfer just the changed blocks from yesterday’s backup to revert the VM to its previous state. So, given that there is a lot of intelligence in the platform about virtual disk blocks, block based dedupe seemed like a natural direction for VDR to take.
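To make the idea concrete, here is a minimal sketch in Python of how change block tracking lets a backup (and a revert) touch only the blocks that changed. It is a toy model, not the vSphere API: the class TrackedDisk and the functions query_changed_blocks, full_backup, incremental_backup and revert are hypothetical names invented for this illustration.

```python
class TrackedDisk:
    """Toy virtual disk that remembers which blocks were written.

    Hypothetical stand-in for platform-level change tracking; it does not
    model the real vmkernel interface.
    """

    def __init__(self, num_blocks, block_size=4):
        self.block_size = block_size
        self.blocks = [bytes(block_size)] * num_blocks
        self.changed = set()

    def write(self, index, data):
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block")
        self.blocks[index] = data
        self.changed.add(index)  # tracking happens at write time

    def query_changed_blocks(self):
        """Return the indexes written since the last query, then reset."""
        changed = sorted(self.changed)
        self.changed.clear()
        return changed


def full_backup(disk):
    """Copy every block and start a fresh tracking window."""
    disk.query_changed_blocks()
    return list(disk.blocks)


def incremental_backup(disk):
    """Copy only the blocks reported as changed; no disk scan needed."""
    return {i: disk.blocks[i] for i in disk.query_changed_blocks()}


def revert(disk, backup_image, changed_since_backup):
    """Restore by copying back only the blocks that changed since the backup."""
    for i in changed_since_backup:
        disk.blocks[i] = backup_image[i]


disk = TrackedDisk(num_blocks=8)
disk.write(1, b"AAAA")
yesterday = full_backup(disk)

disk.write(2, b"BBBB")
disk.write(5, b"CCCC")
increment = incremental_backup(disk)
print(sorted(increment))         # indexes of the blocks that were copied

revert(disk, yesterday, increment.keys())
print(disk.blocks == yesterday)  # the disk matches yesterday's image again
```

In this sketch the incremental backup and the revert each move two blocks out of eight, which is the saving the paragraph above describes; the real platform does the tracking inside the vmkernel rather than in the backup application.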
2) Hot add disk: VDR can “hot add” virtual disk snapshots directly to the VDR virtual appliance. This is accomplished by leveraging capabilities of the vSphere storage stack. This means that VDR can bypass the LAN and stream the data from the snapshots directly to the dedupe destination disk. In addition to reducing load on the LAN and effectively eliminating the need to block out other LAN traffic during the backup window, streaming the data to the destination dedupe disk on the Data Recovery appliance will be considerably faster than sending it over the LAN.
Note that there are three caveats to enabling hot add disk with VDR:
a. The source virtual disks need to be on shared storage
b. The ESX host where the VDR appliance is running needs to have visibility to this shared storage
c. You will need a vSphere edition that includes Hot Add as a feature
The knock against destination (or target) based dedupe is the fact that it consumes precious network bandwidth with the unnecessary transfer of data that will be discarded as part of the dedupe process. However, given that VDR only transfers changed blocks and can transfer these blocks off-LAN, the concern did not apply and thus we felt comfortable with a destination based dedupe architecture.
So does this mean that unless you have both change block tracking and hot add disk features enabled in vSphere 4, VDR and its dedupe capability are useless to you? Absolutely not! All data that is protected by VDR will be deduped, so you will enjoy the storage savings independent of what VM version is being backed up or what vSphere edition you have installed. What change block tracking and hot add disk add is additional efficiency and performance gains that will allow even more data to be protected in an ever-shrinking backup window.
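For readers who want to see what "block based in-line destination dedupe" means mechanically, here is a minimal sketch in Python. It assumes fixed-size blocks identified by a SHA-256 digest; that choice, along with the DedupeStore class and its ingest and restore methods, is an assumption made for illustration and says nothing about how VDR is implemented internally.

```python
import hashlib


class DedupeStore:
    """Toy destination-side dedupe store (hypothetical, for illustration)."""

    def __init__(self):
        self.blocks = {}    # digest -> block bytes, each unique block kept once
        self.recipes = {}   # backup name -> ordered list of digests

    def ingest(self, name, blocks):
        """Dedupe in-line: check each incoming block before storing it.

        Returns the number of blocks that were actually new to the store.
        """
        recipe = []
        new_blocks = 0
        for block in blocks:
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:
                self.blocks[digest] = block
                new_blocks += 1
            recipe.append(digest)
        self.recipes[name] = recipe
        return new_blocks

    def restore(self, name):
        """Rebuild a backup from its recipe of digests."""
        return [self.blocks[d] for d in self.recipes[name]]


store = DedupeStore()
vm1 = [b"AAAA", b"BBBB", b"AAAA"]
vm2 = [b"AAAA", b"CCCC"]

print(store.ingest("vm1", vm1))  # count of new blocks stored for vm1
print(store.ingest("vm2", vm2))  # count of new blocks stored for vm2
print(len(store.blocks))         # unique blocks held by the store
print(store.restore("vm1") == vm1)
```

The dedupe decision is made at the destination as each block arrives, before it is stored, so every backup benefits regardless of how the blocks got there. Change block tracking and hot add only reduce how many blocks have to arrive and how they travel.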