Data protection strategies often include snapshots in some form. Snapshots can play an integral part in a comprehensive 3-2-1 data protection strategy, and they can augment formal data protection practices by making recovery operations more convenient. But snapshots tend to introduce other technical challenges and limitations that can impact their use in production environments.
With VMware vSAN 8 U3, as a part of VMware Cloud Foundation 5.2, we are announcing the debut of vSAN Data Protection. Powered by the revolutionary vSAN Express Storage Architecture (ESA), it represents a monumental shift in our customers' ability to protect, recover, and clone virtual machines using software they already know.
Unlocking New Capabilities with Better Snapshots
The foundation of vSAN Data Protection is the snapshot engine introduced as a part of vSAN ESA. The design of vSAN ESA allowed our engineering teams to develop an entirely new snapshot engine from the ground up, doing away with the redo-log-based approach to snapshots found in the original storage architecture. The snapshot engine in vSAN ESA is centered on our patented log-structured file system (LFS) and our highly efficient metadata structures. Various forms of log-structured file systems are commonplace today, but VMware has a unique history with this technology: Mendel Rosenblum, a co-founder of VMware, is cited as the first to implement a log-structured file system in 1992. Our LFS in vSAN is novel in several ways. It is implemented on a per-object basis, offering finer levels of control than monolithic approaches. It is also distributed, which provides the scalability and flexibility commonly associated with distributed systems.
The snapshot engine in vSAN ESA allows snapshots to be retained with little to no performance penalty. The benefits are similar to those of array-based snapshots, which are often lauded for their efficiency and performance. But vSAN ESA snapshots have some distinct advantages over array-based snapshots that materially affect their usefulness in a production environment.
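To make the log-structured idea a little more concrete, here is a toy Python model of a single object (for example, one virtual disk). It is a sketch under stated assumptions, not vSAN's design: the class name, its methods, and the dictionary used as a block map are all hypothetical. It shows the general principle that, when writes are always appended to a log, a snapshot can be a metadata operation (freezing a block map) rather than a copy of data, and reads never walk a chain of redo-log delta files.

```python
from typing import Dict, List, Optional


class ToyLogStructuredObject:
    """Hypothetical model of ONE object kept in a log-structured layout.

    Not vSAN's implementation: it only illustrates why a snapshot can be a
    metadata operation instead of a data copy.
    """

    def __init__(self) -> None:
        self.log: List[bytes] = []                      # append-only data log
        self.live_map: Dict[int, int] = {}              # logical block -> log index
        self.snapshots: Dict[str, Dict[int, int]] = {}  # snapshot name -> frozen map

    def write(self, block: int, data: bytes) -> None:
        # Writes always append; older log entries are never overwritten in
        # place, so blocks referenced by a snapshot stay intact.
        self.log.append(data)
        self.live_map[block] = len(self.log) - 1

    def take_snapshot(self, name: str) -> None:
        # Freeze a copy of the block map only. No data blocks are copied, and
        # the write path is the same before and after the snapshot.
        self.snapshots[name] = dict(self.live_map)

    def read(self, block: int, snapshot: Optional[str] = None) -> bytes:
        # One map lookup resolves a block, live or from a snapshot, instead of
        # walking a chain of delta files as a redo-log design would.
        mapping = self.live_map if snapshot is None else self.snapshots[snapshot]
        return self.log[mapping[block]]


disk = ToyLogStructuredObject()
disk.write(0, b"config-v1")
disk.take_snapshot("before-upgrade")
disk.write(0, b"config-v2")
print(disk.read(0))                    # b'config-v2'
print(disk.read(0, "before-upgrade"))  # b'config-v1'
```

Because each object has its own log and map in this model, snapshotting one VM's disks never touches another VM's data. Copying the whole map is still proportional to the number of blocks; production designs generally share metadata structures to avoid even that, and they must also reclaim log space no longer referenced by any map, which this sketch omits.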
The Challenge with Array-Based Snapshots
The most common way storage arrays present capacity is through LUNs. Formatted with a clustered file system like VMFS, a single LUN may hold dozens or hundreds of VMs and their associated files for use by the vSphere hosts that have mounted the datastore. While this has worked reasonably well over the years, it has some inherent challenges.
- Unit of management. While you may care about only a few discrete VMs in a LUN, the array views the data in the LUN in its entirety. An array-based snapshot captures all of the changed data within a LUN, including VMs that you may not have wanted to snapshot. This can increase capacity consumption unnecessarily and make recovery more complex.
- Snapshot creation and coordination. The unnatural unit of management that a LUN provides is only part of the problem when creating snapshots. Array snapshot mechanisms on their own have no awareness of the VM itself, or of when the VM has initiated data operations. Additional operations may be needed when capturing an array snapshot of an entire LUN so that I/O is persisted in a consistent state. Depending on the circumstances, this means the hypervisor may stun each VM using that LUN so that the array can capture the VMs in a consistent state. This takes time and a lot of coordination.
- Snapshot recovery. Recovering a VM to a previous state using an array-based snapshot usually involves several delicate steps to ensure that the snapshotted LUN is temporarily presented to the hosts without interfering with the existing VMs. The current VM must be removed, the old VM must be copied to a new location and re-registered with vCenter Server, and then the temporary LUN must be unmounted, followed by other cleanup operations. The process can be time-consuming and prone to error.
Figure 1. Array-based-snapshots with vSphere.
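The recovery steps described above can be outlined as a short Python sketch. The step strings are descriptive placeholders taken from the description, not calls to any real array or vSphere API, and the function and parameter names (`recover_from_lun_snapshot`, `fail_at_step`) are hypothetical. The point is structural: restoring one VM means orchestrating LUN-wide operations, and cleanup must run even when a step fails.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RecoveryRun:
    completed: List[str] = field(default_factory=list)
    error: Optional[str] = None


def recover_from_lun_snapshot(vm: str, snapshot_lun: str,
                              fail_at_step: Optional[int] = None) -> RecoveryRun:
    """Hypothetical outline of restoring ONE VM from a LUN-wide array snapshot.

    Step strings are placeholders, not real API calls. fail_at_step simulates
    an error partway through the procedure.
    """
    steps = [
        f"present {snapshot_lun} to hosts as a temporary datastore",
        f"remove current {vm}",
        f"copy {vm} files from temporary datastore to a new location",
        f"re-register {vm} with vCenter Server",
    ]
    run = RecoveryRun()
    try:
        for i, step in enumerate(steps):
            if i == fail_at_step:
                raise RuntimeError(f"failed: {step}")
            run.completed.append(step)
    except RuntimeError as exc:
        run.error = str(exc)  # a real runbook would need manual remediation here
    finally:
        # The temporary LUN must be unmounted whether or not the restore worked.
        run.completed.append(f"unmount temporary datastore from {snapshot_lun}")
    return run


ok = recover_from_lun_snapshot("app-01", "lun7-snap")
print(ok.error, len(ok.completed))  # None 5
bad = recover_from_lun_snapshot("app-01", "lun7-snap", fail_at_step=2)
print(bad.error)  # failed: copy app-01 files from temporary datastore to a new location
```

In the simulated failure, the current VM has already been removed but the old copy was never restored, which is exactly the kind of partial state that makes this process error-prone.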
The rigidity of LUNs and the array's lack of awareness of VM I/O have often translated into more complex operations. vSAN takes a better approach to snapshots.
A Better Approach to Snapshotting Data
Snapshots in vSAN ESA can achieve the capabilities typically associated with array-based snapshots, but without many of the challenges. As noted in the post “vSAN Objects and Components Revisited,” vSAN stores data in a manner analogous to an object store. vSAN uses a set of discrete objects that represent aspects of the VM, such as virtual disks (VMDKs). This smaller boundary of data results in better availability, scalability, and manageability. But this model has a profound benefit when it comes to snapshotting VMs.
- Unit of management. ESA snapshots occur per VM. VMs are what users care about, so it makes much more sense to do it this way. With ESA snapshots, the changes tracked after a snapshot is taken pertain only to the VM with the snapshot.
- Snapshot creation and coordination. Since vSAN is a part of the hypervisor, it has complete visibility and control of the VM's data path. This allows the snapshot engine to capture the VM in a crash-consistent state without stunning it. It is fast and completely transparent to the user.
- Snapshot recovery. Whether you are recovering an existing VM to a previous point in time or restoring a deleted VM, the recovery process is easy and intuitive, and it is performed right within the vSphere UI in vCenter Server.
Figure 2. Comparing array-based snapshots to snapshots in vSAN ESA.
Snapshots performed at the VM level are a more meaningful unit of management for customers. Not only is this approach more intuitive, but it is also much more efficient, because it snapshots only the VMs you choose rather than everything on a LUN.
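The efficiency difference is simple arithmetic. The Python sketch below estimates how much newly changed data a day of snapshots must retain on one shared datastore. The function name and all change rates are hypothetical, and the model assumes each rate already counts unique changed data; it is an illustration, not a sizing tool.

```python
from typing import Dict, Iterable


def daily_snapshot_growth_gib(change_gib_per_day: Dict[str, float],
                              protected_vms: Iterable[str],
                              lun_level: bool) -> float:
    """Rough estimate of changed data one day of snapshots must retain.

    lun_level=True models an array snapshot of the whole LUN: every VM's
    changes are retained. lun_level=False models per-VM snapshots: only the
    protected VMs' changes are retained.
    """
    if lun_level:
        return sum(change_gib_per_day.values())
    wanted = set(protected_vms)
    return sum(rate for vm, rate in change_gib_per_day.items() if vm in wanted)


# Hypothetical datastore: four VMs, but only the two SQL VMs need protection.
rates = {"SQL-01": 20.0, "SQL-02": 15.0, "web-01": 2.0, "build-01": 60.0}
print(daily_snapshot_growth_gib(rates, ["SQL-01", "SQL-02"], lun_level=True))   # 97.0
print(daily_snapshot_growth_gib(rates, ["SQL-01", "SQL-02"], lun_level=False))  # 35.0
```

In this made-up example, a busy build VM that nobody needs to protect dominates the LUN-level figure, while VM-level snapshots retain only the data that matters.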
A Better Approach to Protecting Data
But a fast, scalable snapshot engine was not enough. Our customers wanted to use snapshots in recovery and data manipulation scenarios. They wanted to schedule and retain snapshots automatically. And they wanted to do this easily, in a way that is integrated with vCenter Server. This is what we've delivered with vSAN Data Protection.
Figure 3. Cluster-level view of vSAN Data Protection in the vSphere Client UI
vSAN Data Protection gives our customers what they have always wanted: a way to protect data that is easy to use, efficient, and integrated right into the software they already know. Not only have we achieved this, but, thanks to the architecture, we have also introduced innovations that make it user-friendly and flexible.
- Extremely fast operations. Since snapshots are per VM, operational activities are simple and fast. vSAN is aware of I/O activity from end to end, which avoids the delays inherent in creating and recovering crash-consistent snapshots with storage arrays.
- Snapshot scalability. vSAN Data Protection supports up to 200 snapshots per VM. This breaks through the 32-snapshot limit that applies when snapshots are initiated using traditional methods in vCenter Server or VADP-based APIs.
- Dynamic grouping. Central to the use of vSAN Data Protection are “Protection Groups.” A protection group is a logical container in which you can group multiple VMs for easy, repeatable snapshot creation and management. Within a protection group, you define a policy of outcomes, such as the frequency of protection and its retention schedule. VMs can be assigned statically, or dynamically using the “*” and “?” wildcard characters. For example, the membership pattern “SQL-*” protects all VMs whose names begin with “SQL-”. Protection groups make for an incredibly easy and flexible way to manage data protection requirements.
- Optional data immutability. Snapshots can be made immutable, meaning that the snapshot cannot be modified or deleted. This option, which is a simple toggle within the settings of a protection group, provides the foundation for basic protection against malicious activities, and ties in directly with VMware Live Cyber Recovery (VLCR), our comprehensive ransomware protection solution.
- System safeguards. Snapshots are useful, but they can also drive up capacity utilization when data change rates and snapshot frequencies are high. To guard against inadvertent capacity consumption, vSAN Data Protection pauses snapshots when cluster capacity utilization reaches 70%. It also automatically expires snapshots that would exceed the limit of 200 snapshots per VM.
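The protection group behavior described in the list above (wildcard membership, scheduled snapshots, and the two safeguards) can be modeled in a small Python sketch. Only the 200-snapshot limit and the 70% threshold come from the text; everything else is an assumption, including the names `ToyProtectionGroup`, `run_schedule`, and `wildcard_to_regex`, case-sensitive whole-name matching, a paused run simply taking no snapshots, and the oldest snapshot being the one expired. This is not a vSAN or vSphere API.

```python
import re
from collections import deque
from typing import Deque, Dict, List

MAX_SNAPSHOTS_PER_VM = 200     # per-VM limit stated for vSAN Data Protection
PAUSE_AT_CLUSTER_USAGE = 0.70  # snapshots pause at 70% cluster capacity


def wildcard_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a VM-name pattern that uses only '*' and '?' into a regex.

    Every other character is matched literally. Case sensitivity is an
    assumption of this sketch.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")  # any run of characters, including none
        elif ch == "?":
            parts.append(".")   # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


class ToyProtectionGroup:
    """Hypothetical model of a protection group; not a vSAN or vSphere API."""

    def __init__(self, patterns: List[str]) -> None:
        self.patterns = [wildcard_to_regex(p) for p in patterns]
        self.snapshots: Dict[str, Deque[int]] = {}

    def members(self, vm_names: List[str]) -> List[str]:
        # Membership is evaluated on every run, so a new VM whose name matches
        # a pattern is protected without editing the group.
        return [vm for vm in vm_names if any(p.fullmatch(vm) for p in self.patterns)]

    def run_schedule(self, vm_names: List[str], cluster_usage: float,
                     timestamp: int) -> List[str]:
        if cluster_usage >= PAUSE_AT_CLUSTER_USAGE:
            return []  # safeguard: skip this run rather than consume more capacity
        taken = []
        for vm in self.members(vm_names):
            # A bounded deque drops its oldest entry when full, which models
            # expiring the oldest snapshot (the expiry order is an assumption).
            history = self.snapshots.setdefault(vm, deque(maxlen=MAX_SNAPSHOTS_PER_VM))
            history.append(timestamp)
            taken.append(vm)
        return taken


vms = ["SQL-01", "SQL-PROD", "MySQL-01", "web-01", "web-010", "app-01"]
group = ToyProtectionGroup(["SQL-*", "web-0?"])
print(group.members(vms))  # ['SQL-01', 'SQL-PROD', 'web-01']
```

The sketch translates patterns by hand instead of using Python's `fnmatch`, because `fnmatch` also treats `[...]` as a character class, while the text describes only the “*” and “?” wildcards.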
Practical Ways to Use vSAN Data Protection
While the technology is impressive, it is the result that matters. Easy protection must be paired with easy operational recovery to be useful in real-world scenarios. Below are a few examples of how vSAN Data Protection can be used.
- Revert existing VMs to a previous state. Achieve fast recovery of VMs that were accidentally misconfigured, went through a failed upgrade, or show signs of suspected malicious activity.
- Restore deleted VMs. Easily restore VMs no longer registered in vCenter Server, which helps protect against accidental or malicious deletion of VMs.
- Clone VMs. Quickly create a clone of a VM from a snapshot, which can be an easy and efficient way to have multiple copies of data.
- Ransomware protection. vSAN Data Protection can be used with VMware Live Cyber Recovery (VLCR) so that you can easily build out a comprehensive ransomware protection and recovery solution.
At this time, vSAN Data Protection is limited to providing local protection of VMs. But it can be an ideal complement to existing, more comprehensive 3-2-1 backup strategies. For more information and answers to commonly asked questions, see the vSAN Data Protection FAQs on core.vmware.com.
Summary
vSAN Data Protection represents a better way to protect and recover virtual machines. It exploits the capabilities of vSAN ESA to provide benefits that are difficult to achieve with external array-based approaches. And best of all, vSAN Data Protection is readily available with your VCF license.