By Matt Eccleston, chief architect – VMware View, VMware
VDI with VMware View has brought many benefits to customers over the years, including business agility, improved control and security, and end-user flexibility. However, a vexing problem as our customers scale up their deployments has been how to achieve cost-effective storage designs for VDI environments while maintaining an excellent quality of service for end users. The “VDI storage problem” fundamentally stems from the different economics of traditional desktop storage (a local SATA drive) and datacenter-class storage. Datacenter storage is almost always more expensive on a $-per-GB or $-per-I/O-throughput basis. At the same time, however, datacenter storage offers significant opportunities for pooling resources and for securing, consolidating, and centralizing desktop data.
VMware has always been a leader in attacking this challenge, closely leveraging its vSphere technology and applying it to the VDI market through VMware View. An example is View’s linked clones technology. View Composer linked clones take advantage of the fact that many of the virtual desktops in a given environment contain gigabytes of identical content (since they all come from the same Windows image template), so that content does not need to be stored multiple times on disk. For many scenarios, View Composer achieves a massive reduction in the amount of space consumed on datacenter storage.
However, with the subsequent heavy consolidation of VM data onto a relatively small amount of physical storage, coupled with the tendency of desktops to exhibit synchronized, bursty storage workloads (the famed I/O storm problem, with the “boot storm” being the best-known example), new challenges arose around I/O throughput. Capacity was no longer the main issue; IOPS became the key design factor for VDI storage.
VMware again led the way, introducing View Composer-based storage tiering in VMware View 4.5. This allowed the parent disk in a linked-clone scenario, which contains all of the common content of the Windows image template, to be stored on a different class of storage (typically SSD/EFD-backed). The result was a significantly better user experience at lower cost, because the read I/O capability available to the VDI environment increased dramatically.
However, storage tiering done in this manner still had three major gaps to address:
- It required a specific storage array configuration
- It did little to address contention on the storage interconnect and storage controllers
- It was best-suited for stateless desktop type deployments
The View Storage Accelerator helps address all three of these gaps. The first two are addressed by adding a high-performance read I/O capability to vSphere itself: for cached reads, an I/O is never even issued by ESX to the storage! In addition, it works regardless of the storage backend in use, allowing View to be deployed cost-effectively on a much broader set of storage platforms and architectures.
To see how the View Storage Accelerator addresses the last point, providing read acceleration that works equally well for stateful and stateless desktops, it helps to explain a bit more about its internal workings.
The View Storage Accelerator uses a patent-pending technology in ESX known internally as a content-based read cache. There is a lot in that name. It is a cache in that it uses host memory to store data blocks. It is a read cache in that it addresses only reads (as an aside, host-based write caching for stateful desktops has a very tricky coherency problem to solve in the presence of a host failure). And perhaps most importantly, it is content-based, which means it can cache identical content from any block of any VM accessed by the host, regardless of how those VMs were created. This means it works for VMs that are full copies of each other, VMs created from common linked clones, VMs created through array-based provisioning techniques, or even VMs that are P2V’d and imported from vCenter. Because the cache indexes blocks by their content rather than by logical sector, identical content is cached only once, so the required cache size is very small: small enough that it can fit within a reasonable, cost-effective amount of memory on every server in a cluster.
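To make the content-based idea concrete, here is a minimal sketch in Python of a cache keyed by a fingerprint of block content rather than by a (VM, sector) address. It is purely illustrative: the class name, the use of SHA-1 fingerprints, and the read_from_storage callback are my own assumptions, not the actual ESX data structures.

```python
import hashlib

class ContentBasedReadCache:
    """Illustrative sketch: blocks are cached under a fingerprint of their
    content, so identical blocks from different VMs share one cache entry."""

    def __init__(self):
        self._blocks = {}  # fingerprint -> block data held in host memory

    @staticmethod
    def fingerprint(block: bytes) -> str:
        # Any strong hash will do for illustration; the real digest format
        # inside ESX is not assumed here.
        return hashlib.sha1(block).hexdigest()

    def read(self, fingerprint, read_from_storage):
        """Serve a read from host memory if the content is already cached;
        otherwise issue the I/O to storage and remember the block."""
        block = self._blocks.get(fingerprint)
        if block is None:
            block = read_from_storage()  # only cache misses reach the array
            self._blocks[fingerprint] = block
        return block
```

Because two desktops whose blocks hash to the same fingerprint land on the same entry, a hundred near-identical Windows images can be served largely from a single in-memory copy of their common blocks, however those VMs were provisioned.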
So how does this content-based caching work exactly?
The key to understanding this is: offline indexing, online caching. At VM creation time (or, in the case of View manual pools, when the VM is imported), and at configurable intervals thereafter (typically weekly or monthly), the content of each VMDK file is indexed and fingerprints of the content are stored in what we call a digest file per VMDK (e.g., “myDisk-digest.vmdk”). The digest file allows for efficient management of, and lookups into, the vSphere-based cache while the VM is running.
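Here is a rough sketch of how the two halves fit together, continuing the illustrative Python above (the 4 KB block size, the SHA-1 fingerprints, and the plain-dictionary cache are all assumptions; the real digest layout is internal to vSphere): the offline step walks a disk image and records one fingerprint per block, and the online step turns a guest read of block N into a lookup of that precomputed fingerprint.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed indexing granularity, for illustration only

def build_digest(vmdk_path):
    """Offline indexing: fingerprint every block of the disk image.
    The resulting list plays the role of 'myDisk-digest.vmdk' here."""
    digest = []
    with open(vmdk_path, "rb") as disk:
        while True:
            block = disk.read(BLOCK_SIZE)
            if not block:
                break
            digest.append(hashlib.sha1(block).hexdigest())
    return digest

def cached_read(digest, block_number, cache, read_from_storage):
    """Online caching: the digest maps a block number to its fingerprint,
    and the fingerprint indexes the shared, content-based cache."""
    fingerprint = digest[block_number]
    block = cache.get(fingerprint)
    if block is None:
        block = read_from_storage(block_number)  # miss: go to the array
        cache[fingerprint] = block               # hit next time, from any VM
    return block
```

Note that in this sketch all of the hashing happens in the offline step; the running host only does in-memory lookups, which is what keeps the mechanism cheap on the I/O path.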
There are two somewhat technical points needed to complete a detailed understanding of the View Storage Accelerator mechanics:
The first is that the cache only caches blocks common to more than one VM. This is because Windows already has caching mechanisms for its own I/O (its page cache). Caching that data a second time on the host, with no benefit to other VMs, would simply waste precious RAM.
The second is that, to avoid expensive computation on the I/O path, a write to a previously cached block causes the VM that issued it to stop participating in caching for that block (since the content has likely changed). If the newly written data is in fact the same across multiple VMs, this will be picked up the next time the digest files are regenerated (the weekly or monthly periodic maintenance operation mentioned above). To illustrate this nuance: if you roll out a significant service pack update to all of your VMs by installing it directly in each VM (an example where lots of identical content is written to different VMs), those newly written blocks will not participate in caching immediately, but will do so at the next interval when the digests are recalculated. We feel this periodic storage maintenance task (which can be configured through View, as described in my colleague Narashima’s blog) is a small price to pay for keeping expensive computation off the storage I/O path and allowing interoperability with all storage architectures.
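Continuing the illustrative Python sketch, here is one way (my own simplification, not the actual ESX implementation) those two refinements could be expressed: content is only admitted to the cache once more than one VM’s digest references it, and a write simply marks that block so the writing VM bypasses the cache until its digest is rebuilt.

```python
class SharedBlockCache:
    """Illustrative refinements to a content-based read cache:
    1) only cache content that more than one VM's digest references, and
    2) after a write, the writing VM bypasses the cache for that block
       until its digest file is regenerated."""

    def __init__(self):
        self._digests = {}   # vm_id -> list of fingerprints (one per block)
        self._refcount = {}  # fingerprint -> number of digests referencing it
        self._blocks = {}    # fingerprint -> cached block data
        self._written = set()  # (vm_id, block_number) written since last re-index

    def register_digest(self, vm_id, digest):
        """Called when a VM's digest is built or rebuilt: refresh the shared
        reference counts and forget this VM's stale write markers."""
        for fingerprint in self._digests.get(vm_id, []):
            self._refcount[fingerprint] -= 1  # drop counts from the old digest
        self._digests[vm_id] = digest
        for fingerprint in digest:
            self._refcount[fingerprint] = self._refcount.get(fingerprint, 0) + 1
        self._written = {(v, b) for (v, b) in self._written if v != vm_id}

    def on_write(self, vm_id, block_number):
        # Cheap bookkeeping on the write path: no hashing, just note that
        # this VM's copy of the block may no longer match its digest.
        self._written.add((vm_id, block_number))

    def read(self, vm_id, block_number, read_from_storage):
        if (vm_id, block_number) in self._written:
            return read_from_storage(block_number)  # content may have changed
        fingerprint = self._digests[vm_id][block_number]
        if self._refcount.get(fingerprint, 0) < 2:
            return read_from_storage(block_number)  # unique to one VM: don't cache twice
        block = self._blocks.get(fingerprint)
        if block is None:
            block = read_from_storage(block_number)
            self._blocks[fingerprint] = block
        return block
```

In this toy model, after register_digest is called for two VMs, only blocks whose fingerprints appear in both digests are ever held in host memory, and a call to on_write for a given VM and block sends that VM’s reads of the block back to storage until its digest is regenerated.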
OK, sounds great. So what results can I expect, and what do I need to know to put this to use in my environment?
For some of the performance results we’ve seen in our labs, I will refer you to another blog entry from Narashima, who covers them well.
In terms of putting it to use in your environment, the View Storage Accelerator was designed to be applicable in any VDI scenario with VMware View. The only common scenario where you may not want to enable it is when you have an existing deployment and are fully satisfied with the I/O characteristics of your storage. The View Storage Accelerator consumes some RAM on the ESXi hosts, both for the cache of data blocks and for the per-VMDK metadata associated with managing the cache entries. For most configurations this will be less than 5% of the RAM on a given system. (If that sounds alarming at first blush, consider the economics of 5% more DRAM capacity versus the cost of getting an equivalent I/O improvement from the storage, and we suspect you, like us, will find the tradeoff well worth it.) However, in a system where everything is already running well and sized appropriately, and the storage has sufficient I/O capability (perhaps because tiered storage is in use), it may not be worth “rocking the boat” by reducing the RAM available (or the number of VMs) per server.
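To make that tradeoff concrete, here is a back-of-the-envelope sketch; every number in it is an assumption chosen for easy arithmetic, not a VMware sizing figure, so substitute your own host RAM, cache size, and metadata estimates.

```python
# Back-of-the-envelope sizing sketch. All inputs are assumptions for
# illustration only; substitute the figures for your own environment.
host_ram_gb = 96.0        # assumed physical RAM in one ESXi host
block_cache_gb = 2.0      # assumed memory reserved for the block cache
digest_metadata_gb = 1.0  # assumed per-VMDK digest metadata across all VMs

overhead_gb = block_cache_gb + digest_metadata_gb
overhead_pct = 100.0 * overhead_gb / host_ram_gb
print(f"Cache + metadata: {overhead_gb:.1f} GB "
      f"= {overhead_pct:.1f}% of host RAM")
# With these assumed numbers: 3.0 GB, or about 3.1% of host RAM, which is
# the kind of figure to weigh against the cost of buying the equivalent
# read IOPS from the storage array.
```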
Finally, I do want to recognize that while the View Storage Accelerator provides impressive benefits in addressing read I/O storms, it doesn’t directly address write I/O storms. Designing a storage architecture that accounts for peak write I/O in a VDI environment will still require careful consideration and adherence to best practices, especially for stateful desktops. We are not yet at VDI storage nirvana (my definition: storage $/user <= PCs, performance >= PCs, planning effort <= PCs). But the View Storage Accelerator, by continuing our history of integrating best-in-class vSphere technology with best-in-class VDI management software, brings us one big step closer, and it starts to pull back the veil on the kinds of approaches that may allow us as an industry not just to resolve the challenges of heavily consolidated storage for VDI environments, but to take advantage of its benefits.