Moving Beyond IOPS for VDI
By Courtney Burry, Director of Product Marketing, and Tristan Todd, Reference Architect Engineer, End-User Computing, VMware
When it comes to desktop virtualization, one of the places you will end up spending a lot of your time is storage planning and design. Given that 50%+ of the cost associated with most VDI deployments comes from storage, it’s definitely worth the time and effort to do your homework in advance.
But wading through the sea of storage vendors can often leave you wondering what’s really good enough. Undoubtedly, you will see the majority of storage websites touting thousands or even millions of IOPS and promising unbeatable performance. But as many folks like Howard Marks and Scott Lowe have pointed out, getting good application performance is not all about IOPS. Even if an all-flash solution promises to deliver over 1M 4K random IOPS with a 60/40 read/write mix, you’ll want to make sure you know how much latency you’re dealing with. As Marks points out, when an application or benchmark stresses your storage system, you may find that it continues to deliver high IOPS but at higher levels of latency.
So rather than getting too hung up on the steady-state IOPS figures these storage vendors recommend for a VDI deployment, take some time to focus on disk latency.
When it comes to assessing latency, you will want to look up and down the storage stack and get a clear understanding of where latency can build up.
Ultimately you will want to look first and foremost at the top layer of your storage stack, where the application is running in the guest operating system. This will give you the total amount of latency that the application is seeing. Virtual-disk latency is one of the key metrics that drives good or bad user experience.
However, as many of you using vSphere today are aware, ESXi cannot see application latency because the application sits above the ESXi stack. But ESXi can detect three types of latency, which are also reported in esxtop and vCenter. First and foremost is GAVG, or guest latency. GAVG is made up of two major components: KAVG and DAVG.
DAVG is the measure of time that I/O commands spend in the device from the driver HBA to the backend storage array, and KAVG is how much time I/O spends in the ESXi Kernel.
It’s worth noting that KAVG is actually a derived metric, which means it is not measured directly; you take GAVG (total latency) and DAVG (device latency) and derive KAVG from the difference (GAVG - DAVG = KAVG).
In addition, the VMkernel is very efficient in processing I/O, so there really should not be any significant wait in the kernel, or KAVG. In a well-configured, well-running VDI environment, KAVG should be at or very close to 0. If your KAVG is consistently above 0, then more than likely the I/O is stuck in a kernel queue inside the VMkernel.
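The relationship between the three counters can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above: the function names and the 0.5 ms tolerance are assumptions made for the example, not values taken from esxtop or from VMware guidance.

```python
def derive_kavg(gavg_ms: float, davg_ms: float) -> float:
    """Derive kernel latency as total (guest) latency minus device latency."""
    if gavg_ms < 0 or davg_ms < 0:
        raise ValueError("latencies must be non-negative")
    return gavg_ms - davg_ms


def kernel_queuing_suspected(gavg_ms: float, davg_ms: float,
                             tolerance_ms: float = 0.5) -> bool:
    """Flag samples where the derived KAVG is noticeably above zero.

    tolerance_ms is a hypothetical threshold chosen for this sketch.
    """
    return derive_kavg(gavg_ms, davg_ms) > tolerance_ms


# Example readings in milliseconds (made-up numbers for illustration)
print(derive_kavg(12.0, 10.0))               # 2.0
print(kernel_queuing_suspected(12.0, 10.0))  # True
print(kernel_queuing_suspected(10.0, 10.0))  # False
```

A small tolerance is used instead of a strict comparison with 0 so that rounding in the reported values does not raise false alarms.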
To get a sense of the latency that the application will see in the guest OS, you can simply compare GAVG with the actual latency the application is seeing (through a tool like PerfMon). This will reveal how much latency the guest OS is adding to the storage stack. For instance, if ESXi is reporting a GAVG of 10ms, but the application or PerfMon in the guest OS is reporting storage latency of 30ms, then 20ms of latency is building up in the guest OS layer, and you should focus your debugging on the guest OS storage configuration.
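The same comparison, written out as a small Python sketch using the numbers from the example above. The function name is hypothetical, and the two inputs are assumed to be readings you have already collected from PerfMon in the guest and from esxtop or vCenter on the host.

```python
def guest_added_latency_ms(app_latency_ms: float, gavg_ms: float) -> float:
    """Latency added inside the guest OS: what the application sees minus GAVG."""
    return app_latency_ms - gavg_ms


# Numbers from the example: 30 ms seen in the guest, 10 ms GAVG reported by ESXi
added = guest_added_latency_ms(app_latency_ms=30.0, gavg_ms=10.0)
print(f"Guest OS layer adds {added} ms")  # Guest OS layer adds 20.0 ms
```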
So how do you measure and monitor GAVG and DAVG? VMware vCenter and vCenter™ Operations Manager for Horizon View™ are good places to start. VMware vCenter provides alarms that will trigger if device latency is too high or if a virtual machine is using an excessive amount of storage bandwidth. One thing to be aware of is that vCenter uses a 20-second sample period, so it may miss some short latency spikes, but it is a good first step. VMware vCenter Operations Manager for Horizon View provides an even more comprehensive view of the health of your VDI stack, with a robust set of analytics that will allow you to quickly troubleshoot and remediate problems in your storage and the wider environment.
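To see why a 20-second sample can hide a spike, consider the rough Python sketch below. The per-second latencies are invented for illustration, and the sketch assumes the reported value behaves like a simple average over the interval, which may not match exactly how vCenter rolls up every counter.

```python
from statistics import mean

# Hypothetical per-second device latencies (ms) inside one 20-second window:
# 19 quiet seconds at 5 ms and a single one-second spike at 200 ms.
per_second_latency_ms = [5] * 19 + [200]


def summarize_window(samples_ms):
    """Return the average and the peak latency for one sample window."""
    return {"average_ms": mean(samples_ms), "peak_ms": max(samples_ms)}


summary = summarize_window(per_second_latency_ms)
print(summary)  # {'average_ms': 14.75, 'peak_ms': 200}
```

The window average comes out under 15 ms even though users sat through a 200 ms stall for one second, which is why a finer-grained tool is worth a look when users report intermittent sluggishness.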
Good storage performance is critical to any VDI rollout, and making sure you opt for the right solution is key. So be sure to look beyond IOPS and spend your time examining and monitoring latencies across the storage stack to make sure you can deliver the performance your end users need.