
Troubleshooting Storage Performance in vSphere – Part 1 – The Basics

I frequently present at the various VMware User Group (VMUG) meetings, VMworld, and partner conferences. If you have ever attended one of my talks, you know it is like trying to drink from a fire hose; it is hard to cover everything in a 45-minute session. So I will take the time here to write a few blog posts that go over the concepts from those talks in more detail (or at least more slowly). One of the most popular, and fastest paced, talks I present is Troubleshooting Storage Performance in vSphere. I'll slow things down a bit and discuss each topic here. This may be just a review for some of you, but hopefully, as we get into more detail, there will be some new nuggets of VMware-specific information that can help even the more advanced storage folks.

Today’s post covers just the basics: what is bad storage performance, and where do I measure it?

Poor storage performance is generally the result of high I/O latency. Both vCenter and esxtop report the latency at each level of the storage stack, from the VM down to the storage hardware. Note that vCenter cannot report the actual latency seen by the application, since that would include latency added by the guest OS and the application itself, neither of which is visible to vCenter. vCenter can report the following storage stack I/O latencies in vSphere:

Figure: Storage Stack Components in a vSphere environment

• GAVG (Guest Average Latency): the total latency as seen from vSphere.

• KAVG (Kernel Average Latency): the time an I/O request spends waiting inside the vSphere storage stack.

• QAVG (Queue Average Latency): the time an I/O request spends waiting in a queue inside the vSphere storage stack (QAVG is the queuing portion of KAVG).

• DAVG (Device Average Latency): the latency coming from the physical hardware, the HBA and the storage device.
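
To make those relationships concrete, here is a minimal sketch in Python (purely illustrative; the function name and sample values are mine, not esxtop output). It shows the key relationship, discussed more below, that GAVG is made up of KAVG and DAVG:

```python
# Illustrative model of how the vSphere latency counters relate.
# GAVG (total latency seen from vSphere) = KAVG (kernel) + DAVG (device);
# QAVG, the queuing time, is contained within KAVG.

def breakdown(davg_ms: float, kavg_ms: float) -> None:
    """Print where an I/O spends its time, given DAVG and KAVG in ms."""
    gavg_ms = davg_ms + kavg_ms  # GAVG is made up of KAVG and DAVG
    print(f"GAVG (total latency seen from vSphere): {gavg_ms:.1f} ms")
    print(f"  device (DAVG): {davg_ms:.1f} ms ({100 * davg_ms / gavg_ms:.0f}%)")
    print(f"  kernel (KAVG): {kavg_ms:.1f} ms ({100 * kavg_ms / gavg_ms:.0f}%)")

# Made-up sample: a healthy device where nearly all the latency is hardware time.
breakdown(davg_ms=8.0, kavg_ms=0.2)
```

In a healthy environment the device portion dominates and the kernel portion is close to zero, which is exactly what the guidance below looks for.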

To provide some rough guidance: for most application workloads (typically 8 KB I/O size, 80% random, 80% read), anything greater than 20 to 30 ms of I/O latency may be a performance concern. Of course, as with all things performance related, some applications are more sensitive to I/O latency than others, so treat the 20 to 30 ms figure as rough guidance rather than a hard rule. So we expect GAVG, the total latency as seen from vCenter, to be less than 20 to 30 ms. As seen in the figure, GAVG is made up of KAVG and DAVG. Ideally, we want every I/O to get out onto the wire quickly and spend no significant time just sitting in the vSphere storage stack, so we would like to see KAVG very low. As a rough guideline, KAVG should usually be 0 ms, and anything greater than 2 ms may be an indicator of a performance issue.

So what are the rule-of-thumb indicators of bad storage performance?

• High device latency: Device Average Latency (DAVG) consistently greater than 20 to 30 ms may cause a performance problem for your typical application.

• High kernel latency: Kernel Average Latency (KAVG) should usually be 0 ms in an ideal environment, but anything greater than 2 ms may be a performance problem.
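
To show how these two rules of thumb might be applied in practice, here is a hedged sketch in Python. The device names, threshold variables, and sample readings are all invented for illustration; in a real environment the numbers would come from vCenter's performance charts or from esxtop (for example, exported with its batch mode, esxtop -b):

```python
# Rule-of-thumb checks from the text: DAVG consistently above 20 to 30 ms,
# or KAVG above 2 ms, may indicate a storage performance problem.
DAVG_WARN_MS = 20.0  # lower edge of the 20-30 ms device-latency guidance
KAVG_WARN_MS = 2.0   # kernel latency should usually be ~0 ms

# Invented sample readings per device: (DAVG, KAVG) in milliseconds.
samples = {
    "naa.6001": (12.0, 0.3),  # healthy
    "naa.6002": (35.0, 0.4),  # the physical device/array is slow
    "naa.6003": (10.0, 4.1),  # I/O is waiting inside the vSphere stack
}

for device, (davg_ms, kavg_ms) in samples.items():
    issues = []
    if davg_ms > DAVG_WARN_MS:
        issues.append(f"high device latency (DAVG {davg_ms} ms)")
    if kavg_ms > KAVG_WARN_MS:
        issues.append(f"high kernel latency (KAVG {kavg_ms} ms)")
    print(f"{device}: {'; '.join(issues) if issues else 'OK'}")
```

Remember, these thresholds are rough guidance rather than hard rules; a latency-sensitive application may suffer well before 20 ms.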

So what can cause bad storage performance, and how do you address it? Well, that is for next time…

And as a side note: check out your local VMUG (VMware User Group). The VMUG community has more than 75,000 members in more than 180 local groups across 32 countries. Many local VMUGs hold free user conferences that are a great opportunity to learn from in-depth technical sessions, demonstrations, and exhibits, and to network with other VMware customers and partners. That is where I’ll be this week, presenting Storage Troubleshooting and Performance Best Practices at the Denver VMUG. Check out the VMUGs, and maybe I’ll see you at one in your area: http://www.vmug.com

Continue to Troubleshooting Storage Performance – Part 2:
http://blogs.vmware.com/vsphere/2012/06/troubleshooting-storage-performance-in-vsphere-part-2.html