Storage Queues: what are they and do I need to change them?

We have all had to wait in a line or two in our lives, whether it is the dreaded TSA checkpoint at the airport or the equally dreaded DMV registration line; waiting in line is just a fact of life. This is true in the storage world too: storage I/Os have plenty of lines to wait in. In this article, we examine the various queues in the virtualized storage stack and discuss the when, how, and why of modifying them.

Queues are necessary for several reasons, but primarily they are used to share a resource and to allow for concurrency. By using queues, vSphere allows multiple virtual machines to share a single resource. Queues also allow applications to have multiple active (“in-flight”) I/O requests on a LUN at the same time, which provides concurrency and improves performance. But there is a tradeoff: if you allow too much concurrency, the underlying resource might get saturated. To prevent one virtual machine or one host from saturating the underlying resource, the queues have set sizes/limits that restrict the number of I/O requests that can be sent at one time.

In a virtualized environment there are several queues. At the top of the stack are the various storage queues inside the guest OS, including the queues created and used by the application itself and by the storage device drivers in the guest OS. In the virtualization layer of the vSphere software stack, there are three main queues: a World queue (one per virtual machine), an Adapter queue (one per HBA in the host), and a Device/LUN queue (one per LUN per adapter). Finally, at the bottom of the storage stack there are queues at the storage device itself; for instance, the front-end storage port has a queue for all incoming I/Os on that port.


When investigating storage performance problems and bottlenecks you should investigate the queuing at all levels of the storage stack from the application and guest OS to the storage array.  For this article, I’ll only discuss the queues in the vSphere storage stack.  

For most customers, the default sizes of the three main vSphere queues are fine and do not require any adjustment. But for customers with a high level of consolidation or very intensive storage workloads, some of the vSphere queues may need to be adjusted for optimal performance. The diagram below shows the three main queues in vSphere with their typical default sizes. I/O requests flow into the per-virtual-machine queue, then into the per-HBA queue, and finally from the adapter queue into the per-LUN queue for the target LUN. From the default sizes you can see that each VM can issue 32 concurrent I/O requests, and the adapter queue beneath it is generally large enough to accept all of them, but the LUN queue beneath that typically holds only 32 as well. This means that if a LUN is shared by multiple virtual machines, the LUN queue might not be large enough to support all the concurrent I/O requests being sent by the virtual machines sharing it.
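To make the queue flow above concrete, here is a toy model (not VMware code; the depths are just the typical defaults mentioned above) of several busy VMs funneling I/O into one shared LUN queue:

```python
# Toy model: each VM has a 32-slot world queue; all VMs on this host
# share one LUN whose device queue depth is also 32 (typical defaults).
VM_QUEUE_DEPTH = 32      # default per-VM limit on outstanding I/Os
LUN_QUEUE_DEPTH = 32     # default per-LUN, per-host device queue depth

def in_flight_and_waiting(num_vms, io_per_vm):
    """Return (I/Os actually sent to the array, I/Os waiting inside vSphere)."""
    # Each VM can have at most VM_QUEUE_DEPTH outstanding I/Os.
    offered = sum(min(io_per_vm, VM_QUEUE_DEPTH) for _ in range(num_vms))
    in_flight = min(offered, LUN_QUEUE_DEPTH)   # shared LUN queue caps concurrency
    return in_flight, offered - in_flight

# One busy VM fits exactly; three busy VMs overflow the shared LUN queue.
print(in_flight_and_waiting(1, 32))  # (32, 0)
print(in_flight_and_waiting(3, 32))  # (32, 64)
```

With three VMs each driving 32 concurrent I/Os, 64 requests sit queued in the hypervisor rather than reaching the array, which is exactly the contention the rest of this article is about.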


Why are the virtual machine queues and the LUN queues set to just 32? The limit was set to prevent one virtual machine or vSphere host from stealing all the storage performance by dominating the storage with its own I/O requests, the so-called noisy neighbor problem. For instance, a single storage array LUN can be shared by multiple vSphere hosts; by limiting each host to 32 concurrent I/Os on that LUN, the risk that one host saturates the LUN and starves out the others is greatly reduced.

However, setting hard arbitrary limits is the old-school way of doing things. Today, using features like Storage I/O Control (SIOC), vSphere can mitigate the virtual machine and host noisy neighbor risk through a more elegant and fair mechanism. So if you notice that your device queues are constantly bumping up against their maximum limits, the recommendation is to increase the Device/LUN queue depth and use SIOC to mitigate any potential noisy neighbor problem. One quick note: SIOC controls storage workloads by modifying the Device/LUN queue depth, but it cannot increase the device queue depth beyond the configured maximum. You have to raise the maximum yourself if your workloads need larger queues, and then let SIOC reduce it when needed.
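The relationship between the configured maximum and SIOC's throttling can be sketched in one line (a conceptual illustration only, not the actual SIOC implementation):

```python
# Conceptual sketch: SIOC may shrink the device queue depth during
# congestion, but it can never grow it past the host-configured maximum.
def effective_queue_depth(configured_max, sioc_target):
    """The depth actually in effect is capped by what you configured."""
    return min(configured_max, sioc_target)

print(effective_queue_depth(64, 128))  # 64: SIOC cannot exceed the configured max
print(effective_queue_depth(64, 16))   # 16: SIOC throttles down under congestion
```

This is why raising the configured maximum yourself is a prerequisite: SIOC only ever moves the effective depth downward from that ceiling.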

Why increase the device queue? A storage array is generally more efficient when it can see multiple I/O requests at one time: the more I/Os the array knows about, the more efficiently it can service them, because it can rearrange the requested I/O blocks and take advantage of block proximity. For instance, if a virtual machine requests two blocks that sit close together on a spindle, the array can retrieve the first block and then quickly collect the second while the disk head is “in the neighborhood”. If the queue depth were set to 1 and the array could only see one I/O request at a time, it couldn’t efficiently collect other blocks while the head was nearby, since it wouldn’t even know which blocks you were going to want next.
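The head-movement benefit can be illustrated with a toy seek-distance model (not array firmware; the block addresses are made up). With a deep queue the array sees several pending addresses at once and can service them in block order; with queue depth 1 it must take them in arrival order:

```python
# Toy model: total head travel when servicing pending block requests,
# with and without the freedom to reorder them (a deep queue enables
# reordering; queue depth 1 forces arrival order).
def total_seek_distance(blocks, reorder):
    order = sorted(blocks) if reorder else list(blocks)
    head, travel = 0, 0
    for b in order:
        travel += abs(b - head)  # distance the head moves for this request
        head = b
    return travel

pending = [900, 10, 905, 15]                # arrival order jumps around the disk
print(total_seek_distance(pending, False))  # 3575: depth 1, arrival order
print(total_seek_distance(pending, True))   # 905: deep queue, block order
```

Sorting the four requests cuts total head travel by roughly 4x in this contrived case; real arrays use more sophisticated scheduling, but the principle is the same.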

You can monitor the current depths of the various queues and how actively they are being used; instructions are in the “Checking the queue depth of the storage adapter and the storage device” KB article. If you constantly notice that your Device/LUN queue is reporting 100% active/full, it may be an indicator that you are bottlenecked on your device queue or on the underlying storage.

Another interesting queuing KB article, one that reinforces the rule to always check and follow your storage vendor’s best practices, is “Controlling LUN queue depth throttling in VMware ESX/ESXi”. vSphere has a feature that detects queue-full warnings from the storage array and responds by reducing the Device/LUN queue depth, so that the number of I/O requests vSphere issues is reduced until the array can catch up and free space in its queue. This feature is off by default and should be enabled according to your storage vendor’s best practices.
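The throttling behavior just described follows a back-off-and-recover pattern; here is a conceptual sketch (the event names, halving factor, and recovery step are illustrative, not the actual ESX implementation or its parameter names):

```python
# Conceptual sketch of adaptive queue depth throttling: on a queue-full
# signal from the array, cut the LUN queue depth sharply; on successful
# I/O, creep it back up toward the configured maximum.
def throttle(depth, event, max_depth, min_depth=1):
    if event == "QFULL":   # array signaled its port queue is full
        return max(min_depth, depth // 2)
    if event == "OK":      # array caught up; recover gradually
        return min(max_depth, depth + 1)
    return depth

depth = 32
for e in ["QFULL", "QFULL", "OK", "OK", "OK"]:
    depth = throttle(depth, e, max_depth=32)
print(depth)  # 32 -> 16 -> 8 -> 9 -> 10 -> 11
```

The asymmetry (fast decrease, slow increase) is the key design idea: back off quickly when the array complains, then probe back up cautiously so the array is not immediately overwhelmed again.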

In summary, there are lots of queues in the virtualized storage stack, with various default sizes. For most environments you do not need to adjust them. However, for I/O-intensive workloads that generate a large number of concurrent I/O requests, or for heavily consolidated environments, it may be beneficial to adjust the queues so that the storage array can more efficiently process the incoming I/O requests. Using SIOC and other queue-throttling features can mitigate some of the risks of increasing the vSphere queues, but it is always best practice to test and evaluate changes before implementing them in production, and to avoid oversizing or unnecessarily modifying the queues if you are not seeing queue-full bottlenecks.


Resources on VMware vSphere Storage Queues:

   VMware vSphere – Scalable Storage Performance white paper
    (Although it is a bit dated, it still has useful information)

   VMware KB: Checking the queue depth of the storage adapter and the storage device

   VMware KB: Changing the Queue Depth for QLogic and Emulex HBAs

   VMware KB: Changing Paravirtualized SCSI Controller Queue Depth



Previous Troubleshooting Storage Performance posts: