
Virtual Machine I/O – Fairness versus Performance

Many of you will have read various articles related to queue depths, especially in the area of LUN/device queue depths, and how these can be tuned to provide different performance for your I/O. However, there are other queue settings internal to the VMkernel which relate to how many I/Os a Virtual Machine can issue before it has to allow another virtual machine to send I/Os to the same LUN. What follows is some detail on these internal settings and how they are used to achieve fairness and performance for virtual machine I/O.

Warning: These settings have already been pre-configured to allow virtual machines to perform optimally. There should be no reason to change them unless guided to do so by VMware Support Staff. This is all about performance vs fairness. Failure to follow this advice can give you some very fast virtual machines in your environment, but also some extremely slow ones.

Disk.SchedNumReqOutstanding
This advanced setting represents the maximum number of I/Os one individual Virtual Machine can issue all the way down to the LUN when there is more than one Virtual Machine pushing I/O to the same LUN. That is the important point. When it is the only virtual machine issuing I/O, it can fill the device queue depth with I/O requests. But when it is sharing the LUN with other active virtual machines, this value throttles its I/O for fairness. The default value for this setting is 32, meaning the virtual machine can issue a maximum of 32 I/O requests when multiple virtual machines are active on the same LUN from an I/O perspective. If there is only a single Virtual Machine pushing I/O to a LUN, we obviously would not want to throttle it. In that case we allow it to send as many I/Os as the device queue depth permits, which may also be 32, but can be configured much higher; in many cases a queue depth of 64 can be found. Let’s say that in this example the device queue depth is 64. When a second virtual machine starts to issue I/O to the same LUN, Disk.SchedNumReqOutstanding halves the number of I/Os each virtual machine can send to the LUN (from 64 down to 32) in an effort to achieve fairness between all the virtual machines sharing the datastore.
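
To make this concrete, here is a minimal Python sketch of the throttling decision described above. It is purely illustrative, not VMkernel code: the function and variable names are my own, and the device queue depth of 64 is just the example value used in this post.

```python
# Illustrative sketch only - not VMkernel code.
DEVICE_QUEUE_DEPTH = 64         # example device/LUN queue depth from the text
SCHED_NUM_REQ_OUTSTANDING = 32  # Disk.SchedNumReqOutstanding (default)

def per_vm_io_limit(active_vms_on_lun: int) -> int:
    """Maximum number of I/Os one VM may keep outstanding against the LUN."""
    if active_vms_on_lun <= 1:
        # A lone VM may fill the entire device queue.
        return DEVICE_QUEUE_DEPTH
    # With contention, each VM is throttled for fairness.
    return SCHED_NUM_REQ_OUTSTANDING

print(per_vm_io_limit(1))  # 64 - only one VM is pushing I/O to the LUN
print(per_vm_io_limit(3))  # 32 - throttled as soon as other VMs become active
```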

Disk.SchedQuantum
When there are multiple Virtual Machines sharing the same LUN, there may be occasions where we want one of them to send more I/Os than the number defined in Disk.SchedNumReqOutstanding. For example, in the case of a sequential I/O pattern, we might allow a number of additional I/Os to complete. The reason for this is that performance would be impacted if we had to seek back to the same location to complete the sequential I/O pattern when this Virtual Machine’s I/O is next serviced.

The parameter Disk.SchedQuantum represents the maximum number of consecutive “sequential” I/Os allowed from a virtual machine before we force a switch to service the I/Os from another Virtual Machine. Disk.SchedQuantum’s default value is 8. Therefore, during a “sequential” I/O operation, we may allow a virtual machine to issue its 32 I/Os as per Disk.SchedNumReqOutstanding and then an additional 8 I/Os as per Disk.SchedQuantum. This allows us to improve the all-round performance of virtual machines, especially when they are doing sequential I/O.
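
The following Python sketch, again illustrative only and using my own hypothetical names, shows how Disk.SchedQuantum can extend a virtual machine’s turn beyond Disk.SchedNumReqOutstanding when its I/O is sequential.

```python
# Illustrative sketch only - not VMkernel code.
SCHED_NUM_REQ_OUTSTANDING = 32  # Disk.SchedNumReqOutstanding (default)
SCHED_QUANTUM = 8               # Disk.SchedQuantum (default)

def vm_keeps_slot(ios_issued_this_turn: int, next_io_is_sequential: bool) -> bool:
    """Decide whether the VM may issue its next I/O or must yield to another VM."""
    if ios_issued_this_turn < SCHED_NUM_REQ_OUTSTANDING:
        return True  # still within the normal fairness limit
    # Beyond the limit, only sequential I/O earns extra slots, up to the quantum.
    extra_ios = ios_issued_this_turn - SCHED_NUM_REQ_OUTSTANDING
    return next_io_is_sequential and extra_ios < SCHED_QUANTUM

print(vm_keeps_slot(20, False))  # True  - under the limit of 32
print(vm_keeps_slot(35, True))   # True  - sequential, within the quantum of 8
print(vm_keeps_slot(40, True))   # False - 32 + 8 issued, switch to another VM
```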

This then begs the question – how do we figure out if the next I/O is sequential or not? That is what the next setting gives us.

Disk.SectorMaxDiff
As stated, we need a measure of ‘proximity’ to decide whether the next I/O from a Virtual Machine is ‘sequential’. If it is, then we give the virtual machine the benefit of getting the next I/O slot, as it will likely be served faster by the storage. If it is outside this proximity, the I/O slot goes to the next Virtual Machine for fairness. This setting represents the maximum distance in disk sectors at which two I/Os are still considered “sequential”. Disk.SectorMaxDiff defaults to 2000.
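
Here is a sketch of that proximity test, under the simplifying assumption that the scheduler just compares the sector numbers of successive I/Os from the same virtual machine (the helper name is hypothetical). A result of True is what would feed the sequential check in the earlier Disk.SchedQuantum sketch.

```python
# Illustrative sketch only - not VMkernel code.
SECTOR_MAX_DIFF = 2000  # Disk.SectorMaxDiff (default)

def is_sequential(prev_io_sector: int, next_io_sector: int) -> bool:
    """Treat the next I/O as 'sequential' if it falls within the configured
    sector distance of the previous I/O from the same VM."""
    return abs(next_io_sector - prev_io_sector) <= SECTOR_MAX_DIFF

print(is_sequential(100_000, 101_500))  # True  - within 2000 sectors
print(is_sequential(100_000, 250_000))  # False - a seek elsewhere on the LUN
```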

Disk.SchedQControlVMSwitches
This advanced setting is used to determine when to throttle down the number of I/Os sent by one virtual machine to the driver queue, to allow other virtual machines to schedule their I/O. The setting refers to the number of times we switch between different virtual machines. If we switch between unique virtual machines this many times, then we start to reduce the maximum number of commands that a virtual machine can queue to Disk.SchedNumReqOutstanding. By default, the number of switches between virtual machines needed to trigger this behavior is 6. What this means is that if we encounter a period of high I/O, switch between 6 unique virtual machines, and have not returned to service the I/Os outstanding on any of the previous virtual machines, we will automatically throttle back the number of I/Os which a virtual machine can schedule. This again is to balance fairness versus performance.
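
As a rough illustration, and with the simplifying assumption that we simply count consecutive switches between different virtual machines, the trigger could be sketched like this (names are hypothetical):

```python
# Illustrative sketch only - not VMkernel code.
SCHED_QCONTROL_VM_SWITCHES = 6  # Disk.SchedQControlVMSwitches (default)

class ContentionTracker:
    """Counts switches between different VMs issuing I/O to the same LUN."""

    def __init__(self) -> None:
        self.switches = 0
        self.last_vm = None

    def record_io(self, vm_id: str) -> bool:
        """Return True once contention is high enough to throttle every VM
        back to Disk.SchedNumReqOutstanding."""
        if self.last_vm is not None and vm_id != self.last_vm:
            self.switches += 1
        self.last_vm = vm_id
        return self.switches >= SCHED_QCONTROL_VM_SWITCHES

tracker = ContentionTracker()
throttle = False
for vm in ["vm1", "vm2", "vm3", "vm1", "vm2", "vm3", "vm1"]:
    throttle = tracker.record_io(vm)
print(throttle)  # True - six switches between VMs have been observed
```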

Disk.SchedQControlSeqReqs
Once the period of high I/O has passed, this advanced parameter is used to determine when to throttle back up to the full number of scheduled I/Os per virtual machine. The setting refers to the number of consecutive I/Os issued from the same Virtual Machine before we go back to using the full number of scheduled requests. The default value for this setting is 128 scheduled I/Os, which means the virtual machine has been scheduled 4 consecutive times without interruption (4 x 32). If the device queue depth is set to 64, this allows the virtual machine to issue 64 I/Os at a time once again, since clearly no other virtual machines are issuing I/O at this time.
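
Tying it together, here is a final sketch, using the same hypothetical names and the example queue depth of 64 from earlier, of how the throttle is lifted once one virtual machine has had the LUN to itself for long enough:

```python
# Illustrative sketch only - not VMkernel code.
SCHED_QCONTROL_SEQ_REQS = 128   # Disk.SchedQControlSeqReqs (default)
SCHED_NUM_REQ_OUTSTANDING = 32  # Disk.SchedNumReqOutstanding (default)
DEVICE_QUEUE_DEPTH = 64         # example device/LUN queue depth from the text

def per_vm_io_limit(consecutive_ios_from_one_vm: int) -> int:
    """Once one VM has issued 128 I/Os back to back (roughly four full turns
    of 32) with no other VM competing, let it use the full queue depth again."""
    if consecutive_ios_from_one_vm >= SCHED_QCONTROL_SEQ_REQS:
        return DEVICE_QUEUE_DEPTH
    return SCHED_NUM_REQ_OUTSTANDING

print(per_vm_io_limit(96))   # 32 - the LUN is still treated as contended
print(per_vm_io_limit(128))  # 64 - quiet period confirmed, throttle removed
```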

Hopefully this gives you an idea of the various checks and balances in the VMkernel that allow for performance and fairness when many virtual machines are sharing the same storage device/LUN.

I will finish with one final warning. I already highlighted that these parameters shouldn’t be changed unless you are under specific guidance from our support staff. Another reason for not touching these parameters is that they are global. If you want to tune the I/O behavior for a single LUN, you cannot do it: you will impact all of your datastores, and if you have multiple arrays, all datastores across all arrays. Be very careful!

Get notification of these blog postings and more VMware Storage information by following me on Twitter: @VMwareStorage