
Advanced VMkernel Settings for Disk Storage

As regular readers will know by now, many of these blog posts are a result of internal discussions held between myself and other VMware folks (or indeed storage partners). This one is no different. I was recently involved in a discussion about how VMs did sequential I/O, which led me to point out a number of VMkernel parameters related to performance vs fairness for VM I/O. In fact, I have seen other postings about these parameters, but I realised that I never did post anything myself. 

A word of caution! These parameters have already been fine tuned by VMware. There should be no need to modify these parameters. If you do, you risk impacting your own environment. As mentioned, this is all about performance vs fairness. Tuning these values can give you some very fast VMs but can also give you some very slow ones. You've been warned.

Disk.SchedNumReqOutstanding
This is the maximum number of I/Os that one VM can issue all the way down to the LUN when more than one VM is pushing I/O to the same LUN. The default was 16 prior to ESX 3.5; it was raised to 32 in ESX 3.5 and remains at 32 today.

Disk.SchedQuantum
The maximum number of consecutive “sequential” I/Os allowed from one VM before we force a switch to another VM (unless this is the only VM on the LUN). Disk.SchedQuantum is set to a default value of 8.
But how do we figure out if the next I/O is sequential or not? That's a good question.

Disk.SectorMaxDiff
As mentioned, we need a figure of ‘proximity’ to see if the next I/O of a VM is ‘sequential’. If it is, then we give the VM the benefit of getting the next I/O slot as it will likely be served faster by the storage. If it is outside this proximity, then the I/O goes to the next VM for fairness. This value is the maximum distance in disk sectors when considering if two I/Os are “sequential”. Disk.SectorMaxDiff defaults to 2000 sectors.
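To make the interplay between Disk.SchedQuantum and Disk.SectorMaxDiff concrete, here is a minimal Python sketch of the decision as described above. It is an illustration only, not the VMkernel implementation: the function names are hypothetical, and measuring the distance between start sectors (with an inclusive comparison) is an assumption.

```python
SCHED_QUANTUM = 8        # Disk.SchedQuantum default quoted in this post
SECTOR_MAX_DIFF = 2000   # Disk.SectorMaxDiff default quoted in this post


def is_sequential(prev_sector, next_sector, max_diff=SECTOR_MAX_DIFF):
    """Assumption: two I/Os are 'sequential' when their start sectors
    are no more than max_diff sectors apart."""
    return abs(next_sector - prev_sector) <= max_diff


def keeps_slot(prev_sector, next_sector, consecutive, other_vms_waiting,
               quantum=SCHED_QUANTUM, max_diff=SECTOR_MAX_DIFF):
    """Return True if the same VM should get the next I/O slot.

    consecutive is the number of back-to-back sequential I/Os this VM
    has already issued (hypothetical bookkeeping for the sketch).
    """
    if not other_vms_waiting:
        # Only VM on the LUN: there is nobody to switch to.
        return True
    if not is_sequential(prev_sector, next_sector, max_diff):
        # Outside the proximity window: hand the slot over for fairness.
        return False
    # Sequential, but only until the quantum is used up.
    return consecutive < quantum
```

With the defaults, a VM streaming through adjacent sectors keeps the slot for up to 8 I/Os and then yields, while a VM whose next I/O lands more than 2000 sectors away yields immediately.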

Disk.SchedQControlVMSwitches
This value is used to determine when to throttle down the number of I/Os sent by one VM to the queue. It refers to the number of times we switch between VMs to handle I/O – if we switch this many times, then we reduce the maximum number of commands that can be queued. The default is 6 switches.

Disk.SchedQControlSeqReqs
This is used to determine when to throttle back up to the full queue depth. It refers to the number of times we issue I/Os from the same VM before we go back to using the full LUN queue depth. The default is 128. In other words, if the same VM issues 128 I/Os without any other VM wishing to issue I/Os in the same timeframe, we raise the number of I/Os that VM can have outstanding back to its maximum.
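The throttle-down and throttle-up behaviour of Disk.SchedQControlVMSwitches and Disk.SchedQControlSeqReqs can be pictured as a small state machine. The sketch below is a simplification under stated assumptions: the class and method names are hypothetical, the LUN queue depth of 64 is just an example value, and treating the throttled limit as Disk.SchedNumReqOutstanding is my reading of the descriptions above rather than a statement of how the VMkernel is coded.

```python
class QueueThrottle:
    """Toy model of per-VM queue depth throttling on one LUN."""

    def __init__(self, lun_queue_depth=64, num_req_outstanding=32,
                 vm_switches=6, seq_reqs=128):
        self.lun_queue_depth = lun_queue_depth          # hypothetical example value
        self.num_req_outstanding = num_req_outstanding  # Disk.SchedNumReqOutstanding
        self.vm_switches = vm_switches                  # Disk.SchedQControlVMSwitches
        self.seq_reqs = seq_reqs                        # Disk.SchedQControlSeqReqs
        self.switch_count = 0
        self.same_vm_count = 0
        self.last_vm = None
        self.throttled = False

    def record_io(self, vm):
        """Account for one I/O issued on behalf of vm."""
        if self.last_vm is not None and vm != self.last_vm:
            # A different VM got the slot: count the switch.
            self.switch_count += 1
            self.same_vm_count = 1
            if self.switch_count >= self.vm_switches:
                self.throttled = True
        else:
            self.same_vm_count += 1
            if self.same_vm_count >= self.seq_reqs:
                # One VM has had the LUN to itself long enough.
                self.throttled = False
                self.switch_count = 0
        self.last_vm = vm

    def per_vm_limit(self):
        """Maximum outstanding I/Os a VM may have in the current state."""
        if self.throttled:
            return self.num_req_outstanding
        return self.lun_queue_depth
```

Feeding this model alternating I/Os from two VMs trips the throttle after six switches; a run of 128 I/Os from a single VM then restores the full queue depth.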

While researching this post, I came across a bunch of other advanced disk parameters in my notes which I thought you might like to know about.

Disk.PathEvalTime
The amount of time to wait before checking the status of a failed path. The default is 300 seconds (5 minutes). This means that if you have a preferred path (fixed path policy) and you have failed over to an alternate path, every 300 seconds the VMkernel will issue a TUR (Test Unit Ready) SCSI command to see if the preferred path has come back online. When it does, I/O will be moved back to the preferred path.
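As a rough illustration of that failback loop, here is a small Python model. Everything in it is hypothetical scaffolding: the class name, the path strings, and the probe callable, which merely stands in for the Test Unit Ready command. It only shows the timing logic, assuming the host has already failed over to the alternate path.

```python
PATH_EVAL_TIME = 300  # Disk.PathEvalTime default, in seconds


class FixedPathPolicy:
    """Toy model of failback to a preferred path."""

    def __init__(self, preferred, alternate, probe, eval_time=PATH_EVAL_TIME):
        self.preferred = preferred
        self.alternate = alternate
        self.probe = probe            # callable(path) -> bool, stands in for TUR
        self.eval_time = eval_time
        self.active = alternate       # assumption: failover has already happened
        self.last_eval = 0.0

    def tick(self, now):
        """Call periodically with the current time in seconds."""
        due = now - self.last_eval >= self.eval_time
        if self.active != self.preferred and due:
            self.last_eval = now
            if self.probe(self.preferred):
                # Preferred path answered: move I/O back to it.
                self.active = self.preferred
        return self.active
```

The point of the sketch is that a repaired preferred path is not picked up instantly; it is only noticed at the next evaluation interval.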

Disk.SupportSparseLUN
Wow – this setting brings me back. Let's say that the SAN administrator presented LUNs 0, 1, 2 and 4, 5, 6 to your ESXi host. If Disk.SupportSparseLUN is turned off, the scan stops when it hits the gap at LUN 3, and we never find any LUNs beyond that point. Having Disk.SupportSparseLUN enabled (which it is by default) means that we can traverse these gaps in LUNs. I'm pretty sure this is only relevant to the SCSI Bus Walking discovery method – see the next advanced setting.

Disk.UseReportLUN
When this option is enabled, the storage stack uses the SCSI REPORT LUNS command to detect LUNs on a target. REPORT LUNS asks the target to return a logical unit inventory (LUN list) to the initiator, rather than the initiator querying each LUN individually, i.e. SCSI Bus Walking. The option is enabled by default. Believe me, you do not want to use SCSI bus walking unless you get a kick out of having a really slow ESXi boot time.
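The difference between the two discovery methods, and the effect of Disk.SupportSparseLUN, can be shown with a toy simulation. The function names and the upper bound of 255 are hypothetical; real discovery involves SCSI commands to the target, which this sketch replaces with a simple set lookup.

```python
def bus_walk(presented, max_lun=255, support_sparse=True):
    """Probe LUN numbers one by one, as in SCSI bus walking.

    presented is the set of LUN numbers the array exposes.
    With support_sparse=False the walk gives up at the first gap.
    """
    found = []
    for lun in range(max_lun + 1):
        if lun in presented:
            found.append(lun)
        elif not support_sparse:
            break
    return found


def report_luns(presented):
    """One request: the target returns its whole LUN inventory."""
    return sorted(presented)


presented = {0, 1, 2, 4, 5, 6}
sparse_off = bus_walk(presented, support_sparse=False)
sparse_on = bus_walk(presented, support_sparse=True)
inventory = report_luns(presented)
```

In this model the walk with sparse support turned off stops at the missing LUN 3, while the walk with sparse support on has to probe every LUN number up to the limit to be sure nothing was missed, which hints at why bus walking is so much slower than a single REPORT LUNS request.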

Disk.UseDeviceReset & Disk.UseLUNReset
These two parameters, taken together, determine the type of SCSI reset. The following table shows the available types:

[Table image not reproduced: SCSI reset types selected by Disk.UseDeviceReset and Disk.UseLUNReset]
*The default is LUN Reset.

Get notification of these blog postings and more VMware Storage information by following me on Twitter: @VMwareStorage