Product Announcements

VAAI Offload Failures & the role of the VMKernel Data Mover

Before VMware introduced VAAI (vSphere Storage APIs for Array Integration), migrations of Virtual Machines (and their associated disks) between datastores was done by the VMkernel Data Mover (DM).

The Data Mover aims to have a continuous queue of outstanding IO requests to achieve maximum throughput. Incoming I/O requests to the Data Mover are divided up into smaller chunks. Asynchronous I/Os are then simultaneously issued for each chunk until the DM queue depth is filled. When a request completes, the next request is issued. This could be for writing the data that was just read, or to handle the next chunk.

Take the example of a clone of a 64GB VMDK (Virtual Machine Disk file). The DM is asked to move the data in 32MB transfers. The 32MB is then transferred in "PARALLEL" as a single delivery, but is divided up into a much smaller I/O size of 64KB by the DM, using 32 threads at a time. To transfer this 32MB, a total of 512 I/Os of size 64KB is issued by the DM.

By comparison, a similar a 32MB transfer via VAAI issues a total of 8 I/Os of size 4MB (XCOPY uses 4MB transfer sizes). The advantages of VAAI in terms of ESXi resources is immediately apparent. 

The decision to transfer using the DM or offloading to the array with VAAI is taken upfront by looking at storage array Hardware Acceleration state. If we decide to transfer using VAAI and then encounter a failure with the offload, the VMkernel will try to complete the transfer using the VMkernel DM. It should be noted that the operation is not restarted; rather it picks up from where the previous transfer left off as we do not want to abandon what could possibly be very many GB worth of copied data because of a single transient transfer error.

If the error is transient, we want the VMkernel to check if it is ok to start offloading once again. In vSphere 4.1, the frequency at which an ESXi host checks to see if Hardware Acceleration is supported on the storage array is defined via the following parameter:

 # esxcfg-advcfg -g /DataMover/HardwareAcceleratedMoveFrequency
Value of HardwareAcceleratedMoveFrequency is 16384

This parameter dictates how often we will retry an offload primitive once a failure is encountered. This can be read as 16384 * 32MB I/Os, so basically we will check once every 512GB of data move requests. This means that if at initial deployment, an array does not support the offload primitives, but at a later date the firmware on the arrays gets upgraded and the offload primitives are now supported, nothing will need to be done at the ESXi side – it will automatically start to use the offload primitive.

HardwareAcceleratedMoveFrequency only exists in vSphere 4.1. In vSphere 5.0 and later, we replaced it with the periodic VAAI state evaluation every 5 minutes.

Get notification of these blogs postings and more VMware Storage information by following me on Twitter: Twitter @VMwareStorage