From the Trenches

Out-of-space conditions for thin-provisioned array LUNs

By Nathan Small

When a thin-provisioned LUN formatted with VMFS runs out of space on the storage array, there can be undesired consequences for running VMs that request to write additional blocks. In ESX 4.x, the result could be VM crashes. In rare cases, data corruption could occur inside the guest OS’s file system, because the VMs and the ESX host are completely unaware that they are on a thinly provisioned LUN that has suddenly run out of space. Corruption of a VM’s running snapshot has also been observed under this condition. It can also cause replay of the VMFS journal to fail:

 

 

 

Notice the “No space left on device” status. This same status can be observed for other operations when the thin LUN is out of space.

In ESXi 5.0, we introduced new VAAI features specifically to handle out-of-space conditions on thinly provisioned LUNs. (Note: The array must support this VAAI feature.) Through the VAAI-TP feature, we are notified about space utilization and consumption for thinly provisioned LUNs, areas that were previously opaque to us. When a LUN begins to run out of space, we receive advance notification from the array. We translate this into an alert that is visible from vCenter:

If the LUN completely runs out of blocks, another alert is raised in vCenter. We also pause any VM that attempts to write to additional blocks during this condition, to protect the integrity of the file system within the guest OS as well as the integrity of the VM's running snapshots:

VMs that do not need to write to new blocks continue to run and are not affected.
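To make the behavior described above concrete, here is a minimal Python sketch of the decision logic. It is an illustrative model only, not ESXi code: the names `ThinLun`, `LunState`, and `vm_action` are hypothetical, and the 75% warning threshold is an assumed value (the real soft threshold is configured on the array).

```python
from dataclasses import dataclass
from enum import Enum


class LunState(Enum):
    OK = "ok"
    WARNING = "warning"            # soft threshold crossed: advance notice
    OUT_OF_SPACE = "out_of_space"  # no free blocks left on the array


@dataclass
class ThinLun:
    capacity_blocks: int
    used_blocks: int
    warn_fraction: float = 0.75  # hypothetical soft threshold, set on the array

    def state(self) -> LunState:
        """Classify the LUN by how much of its backing capacity is consumed."""
        if self.used_blocks >= self.capacity_blocks:
            return LunState.OUT_OF_SPACE
        if self.used_blocks >= self.warn_fraction * self.capacity_blocks:
            return LunState.WARNING
        return LunState.OK


def vm_action(lun: ThinLun, needs_new_blocks: bool) -> str:
    """Return 'pause' or 'run' for a VM issuing a write.

    Only a VM whose write needs newly allocated blocks is paused, and only
    when the LUN is completely out of space.
    """
    if lun.state() is LunState.OUT_OF_SPACE and needs_new_blocks:
        return "pause"
    return "run"
```

In this model, a VM writing to blocks that are already allocated keeps running even on a full LUN, while a VM that needs new blocks is paused rather than left to fail the write.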

Using thin provisioning allows you to maximize your storage capacity and usage; however, if it is not monitored properly, you can put your production environment at risk. Utilizing the VAAI features of 5.x with your storage array will give you visibility into out-of-space conditions and allow you to take action before production is affected.