
By Nathan Small
When a thin-provisioned LUN formatted with VMFS runs out of space on the storage array, there can be undesired consequences for running VMs that request to write additional blocks. In ESX 4.x, this could cause VMs to crash. In rare cases, data corruption could occur inside the guest OS’s file system, because the VMs and the ESX host are completely unaware that they are on a thinly provisioned LUN that has suddenly run out of space. Corruption of a VM’s running snapshot has also been observed under this condition. It can also cause replay of the VMFS journal to fail:
2012-12-12T18:59:53.837Z cpu4:2074)J3: 3497: Replaying transaction failed: No space left on device
2012-12-12T18:59:53.837Z cpu4:2074)J3: 3510: checksum 7522aca58a6aab5, length 5632, CID 0xc1d00001, hbGen 29, ser# 219829
2012-12-12T18:59:53.837Z cpu4:2074)J3: 3512: 2 lockActions, first at 48
2012-12-12T18:59:53.837Z cpu4:2074)J3: 3514: 4 logActions, first at 848
2012-12-12T18:59:53.837Z cpu4:2074)J3: 3520: Locks
2012-12-12T18:59:53.837Z cpu4:2074)J3: 3524: #0: 20277248 v 203
2012-12-12T18:59:53.837Z cpu4:2074)J3: 3524: #1: 4276224 v 64
2012-12-12T18:59:53.837Z cpu4:2074)J3: 3526: Actions
2012-12-12T18:59:53.837Z cpu4:2074)J3: 3530: #0: type 1 reqLk 1, fr 0 to 4276736, len 512
2012-12-12T18:59:53.837Z cpu4:2074)J3: 3530: #1: type 1 reqLk 0, fr 0 to 23592960, len 2048
2012-12-12T18:59:53.837Z cpu4:2074)J3: 3530: #2: type 1 reqLk 0, fr 0 to 583150592, len 512
2012-12-12T18:59:53.837Z cpu4:2074)J3: 3530: #3: type 1 reqLk 0, fr 0 to 20277760, len 1536
Notice the “No space left on device” status. This same status can be observed for other operations when the thin LUN is out of space.
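If you want to spot this status in a saved log file, a minimal Python sketch might look like the following. It assumes log lines shaped like the excerpt above (timestamp, a cpu/world field, then the message). The function name `find_out_of_space` and the regular expression are illustrative only and are not part of any VMware tool.

```python
import re

# Assumed line shape, based on the excerpt above:
#   <ISO timestamp> cpu<N>:<world>)<message>
LINE_RE = re.compile(r"^(?P<ts>\S+)\s+cpu(?P<cpu>\d+):(?P<world>\d+)\)(?P<msg>.*)$")
MARKER = "No space left on device"


def find_out_of_space(lines):
    """Return (timestamp, message) pairs for lines that report the out-of-space status."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and MARKER in match.group("msg"):
            hits.append((match.group("ts"), match.group("msg")))
    return hits


# Two lines taken from the excerpt above; only the first reports the status.
sample = [
    "2012-12-12T18:59:53.837Z cpu4:2074)J3: 3497: Replaying transaction failed: No space left on device",
    "2012-12-12T18:59:53.837Z cpu4:2074)J3: 3520: Locks",
]
for ts, msg in find_out_of_space(sample):
    print(ts, msg)
```

Because the same status can appear for other operations, the sketch matches on the message text rather than on the journal-replay prefix.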
In ESX 5.0, we introduced new VAAI features specifically to handle out-of-space conditions on thinly provisioned LUNs. (Note: The array must support this VAAI feature.) Through the VAAI-TP feature, we are notified about space utilization and consumption for thinly provisioned LUNs, areas that were previously opaque to us. In the event that a LUN begins to run out of space, we receive advance notification from the array. We translate this into an alert that is visible from vCenter.
If the LUN completely runs out of blocks, another alert is raised in vCenter. We will also pause any VMs that attempt to write to additional blocks during this condition, to protect the integrity of the file system within the guest OS as well as the integrity of the VM’s running snapshots.
VMs that do not request new blocks while the LUN is out of space will continue to run and will not be affected.
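The behavior described here can be summarized as a toy model in Python. This is only a sketch of the logic as described in this post, not how ESX or any array implements it. The class `ThinLun`, the function `vm_state`, and the 75% warning threshold are all hypothetical; real arrays define their own thresholds.

```python
from dataclasses import dataclass


@dataclass
class ThinLun:
    capacity_blocks: int
    used_blocks: int
    warn_fraction: float = 0.75  # hypothetical threshold; real arrays define their own

    def status(self):
        """Classify the LUN as 'ok', 'warning', or 'out_of_space'."""
        if self.used_blocks >= self.capacity_blocks:
            return "out_of_space"
        if self.used_blocks >= self.warn_fraction * self.capacity_blocks:
            return "warning"
        return "ok"


def vm_state(lun, needs_new_blocks):
    """A VM is paused only if it needs new blocks while the LUN is full."""
    if lun.status() == "out_of_space" and needs_new_blocks:
        return "paused"
    return "running"


full = ThinLun(capacity_blocks=1000, used_blocks=1000)
print(vm_state(full, needs_new_blocks=True))   # paused
print(vm_state(full, needs_new_blocks=False))  # running
```

The point of the model is the asymmetry: the warning stage leaves every VM running, and even a completely full LUN pauses only the VMs that need to allocate new blocks.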
Using thin provisioning allows you to maximize your storage capacity and usage; however, if it is not monitored properly, you can put your production environment at risk. Using the VAAI features of ESX 5.x with your storage array will give you visibility into out-of-space conditions and allow you to take action before production is affected.

