Posted by Cormac Hogan
Technical Marketing Architect (Storage)
I have done a number of blog posts in the recent past related to our newest VAAI primitive, UNMAP. For those who do not know, VAAI UNMAP was introduced in vSphere 5.0 to allow the ESXi host to inform the storage array that files or VMs had been moved or deleted from a Thin Provisioned VMFS datastore. This allowed the array to reclaim the freed blocks. We had no way of doing this previously, so many customers ended up with a considerable amount of stranded space on their Thin Provisioned VMFS datastores.
Now there were some issues with using this primitive which meant we had to disable it for a while. Fortunately, 5.0 U1 brought forward some enhancements which allow us to use this feature once again.
Over the past couple of days, my good friend Paudie O'Riordan from GSS has been doing some testing with the VAAI UNMAP primitive against our NetApp array. He kindly shared the results with me, so that I can share them with you. The posting is rather long, but the information contained will be quite useful if you are considering implementing dead space reclamation.
Some details about the environment which we used for this post:
- NetApp FAS 3170A
- ONTAP version 8.0.2 (I believe earlier versions do not support UNMAP)
- ESXi version 5.0U1, build 623860
Step 1 - Verify that your storage array is capable of processing the SCSI UNMAP commands. The first place to look is on the vSphere Client UI. Select the datastore and examine the 'Hardware Acceleration' details (Hardware Acceleration is how we refer to VAAI in the vSphere UI):
Step 2 - The Hardware Acceleration status states Supported so it looks like this array is VAAI capable. The issue now is that we don't know exactly which primitives are supported so we need to run an esxcli command to determine this. First, you need to get the NAA id of the device backing your datastore. One way of doing this is to use the CLI command 'esxcli storage vmfs extent list' on the ESXi host. In our setup, this command returned the following NAA id for the LUN backing our VMFS-5 datastore:
naa.60a98000572d54724a346a6170627a52
Once the NAA id has been identified, we can go ahead and display device-specific details around Thin Provisioning and VAAI. To do that, we use another esxcli command, 'esxcli storage core device list -d <naa>'. This command can show us information such as firmware revision, thin provisioning status, the VAAI filter and the VAAI status:
# esxcli storage core device list -d naa.60a98000572d54724a346a6170627a52
naa.60a98000572d54724a346a6170627a52
Display Name: NETAPP Fibre Channel Disk (naa.60a98000572d54724a346a6170627a52)
Has Settable Display Name: true
Size: 51200
Device Type: Direct-Access
Multipath Plugin: NMP
Devfs Path: /vmfs/devices/disks/naa.60a98000572d54724a346a6170627a52
Vendor: NETAPP
Model: LUN
Revision: 8020
SCSI Level: 4
Is Pseudo: false
Status: on
Is RDM Capable: true
Is Local: false
Is Removable: false
Is SSD: false
Is Offline: false
Is Perennially Reserved: false
Thin Provisioning Status: yes
Attached Filters: VAAI_FILTER
VAAI Status: supported
Other UIDs: vml.020033000060a98000572d54724a346a6170627a524c554e202020
Here we see that the device is indeed Thin Provisioned and supports VAAI. Now we can run a command to display the VAAI primitives supported by the array for that device. In particular we are interested in knowing whether the array supports the UNMAP primitive for dead space reclamation (what we refer to as the Delete Status). Another esxcli command is used for this step – 'esxcli storage core device vaai status get -d <naa>':
naa.60a98000572d54724a346a6170627a52
VAAI Plugin Name: VMW_VAAIP_NETAPP
ATS Status: supported
Clone Status: supported
Zero Status: supported
Delete Status: supported
The device displays Delete Status as supported, meaning that the host can send SCSI UNMAP commands to the array for this device when a space reclaim operation is requested.
Great – so we have now confirmed that we have a storage array that is capable of dead space reclamation.
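If you want to script this check rather than eyeball it, the following is a minimal Python sketch that parses the 'Key: Value' text shown above and reports whether the Delete (UNMAP) primitive is listed as supported. The function names parse_vaai_status and supports_unmap are made up for illustration and are not part of esxcli; the sample text is copied from the output above, and the sketch assumes the output keeps that layout.

```python
# Sample text copied from the 'vaai status get' output shown in this post.
SAMPLE = """\
naa.60a98000572d54724a346a6170627a52
   VAAI Plugin Name: VMW_VAAIP_NETAPP
   ATS Status: supported
   Clone Status: supported
   Zero Status: supported
   Delete Status: supported
"""


def parse_vaai_status(text):
    """Return a dict built from the 'Key: Value' lines of the output."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # lines without a colon (the NAA id header) are skipped
            fields[key.strip()] = value.strip()
    return fields


def supports_unmap(text):
    """True only when the output reports 'Delete Status: supported'."""
    return parse_vaai_status(text).get("Delete Status") == "supported"


if __name__ == "__main__":
    print(supports_unmap(SAMPLE))
```

The same parser should also cope with the longer 'device list' output earlier in this step, since that uses the same 'Key: Value' layout.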
Step 3 – Let's take a closer look at the datastore next. As can be seen from the screen-shot above, this is a 50GB LUN formatted with VMFS-5. There is 49.5GB usable space remaining. Next, we deployed a Virtual Machine with a 15GB VMDK to this datastore. The Guest OS is using around 8.82GB of this space since that VMDK is thin provisioned. Here is a look at the provisioned and used space from a VMDK perspective:
To look at more granular information about the amount of space consumed on the VMFS-5 volume, we can use some CLI commands. The recommendation would be to use vmkfstools -P to get the detailed volume information:
# vmkfstools -Ph -v 1 /vmfs/volumes/source-datastore/
File system label (if any): source-datastore
Mode: public ATS-only
Capacity 49.8 GB, 40.0 GB available, file block size 1 MB
Volume Creation Time: Tue Apr 24 14:20:51 2012
Files (max/free): 130000/129975
Ptr Blocks (max/free): 64512/64483
Sub Blocks (max/free): 32000/31998
Secondary Ptr Blocks (max/free): 256/256
File Blocks (overcommit/used/overcommit %): 0/10006/0
Ptr Blocks (overcommit/used/overcommit %): 0/29/0
Sub Blocks (overcommit/used/overcommit %): 0/2/0
UUID: 4f96b6c3-dcc7c210-a943-001b219b5078
Partitions spanned (on "lvm"):
naa.60a98000572d54724a346a6170627a52:1
DISKLIB-LIB : Getting VAAI support status for /vmfs/volumes/source-datastore/
Is Native Snapshot Capable: NO
We can clearly see that 10006 x 1MB File Blocks are consumed on the VMFS-5 volume. This is approximately 9.77GB. The next thing we have to take into account is the amount of the VMFS-5 volume that is consumed by VMFS metadata. The best way to get an approximation of this overhead is to use the du -h command on the datastore:
# du -h /vmfs/volumes/source-datastore/
8.8G /vmfs/volumes/source-datastore/WindowsVM
9.6G /vmfs/volumes/source-datastore
By taking away the amount of VMFS-5 volume consumed by Virtual Machines and related files (8.8GB) from the amount of space consumed on the complete volume (9.6GB), we can deduce that approximately 800MB is given over to VMFS-5 metadata. OK, now that we know what is consuming space on our volume, we are finally ready to start looking at the UNMAP primitive in action.
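The arithmetic in this step can be reproduced with a few lines of Python. This is only a sketch: the helper names are invented for illustration, and the inputs are the rounded figures reported by vmkfstools and du above, so the metadata figure is an approximation.

```python
MB_PER_GB = 1024


def file_blocks_to_gb(used_blocks, block_size_mb=1):
    """Convert a count of used VMFS file blocks into GB."""
    return used_blocks * block_size_mb / MB_PER_GB


def metadata_overhead_gb(volume_used_gb, vm_used_gb):
    """Space used on the volume that is not accounted for by VM files."""
    return volume_used_gb - vm_used_gb


# 10006 file blocks of 1 MB each, as reported by vmkfstools -P
print(round(file_blocks_to_gb(10006), 2))        # 9.77

# 9.6G for the whole volume minus 8.8G for the VM directory, from du -h
print(round(metadata_overhead_gb(9.6, 8.8), 1))  # 0.8
```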
Step 4 - Let's do a Storage vMotion operation next and move this Virtual Machine from our source datastore to a different datastore. This is probably the best use-case for the UNMAP primitive. Once the Storage vMotion operation has completed, the vSphere client will report that the VMFS-5 volume now has a lot more free space:
Step 5 - The issue however is that when we check the amount of free space on the Thin Provisioned LUN backing this VMFS-5 volume on the storage array, we see that we still have unused and stranded space. Using a 'lun show' CLI command on this NetApp array which is hosting the LUN for our VMFS-5 volume, we see that 8.8GB of space is still consumed:
lun show -v /vol/vol2/thin-lun
/vol/vol2/thin-lun 50g (53687091200) (r/w, online, mapped)
Serial#: W-TrJ4japbzR
Share: none
Space Reservation: disabled
Multiprotocol Type: vmware
Maps: unmap=51 issi=51
Occupied Size: 8.8g (9473908736)
Creation Time: Tue Apr 24 15:16:52 BST 2012
Cluster Shared Volume Information: 0x0
This is the crux of the issue that we are trying to solve with the VAAI UNMAP primitive.
Step 6 - We finally get to the point where we can now use the SCSI UNMAP primitive. If you've been following my blog posts, you'll know that we can now reclaim this stale and stranded space using the vmkfstools command.
Caution – We expect customers to use this primitive during their maintenance window, since running it on a datastore that is in-use by a VM can adversely affect I/O for the VM. I/O can take longer to complete, resulting in lower I/O throughput and higher I/O latency.
A point I would like to emphasize is that the whole UNMAP performance is totally driven by the storage array. Even the recommendation that vmkfstools -y be issued in a maintenance window is mostly based on the effect of UNMAP commands on the array's handling of other commands.
There is no way of knowing how long an UNMAP operation will take to complete. It can be anywhere from a few minutes to a couple of hours, depending on the size of the datastore, the amount of content that needs to be reclaimed and how well the storage array can handle the UNMAP operation.
To run the command, you should change directory to the root of the VMFS volume that you wish to reclaim space from. The command is run as:
vmkfstools -y <% of free space to unmap>
The % value provided is then used to calculate the amount of stranded space that should be reclaimed from the VMFS volume as follows:
<amount of space to be unmapped> = (parameter passed to vmkfstools -y * free space on vmfs volume) / 100
We will see an actual example of this command being run shortly.
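As a quick illustration of that formula, here is a small Python sketch. The function name space_to_unmap_gb is made up for this example; the inputs (60% and 48.8 GB free) are the values from the example run later in this post.

```python
def space_to_unmap_gb(percent, free_space_gb):
    """Apply the formula above: (percent * free space) / 100."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be greater than 0 and at most 100")
    return percent * free_space_gb / 100


# 60% of 48.8 GB free capacity
print(round(space_to_unmap_gb(60, 48.8), 1))  # 29.3
```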
Step 7 - You can verify whether the UNMAP primitives are being issued by using esxtop. Press ‘u’ to get into the disk device view, then press ‘f’, ‘o’ and ‘p’ to display the “VAAISTATS” and “VAAILATSTATS/cmd” fields. The values under the “DELETE”, “DELETE_F” and “MBDEL/s” columns are the ones of interest during a space reclaim operation:
In this example, we attempted a reclaim of 60% of free space. The vmkfstools -y command displays the following:
Attempting to reclaim 60% of free capacity 48.8 GB (29.3 GB) on VMFS-5 file system 'source-datastore' with max file size 64 TB.
Create file .vmfsBalloontsWt8w of size 29.3 GB to reclaim free blocks.
Done.
vmkfstools -y created a balloon file of 29.3GB which is 60% of the free capacity (48.8GB). This temporary “balloon file” is equal to the size of the space to be unmapped/reclaimed.
There is a note of caution here – if you specify a % value in the high 90s or 100, the temporary "balloon" file which is created during the reclaim operation may fill up the VMFS volume. Any growth of current VMDK files or the creation of new files, such as snapshots, may fail due to unavailable space. Care should be taken when calculating the amount of free space to reclaim.
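One way to reason about a safe percentage is to decide how much free space you want left on the volume while the balloon file exists, and work backwards from there. The Python sketch below does that. The function names and the 5 GB headroom figure are assumptions chosen purely for illustration; vmkfstools itself performs no such check, and the right headroom depends on how quickly your VMDKs and snapshots grow.

```python
def balloon_leaves_headroom(percent, free_space_gb, headroom_gb):
    """True if the balloon file still leaves headroom_gb free on the volume."""
    balloon_gb = percent * free_space_gb / 100
    return free_space_gb - balloon_gb >= headroom_gb


def max_safe_percent(free_space_gb, headroom_gb):
    """Largest whole percentage that still leaves headroom_gb free."""
    if headroom_gb >= free_space_gb:
        return 0
    return int((free_space_gb - headroom_gb) * 100 // free_space_gb)


# Hypothetical policy: keep at least 5 GB free during the reclaim.
print(balloon_leaves_headroom(60, 48.8, 5))   # True
print(balloon_leaves_headroom(100, 48.8, 5))  # False
print(max_safe_percent(48.8, 5))              # 89
```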
If we now look at esxtop while the reclaim is going on:
From the above output we see that some UNMAP commands have been issued. By viewing the DELETE and the MBDEL/s columns, we can see the rate at which the commands are being processed. If you see values incrementing in the DELETE_F column, then that means some UNMAP commands may have failed.
Step 8 - Finally, if we return to our storage array and query the status of the Thin Provisioned LUN, we should now see a difference in the occupied space:
lun show -v /vol/vol2/thin-lun
/vol/vol2/thin-lun 50g (53687091200) (r/w, online, mapped)
Serial#: W-TrJ4japbzR
Share: none
Space Reservation: disabled
Multiprotocol Type: vmware
Maps: unmap=51 issi=51
Occupied Size: 76.3m (79966208)
Creation Time: Tue Apr 24 15:16:52 BST 2012
Cluster Shared Volume Information: 0x0
And there we have it. A real life example of the SCSI UNMAP primitive reclaiming dead space from a Thin Provisioned LUN backing a VMFS-5 datastore.
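To put a number on how much was reclaimed, the two 'Occupied Size' lines can be compared. The following Python sketch pulls the byte count out of each line and subtracts them. The function name occupied_bytes is invented for illustration, the two sample lines are copied from the 'lun show -v' outputs in Step 5 and Step 8, and the regular expression assumes the output keeps that 'Occupied Size: <size> (<bytes>)' layout.

```python
import re

# 'Occupied Size' lines copied from the lun show -v output before and after.
BEFORE = "Occupied Size: 8.8g (9473908736)"
AFTER = "Occupied Size: 76.3m (79966208)"


def occupied_bytes(lun_show_output):
    """Extract the byte count in parentheses from the 'Occupied Size' line."""
    match = re.search(r"Occupied Size:\s+\S+\s+\((\d+)\)", lun_show_output)
    if match is None:
        raise ValueError("no 'Occupied Size' line found")
    return int(match.group(1))


reclaimed = occupied_bytes(BEFORE) - occupied_bytes(AFTER)
print(reclaimed)                       # 9393942528
print(round(reclaimed / 1024**3, 2))   # 8.75
```

So the reclaim returned roughly 8.75GB to the array, leaving only about 76MB occupied on the LUN.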
You should also note that in 5.0 U1, even if the advanced option to issue SCSI UNMAP when deleting a VMDK or doing a Storage vMotion is enabled (/VMFS3/EnableBlockDelete), it will no longer do so. The only way to reclaim stranded space in 5.0U1 is via vmkfstools.
Once again, thanks to Paudie for putting this together, and also to Luke Reed and our other friends at NetApp for both the equipment and assistance with getting it updated to a version of ONTAP which supports the UNMAP primitive.
Get notification of these blog postings and more VMware Storage information by following me on Twitter: @VMwareStorage

It would almost be easier to delete the Datastore?
This really needs to be an automated process. How many people are really going to do this?
Why isn’t the UNMAP automatic? Is it performance related? Is there any harm in deleting an empty volume that shows currently provisioned space?
Why is this a synchronous process when it should be Asynchronous? Also, does it also delete the space on storage if you are just deleting a VM off of the Datastore? How does it handle that?
Anon,
Yes, it should be an automated process and it did start off like that. Unfortunately, as Matt alluded to, there were some performance issues experienced which meant that we had to make it manual.
Matt,
I’m guessing that if you had an empty volume, then you could delete it, but it is not always going to be possible to do this. This procedure will allow you to reclaim stranded space on those volumes.
Hi, can this be run from an ESXi 5U1 box with a VMFS 4 datastore?
I’m guessing you mean VMFS-3 here Rob. We never had a VMFS-4
Yes, my understanding is that it is ok to run the vmkfstools reclaim command against a VMFS-3 datastore. The VMFS drivers which ship with 5.0 and later have the necessary VAAI TP extension code.
You can verify that the UNMAPS are occurring with esxtop by looking at the DELETE field in the VAAI stats view: esxtop -> u (device view) -> f (select fields) -> O (VAAI Stats)
Thanks,
yes I meant VMFS 3 we are running 3.46. Tested today and it works.
Thanks again will look at scripting this now
Rob
Does anyone know if unmap will be supported without running the vmkfstools -y command? i.e. when I delete a VM (or Storage vMotion), unmap will automatically be called?
Hi Ken,
We are actively working on a solution, but I do not have any timeframes that I can share at this time. As soon as I know more, I will share with the storage community.
How do you determine the value of “parameter passed to vmkfstools –y”?
THanks
Look at the free space on a volume and then estimate how much of that free space is unclaimed at the array side. If you have access to the array, then you should be able to tell how much is dead space. The % represents the dead space which vSphere is reporting as free space. In the above example, vSphere reported 48GB. We looked at the array and saw that 9GB was still consumed. This is about 20%. However, if you are unable to look on the array, you can put in a guesstimate value – in this case we just used 60%.
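That estimate is simple enough to sketch in Python. The function name estimate_reclaim_percent is made up for illustration, and the calculation assumes, as in the example above, that everything the array still reports as occupied is dead space sitting inside what vSphere reports as free. Treat the result as a starting estimate only.

```python
def estimate_reclaim_percent(array_occupied_gb, vsphere_free_gb):
    """Dead space on the array as a percentage of vSphere's reported free space."""
    if vsphere_free_gb <= 0:
        raise ValueError("free space must be positive")
    return 100 * array_occupied_gb / vsphere_free_gb


# 9GB still consumed on the array, 48GB reported free by vSphere
print(round(estimate_reclaim_percent(9, 48)))  # 19
```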
Thanks for clarifying. This is a big help. Of course EMC is fixing this issue in the next enginuity update in Q4.
We have an HP P4500 with a Thin Provisioned volume, which has run out of space. The VMs say we are using only 40% of the volume, but the Management Console says we are out of space. HP support says we need to move the data off, delete the volume, recreate it and move the data back. We are doing that now using Storage vMotion.
Am I to understand that this will not reclaim the blocks? ie using Storage vMotion to move off of the HP, delete the volume and recreate, and Storage vMotion back?
Does anyone know if this will work on the HP P4500 Lefthand?
Thanks for the useful information provided in this blog.
Would you be able to answer if Datacore fully supports the VAAI Block Zero function for thin provisioned volumes on SANsymphony-V whether manually or call through API.
I see this supported as reported however does not seem to completely clean up (zero) and this as a result failing to perform the UNMAP operations.
I’d suggest speaking to one of our support representatives to figure out why this is happening. Support have the ability to work with 3rd party vendors like Datacore.
I run the command, but no .vmfsBalloonxyz file is created – at least none is shown when immediately running an ls on the volume. The DELETE columns under esxtop show as 0 when I do this too. We’re running EqualLogic (5.2.6), which does support SCSI UNMAP. I’ve done this with another client and it worked a treat; however, for this particular client it’s not working. Space remains the same as well. There is definitely space to recover. Any ideas?
Hi Jimmy,
Are you using 5.0 patch 2 by any chance? The reason I ask is because UNMAP was disabled in that release. You’ll have to upgrade to 5.0U1 to get the functionality again. Also verify using ESXCLI that the device supports the VAAI UNMAP (DELETE) primitive. If neither of these is the issue, I recommend speaking to our support specialists who can help troubleshoot the issue.
HTH
Cormac
Hi,
I just tested the ‘automatic’ UNMAP on a (upgraded) vSphere 5.1.0 cluster. But when I storage migrate a VM from one DS to another, and while it’s migrating I’m running esxtop on the host which is running the VM, I don’t see the DELETE counter increasing while the migration is completing. So it seems vSphere is not signaling our storage box with UNMAP? Any ideas? I verified the advanced settings of the host (VMFS3.EnableBlockDelete = 1) and our storage backend is also supporting it.
Note: When I manually run vmkfstools, it does issue UNMAP commands to the backend. (I see the DELETE counter increasing).
Any help is appreciated.
There is still no automated UNMAP, not even in 5.1. The only way to reclaim dead space is via vmkfstools -y. We hope to bring the automated UNMAP back in a future release, but there are no timelines to share at this time.
Hi,
Does anyone know if the VAAI UNMAP command can be executed on ESXi 5 Standard Edition (not Enterprise / Enterprise+ Edition)?
Any help is appreciated.
Regards,
Thibaud
VAAI isn’t tied to any one edition to the best of my knowledge.
Therefore UNMAP should work with all 5.x versions.
Hi,
vSphere Enterprise edition has a feature named ‘Storage APIs for Array Integration’. I don’t understand whether that feature is the VAAI discussed here or not. Could you please clarify this? Thank you.
Yes, you have VAAI with Enterprise & Enterprise+. I found these listed here – http://www.vmware.com/products/datacenter-virtualization/vsphere/compare-editions.html
Does this mean it is only available in Enterprise/Enterprise+? I’ve been trying for days to make this work on Essentials Plus without success.
Hello Cormac,
Your post is very informative! Thanks.
I see you having lots of questions related to why the UNMAP command is not an automated process in vSphere 5.x (as VMware originally planned to) or why there is no checkbox available with VM delete/move operations for that (which might be a good idea anyway).
I have been dealing with storage arrays and SANs for more than a decade, and the answer seems to be that it is not really something VMware itself can do a lot about. Most of the reclamation process is done on the storage array, and the storage array vendors implement this crucial part of the code and then test it with VMware.
UNMAP can run at full speed and interfere with all other workloads. This was already seen by most VAAI users on ESXi 5.0. Possibly the UNMAP should be throttled on the array to limit the impact, but then it would take longer. Throttling is something most of the array vendors can do.
Have you also considered having the vmkfstools -y command warn users that percentage values higher than, let's say, 90% may cause trouble (an out-of-space condition)? A warning and an additional confirmation from vmkfstools would be nice to have. When I saw the vmkfstools -y option for the first time my thought was, why not put 100% and have the whole free space reclaimed? The crux here is the balloon file created by vmkfstools, which is a tricky way to put zeros on the stranded space. Am I right?
Regards to all storage and vSphere geeks
Thanks for the post. You are right, and we recognize that using vmkfstools is rather cumbersome and relies on the customer having to make some calculations around how much space can be freed. And yes, if run incorrectly, the balloon file can cause some issues. All I can say at this time is that we have taken note of that and we are working towards making this operation more simplified. I’ll share more details with you as soon as I can.
Hi Cormac,
Thank you for your useful information.
I have one question.
Several ESX hosts are assigned one volume (a shared volume).
Do I run the command on all of the ESX hosts, or on only one ESX host?
If running it on one ESX host is enough, how do the other hosts recognize the volume size after the reclaim? Do I just do a rescan?
Thanks in Advance
makoto
Yes – you only need to do it on one host. I believe a refresh operation should be sufficient to update the free space. Try that before a full rescan.
Thanks Cormac,
Very helpful for me!!
Still waiting on an automated or command-based way to shrink a thin VMDK for a guest whose usage had bloated and was later shrunk. The only way to do that currently is to move the guest between two LUNs of different block size, which is no longer possible if you use VMFS 5 exclusively, thanks to the removal of the ability to set the block size.
I’d much rather use thin VMDK’s than thin LUNs and hope that different LUNs don’t balloon at the same time; that’s far more dangerous than a guest not being able to grow.
Great article, Cormac. Thanks for sharing. We are now able to reclaim free space from the hypervisor. It will be very useful, especially for customers with out-of-space issues.