Posted by Cormac Hogan
Technical Marketing Architect (Storage)

I have done a number of blog posts in the recent past related to our newest VAAI primitive, UNMAP. For those who do not know, VAAI UNMAP was introduced in vSphere 5.0 to allow the ESXi host to inform the storage array that files or VMs had been moved or deleted from a Thin Provisioned VMFS datastore. This allowed the array to reclaim the freed blocks. We had no way of doing this previously, so many customers ended up with a considerable amount of stranded space on their Thin Provisioned VMFS datastores.

Now there were some issues with using this primitive which meant we had to disable it for a while. Fortunately, 5.0 U1 brought forward some enhancements which allow us to use this feature once again.

Over the past couple of days, my good friend Paudie O'Riordan from GSS has been doing some testing with the VAAI UNMAP primitive against our NetApp array. He kindly shared the results with me, so that I can share them with you. The posting is rather long, but the information contained will be quite useful if you are considering implementing dead space reclamation.

Some details about the environment which we used for this post:

  • NetApp FAS 3170A
  • ONTAP version 8.0.2 (I believe earlier versions do not support UNMAP)
  • ESXi version 5.0U1, build 623860

 Step 1 – Verify that your storage array is capable of processing the SCSI UNMAP commands. The first place to look is on the vSphere Client UI. Select the datastore and examine the 'Hardware Acceleration' details (Hardware Acceleration is how we refer to VAAI in the vSphere UI):

VC-UI
Step 2 – The Hardware Acceleration status states Supported, so it looks like this array is VAAI capable. The issue now is that we don't know exactly which primitives are supported, so we need to run an esxcli command to determine this. First, you need to get the NAA id of the device backing your datastore. One way of doing this is to use the CLI command 'esxcli storage vmfs extent list' on the ESXi host. In our setup, this command returned the following NAA id for the LUN backing our VMFS-5 datastore:

naa.60a98000572d54724a346a6170627a52
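
If you are scripting this step, a small helper can pull the NAA id out of whatever command output you have captured. The following is only a sketch in Python and was not part of our testing: the function name find_naa_ids is made up for illustration, and the pattern simply assumes that an id looks like 'naa.' followed by hexadecimal digits, as ours does.

```python
import re

# Assumption: a device id is "naa." followed by hex digits, like the one above.
NAA_PATTERN = re.compile(r"naa\.[0-9a-fA-F]+")


def find_naa_ids(text):
    """Return the distinct NAA ids found in text, in order of first appearance."""
    ids = []
    for match in NAA_PATTERN.findall(text):
        if match not in ids:
            ids.append(match)
    return ids


# Sample text built from the id returned in our setup. The ":1" suffix is the
# partition number and is not part of the id.
captured = "naa.60a98000572d54724a346a6170627a52:1"
print(find_naa_ids(captured))
```

The helper deliberately ignores the layout of the surrounding output, so it should cope with the id appearing in a path, in brackets, or with a partition suffix.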

Once the NAA id has been identified, we can go ahead and display device-specific details about Thin Provisioning and VAAI. To do that, we use another esxcli command, 'esxcli storage core device list -d <naa>'. This command can show us information such as firmware revision, thin provisioning status, the VAAI filter and the VAAI status:

# esxcli storage core device list -d naa.60a98000572d54724a346a6170627a52
naa.60a98000572d54724a346a6170627a52
   Display Name: NETAPP Fibre Channel Disk (naa.60a98000572d54724a346a6170627a52)
   Has Settable Display Name: true
   Size: 51200
   Device Type: Direct-Access
   Multipath Plugin: NMP
   Devfs Path: /vmfs/devices/disks/naa.60a98000572d54724a346a6170627a52
   Vendor: NETAPP
   Model: LUN
   Revision: 8020
   SCSI Level: 4
   Is Pseudo: false
   Status: on
   Is RDM Capable: true
   Is Local: false
   Is Removable: false
   Is SSD: false
   Is Offline: false
   Is Perennially Reserved: false
   Thin Provisioning Status: yes
   Attached Filters: VAAI_FILTER
   VAAI Status: supported
   Other UIDs: vml.020033000060a98000572d54724a346a6170627a524c554e202020

Here we see that the device is indeed Thin Provisioned and supports VAAI. Now we can run a command to display the VAAI primitives supported by the array for that device. In particular we are interested in knowing whether the array supports the UNMAP primitive for dead space reclamation (what we refer to as the Delete Status). Another esxcli command is used for this step – 'esxcli storage core device vaai status get -d <naa>':

naa.60a98000572d54724a346a6170627a52
       VAAI Plugin Name: VMW_VAAIP_NETAPP
       ATS Status: supported
       Clone Status: supported
       Zero Status: supported
       Delete Status: supported

The device displays Delete Status as supported, meaning that the ESXi host can send SCSI UNMAP commands to the array for this device when a space reclaim operation is requested.

Great – so we have now confirmed that we have a storage array that is capable of dead space reclamation.
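
For anyone who wants to automate this check across many devices, here is a minimal Python sketch that parses the status output shown above and tests the Delete Status field. The function name parse_vaai_status is hypothetical, and the code assumes the output keeps the 'Key: value' layout seen in our example.

```python
def parse_vaai_status(text):
    """Parse 'Key: value' lines from the status output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # Lines without a colon (such as the device id header) are skipped.
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


# Sample copied from the output shown above.
sample = """naa.60a98000572d54724a346a6170627a52
       VAAI Plugin Name: VMW_VAAIP_NETAPP
       ATS Status: supported
       Clone Status: supported
       Zero Status: supported
       Delete Status: supported
"""

status = parse_vaai_status(sample)
unmap_ok = status.get("Delete Status") == "supported"
print("UNMAP supported:", unmap_ok)
```

Using status.get() rather than indexing means a device that omits the field is simply reported as not supporting UNMAP instead of raising an error.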

Step 3 – Let's take a closer look at the datastore next. As can be seen from the screen-shot above, this is a 50GB LUN formatted with VMFS-5. There is 49.5GB usable space remaining. Next, we deployed a Virtual Machine with a 15GB VMDK to this datastore. The Guest OS is using around 8.82GB of this space since that VMDK is thin provisioned. Here is a look at the provisioned and used space from a VMDK perspective:

VMFS-5
To look at more granular information about the amount of space consumed on the VMFS-5 volume, we can use some CLI commands. The recommendation would be to use vmkfstools -P to get the detailed volume information:

# vmkfstools -Ph -v 1 /vmfs/volumes/source-datastore/
File system label (if any): source-datastore
Mode: public ATS-only
Capacity 49.8 GB, 40.0 GB available, file block size 1 MB
Volume Creation Time: Tue Apr 24 14:20:51 2012
Files (max/free): 130000/129975
Ptr Blocks (max/free): 64512/64483
Sub Blocks (max/free): 32000/31998
Secondary Ptr Blocks (max/free): 256/256
File Blocks (overcommit/used/overcommit %): 0/10006/0
Ptr Blocks  (overcommit/used/overcommit %): 0/29/0
Sub Blocks  (overcommit/used/overcommit %): 0/2/0
UUID: 4f96b6c3-dcc7c210-a943-001b219b5078
Partitions spanned (on "lvm"):
        naa.60a98000572d54724a346a6170627a52:1
DISKLIB-LIB   : Getting VAAI support status for /vmfs/volumes/source-datastore/
Is Native Snapshot Capable: NO

We can clearly see that 10006 x 1MB File Blocks are consumed on the VMFS-5 volume. This is approximately 9.77GB (10006 / 1024). The next thing we have to take into account is the amount of the VMFS-5 volume that is consumed by VMFS metadata. The best way to get an approximation of this overhead is to use the du -h command on the datastore:

# du -h /vmfs/volumes/source-datastore/
8.8G    /vmfs/volumes/source-datastore/WindowsVM
9.6G    /vmfs/volumes/source-datastore

By taking away the amount of VMFS-5 volume consumed by Virtual Machines and related files (8.8GB) from the amount of space consumed on the complete volume (9.6GB), we can deduce that approximately 800MB is given over to VMFS-5 metadata. OK, now that we know what is consuming space on our volume, we are finally ready to start looking at the UNMAP primitive in action.
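
The arithmetic in this step is easy to sanity-check. Here is a minimal Python sketch that uses only the figures shown above (10006 used file blocks of 1MB, and the 8.8G and 9.6G totals from du). The function names are mine, and the subtraction assumes that the only directory under the datastore root belongs to our one VM.

```python
def file_blocks_to_gb(used_blocks, block_size_mb=1):
    """Convert a count of used VMFS file blocks to GB (1 GB = 1024 MB)."""
    return used_blocks * block_size_mb / 1024


def size_to_gb(token):
    """Convert a du -h style token such as '8.8G' or '512M' to GB."""
    factors = {"K": 1 / 1024**2, "M": 1 / 1024, "G": 1.0, "T": 1024.0}
    return float(token[:-1]) * factors[token[-1].upper()]


used_gb = file_blocks_to_gb(10006)   # from the File Blocks line of vmkfstools -P
vm_gb = size_to_gb("8.8G")           # du total for the WindowsVM directory
total_gb = size_to_gb("9.6G")        # du total for the whole datastore
overhead_gb = total_gb - vm_gb       # space not attributed to the VM

print(f"file blocks in use: {used_gb:.2f} GB")
print(f"not attributed to the VM: {overhead_gb:.1f} GB")
```

Keep in mind that du -h rounds to one decimal place, so the overhead figure is only an approximation.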

Step 4 – Let's do a Storage vMotion operation next and move this Virtual Machine from our source datastore to a different datastore. This is probably the best use-case for the UNMAP primitive. Once the Storage vMotion operation has completed, the vSphere client will report that the VMFS-5 volume now has a lot more free space:

VMFS-5-after-StoragevMotion

Step 5 – The issue however is that when we check the amount of free space on the Thin Provisioned LUN backing this VMFS-5 volume on the storage array, we see that we still have unused and stranded space. Using a 'lun show' CLI command on this NetApp array which is hosting the LUN for our VMFS-5 volume, we see that 8.8GB of space is still consumed:

lun show -v /vol/vol2/thin-lun
/vol/vol2/thin-lun            50g (53687091200)   (r/w, online, mapped)
        Serial#: W-TrJ4japbzR
        Share: none
        Space Reservation: disabled
        Multiprotocol Type: vmware
        Maps: unmap=51 issi=51
        Occupied Size:    8.8g (9473908736)  
        Creation Time: Tue Apr 24 15:16:52 BST 2012
        Cluster Shared Volume Information: 0x0

This is the crux of the issue that we are trying to solve with the VAAI UNMAP primitive.

Step 6 – We finally get to the point where we can now use the SCSI UNMAP primitive. If you've been following my blog posts, you'll know that we can now reclaim this stale and stranded space using the vmkfstools command.

Caution – We expect customers to use this primitive during their maintenance window, since running it on a datastore that is in-use by a VM can adversely affect I/O for the VM. I/O can take longer to complete, resulting in lower I/O throughput and higher I/O latency.

A point I would like to emphasize is that the whole UNMAP performance is totally driven by the storage array. Even the recommendation that vmkfstools -y be issued in a maintenance window is mostly based on the effect of UNMAP commands on the array's handling of other commands.

There is no way of knowing how long an UNMAP operation will take to complete. It can be anywhere from a few minutes to a couple of hours depending on the size of the datastore, the amount of content that needs to be reclaimed and how well the storage array can handle the UNMAP operation.

To run the command, you should change directory to the root of the VMFS volume that you wish to reclaim space from. The command is run as:

vmkfstools -y <% of free space to unmap>

The % value provided is then used to calculate the amount of stranded space that should be reclaimed from the VMFS volume as follows:

<amount of space to be unmapped> = (parameter passed to vmkfstools -y * free space on vmfs volume) / 100

We will see an actual example of this command being run shortly.
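
In the meantime, the formula itself can be sketched in a couple of lines of Python. This is purely illustrative: the function name space_to_unmap_gb is hypothetical, and the 60% and 48.8GB inputs are the figures from the example run later in this post.

```python
def space_to_unmap_gb(percent, free_gb):
    """Apply the formula above: (percent * free space on the volume) / 100."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be greater than 0 and at most 100")
    return percent * free_gb / 100


# Figures from the example run later in this post: 60% of 48.8 GB free.
amount_gb = space_to_unmap_gb(60, 48.8)
print(f"space to unmap: {amount_gb:.1f} GB")
```

The range check is my own addition to catch obvious typos; it is not a statement about which values vmkfstools itself accepts.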

Step 7 – You can verify if the UNMAP primitives are being issued by using esxtop. Press 'u' to get into the disk device view, then press 'f', 'o' and 'p' to display the "VAAISTATS" and "VAAILATSTATS/cmd" fields. The values under the "DELETE", "DELETE_F" and "MBDEL/s" columns are the ones of interest during a space reclaim operation:

Esxtop
In this example, we attempted a reclaim of 60% of free space. The vmkfstools -y command displays the following:

Attempting to reclaim 60% of free capacity 48.8 GB (29.3 GB) on VMFS-5 file system 'source-datastore' with max file size 64 TB.

Create file .vmfsBalloontsWt8w of size 29.3 GB to reclaim free blocks.

Done.

vmkfstools -y created a balloon file of 29.3GB which is 60% of the free capacity (48.8GB). This temporary “balloon file” is equal to the size of the space to be unmapped/reclaimed.

There is a note of caution here – if you specify a % value in the high 90s or 100, the temporary "balloon" file which is created during the reclaim operation may fill up the VMFS volume. Any growth of current VMDK files or the creation of new files, such as snapshots, may fail due to unavailable space. Care should be taken when calculating the amount of free space to reclaim.
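
To get a feel for how much headroom a given % value leaves, here is a rough Python sketch. The function name free_during_reclaim_gb is hypothetical, and the calculation assumes the balloon file is the only thing consuming extra space while the reclaim runs. The 48.8GB free capacity is the figure from our example.

```python
def free_during_reclaim_gb(percent, free_gb):
    """Free space left on the volume while the balloon file exists."""
    balloon_gb = percent * free_gb / 100   # size of the temporary balloon file
    return free_gb - balloon_gb


# 48.8 GB is the free capacity reported in our example run.
for pct in (60, 90, 99):
    left_gb = free_during_reclaim_gb(pct, 48.8)
    print(f"{pct}% -> {left_gb:.1f} GB left for VMDK growth and snapshots")
```

Comparing that remaining figure with the expected growth of thin VMDKs and snapshots during the maintenance window is one way to choose a sensible % value.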

If we now look at esxtop while the reclaim is going on:

Esxtop2

From the output above, we see that some UNMAP commands have been issued. By viewing the DELETE and the MBDEL/s columns, we can see the rate at which the commands are being processed. If you see values incrementing in the DELETE_F column, then that means some UNMAP commands may have failed.

Step 8 – Finally, if we return to our storage array and query the status of the Thin Provisioned LUN, we should now see a difference in the occupied space:

lun show -v /vol/vol2/thin-lun
/vol/vol2/thin-lun    50g (53687091200)   (r/w, online, mapped)
        Serial#: W-TrJ4japbzR
        Share: none
        Space Reservation: disabled
        Multiprotocol Type: vmware
        Maps: unmap=51 issi=51
        Occupied Size:   76.3m (79966208)    
        Creation Time: Tue Apr 24 15:16:52 BST 2012
        Cluster Shared Volume Information: 0x0
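
To put a number on the difference, the byte counts in the two Occupied Size lines can be subtracted. The Python sketch below does that with the before and after lines from our two 'lun show -v' outputs. The function name occupied_bytes is mine, and the regular expression assumes the 'Occupied Size: <human size> (<bytes>)' layout seen here.

```python
import re


def occupied_bytes(lun_show_output):
    """Extract the byte count in parentheses from the Occupied Size line."""
    match = re.search(r"Occupied Size:\s+\S+\s+\((\d+)\)", lun_show_output)
    if match is None:
        raise ValueError("no Occupied Size line found")
    return int(match.group(1))


# Lines copied from the two 'lun show -v' outputs in this post.
before = "        Occupied Size:    8.8g (9473908736)  "
after = "        Occupied Size:   76.3m (79966208)    "

reclaimed = occupied_bytes(before) - occupied_bytes(after)
print(f"reclaimed {reclaimed} bytes ({reclaimed / 2**30:.2f} GiB)")
```

Working from the raw byte counts avoids the rounding in the 8.8g and 76.3m figures.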

And there we have it. A real life example of the SCSI UNMAP primitive reclaiming dead space from a Thin Provisioned LUN backing a VMFS-5 datastore.

You should also note that in 5.0 U1, even if the advanced option to issue SCSI UNMAP when deleting a VMDK or doing a Storage vMotion is enabled (/VMFS3/EnableBlockDelete), it will no longer do so. The only way to reclaim stranded space in 5.0U1 is via vmkfstools.

Once again, thanks to Paudie for putting this together, and also to Luke Reed and our other friends at NetApp for both the equipment and assistance with getting it updated to a version of ONTAP which supports the UNMAP primitive.

Get notification of these blog postings and more VMware Storage information by following me on Twitter: @VMwareStorage