[Updated with vSphere HA clarifications]
This was an interesting question that came my way recently. One of our storage partners wanted to ensure that a VMFS volume was completely quiesced (no activity) and wanted to know what could be causing writes to the VMFS volume when all of the Virtual Machines on it were powered off.
There are quite a few vSphere features which could be updating a volume, and after a bit of research, I decided it might be a good idea to share the list with you.
- If you have a Distributed Virtual Switch in your virtual infrastructure, changes to the network configuration would result in updates to the .dvsdata configuration file which sits on a VMFS volume (the examples after this list show how to spot these hidden files and folders from the ESXi Shell).
- If you have implemented a vSphere HA cluster, then there may be updates going to the vSphere HA 5.0 heartbeat datastores and related files. First, what are these heartbeat datastores used for? Well, to retain some control over the HA cluster in the event of a network failure, when nodes can no longer communicate over the network, vSphere HA introduced heartbeat datastores. Through the use of these HB datastores and special files on other datastores, a master can determine which slave hosts are still alive, and also determine whether there has been a network partition rather than network isolation (the behaviour differs depending on which). Note that we don't write to the HB file itself; it is opened so that the "metadata HB" on the VMFS volume is updated. Other vSphere HA files, which reside in special folders on all datastores in the cluster, are also written to.
- Another possibility, of course, is that the writes are coming from the VMFS metadata heartbeat updates. These are essentially pulses from an ESXi host to inform other hosts (which might be looking to update a file) that this host still has a lock on the file in question (see the vmkfstools sketch after this list for one way to inspect these locks).
- An ESXi host can be deployed with a designated scratch partition, or the scratch location can be placed as a folder on a VMFS datastore if no suitable partition exists. If the ESXi scratch location has been placed on a VMFS datastore, then it may be that it is being regularly updated with host information (e.g. tmp files, log updates, etc.). This could be the source of spurious writes to the VMFS volume.
- Storage I/O Control could be enabled on the datastore. If this is the case, each host that uses the datastore writes metrics to special files on the datastore. These files are used to determine the datastore-wide latency value across all hosts accessing the datastore. If this exceeds the defined latency threshold (default 30ms), it is an indicator to SIOC to start throttling. The last update I've seen on this suggests that these files are updated by all hosts every 4 seconds.
- Finally, the VMFS volume could be part of a Storage DRS datastore cluster. If load balancing based on I/O metrics is enabled, then Storage DRS may be using Storage I/O Control to measure the datastore latency values, as mentioned in the Storage I/O Control item above.
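If you want to see which of these features have left a footprint on a particular datastore, most of them can be spotted directly from the ESXi Shell. Here is a quick sketch; the datastore name is just a placeholder, and the exact file and folder names can vary between vSphere releases:

```
# List everything at the root of the datastore, including hidden entries.
# Look for .dvsData (Distributed Switch port state), .vSphere-HA (HA agent files),
# .iormstats.sf / .iorm.sf (Storage I/O Control statistics) and any scratch or
# .locker folders.
ls -la /vmfs/volumes/MyDatastore/

# Check whether this host keeps its scratch location on a VMFS datastore.
vim-cmd hostsvc/advopt/view ScratchConfig.CurrentScratchLocation
```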
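And if you suspect the VMFS metadata heartbeats themselves, one way to look at the lock and heartbeat state of an individual file is vmkfstools in "describe" mode (again, the path below is only an example):

```
# Dump the lock and heartbeat metadata for a file on the VMFS volume.
# The output shows the lock mode and the owner (the UUID/MAC of the ESXi host
# currently holding the lock), which tells you who is heartbeating against it.
vmkfstools -D /vmfs/volumes/MyDatastore/MyVM/MyVM.vmdk
```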
So as you can see, simply shutting down the VMs on a datastore is not enough to ensure that it is quiesced. A number of other vSphere features could be writing to the datastore (and I may even have missed some in this list).
If you need a datastore to be completely quiesced for whatever reason, I'd recommend using esxtop to ensure that there is no I/O activity after you have shut down your VMs.
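For example, here is a sketch of how that check might look; the sample counts and output path are just placeholders:

```
# Capture 10 samples, 5 seconds apart, in batch mode for later inspection.
esxtop -b -d 5 -n 10 > /tmp/quiesce-check.csv

# Interactively: run esxtop, press 'u' for the disk device view, and confirm
# that READs/s and WRITEs/s stay at 0 for the device backing the datastore.
```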
Get notification of these blog postings and more VMware Storage information by following me on Twitter: @VMwareStorage