Over the past couple of days, I’ve been looking at the behavior of Virtual SAN (VSAN) in the context of placing a host into maintenance mode. With VSAN enabled on the cluster, the administrator is given three options to choose from:
- Ensure accessibility
- Full data migration
- No data migration
The thing with VSAN is that even if a virtual machine’s compute is not on the host being placed into maintenance mode, there is still a strong possibility that some components of the virtual machine’s storage objects reside on the local disks of that host, especially if you are using the NumberOfFailuresToTolerate policy setting (which you should be). So if ‘Ensure accessibility’ or ‘Full data migration’ is chosen, components of the virtual machine’s storage objects may have to be migrated off the host entering maintenance mode.
You might now ask what the difference is between ‘Ensure accessibility’ and ‘Full data migration’. ‘Ensure accessibility’ means that enough of each virtual machine’s storage objects remains available in the cluster for the virtual machine to continue running, although the virtual machine may no longer be fully compliant from a VM storage policy perspective, i.e. it may not have access to all of its replicas. ‘Full data migration’ means that all of the components on the local storage of this host are migrated elsewhere in the cluster, so that when the host enters maintenance mode, all VMs still have their full complement of storage components and remain compliant from a VM storage policy perspective.
The trade-off is that ‘Ensure accessibility’ should be quicker than ‘Full data migration’, but some of your virtual machines may be impacted should another failure occur in the cluster whilst that host is in maintenance mode. Note that ‘Ensure accessibility’ may still involve some migration of storage components, for example when an object’s only copy resides on that host.
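To make the difference between the options concrete, here is a minimal Python sketch of the decision for a single storage object. It is a toy model, not how VSAN is implemented: the `Component` class, the host names and the accessibility rule (more than half of the components plus at least one replica must remain reachable) are simplifying assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    host: str         # host whose local disks hold this component
    is_replica: bool  # True for a data replica, False for a witness


def components_to_move(components, host, mode):
    """Return the components of one object that must leave `host`.

    Simplified assumption: the object stays accessible while more than
    half of its components and at least one replica remain reachable.
    """
    on_host = [c for c in components if c.host == host]
    if mode == "no_data_migration":
        return []
    if mode == "full_data_migration":
        return on_host
    if mode == "ensure_accessibility":
        remaining = [c for c in components if c.host != host]
        has_quorum = 2 * len(remaining) > len(components)
        has_replica = any(c.is_replica for c in remaining)
        # Move data only if the object would otherwise become inaccessible.
        return [] if has_quorum and has_replica else on_host
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical objects: a mirrored disk (two replicas plus a witness)
# and an unprotected disk whose only copy lives on esx-01.
mirrored = [
    Component("esx-01", True),
    Component("esx-02", True),
    Component("esx-03", False),
]
unprotected = [Component("esx-01", True)]

for name, obj in [("mirrored", mirrored), ("unprotected", unprotected)]:
    for mode in ("ensure_accessibility", "full_data_migration"):
        moved = components_to_move(obj, "esx-01", mode)
        print(f"{name:12} {mode:22} moves {len(moved)} component(s)")
```

In this model the mirrored object needs no migration under ‘Ensure accessibility’ but loses its redundancy while the host is down, whereas the unprotected object has to be moved under either option.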
Another consideration is additional storage space. If you go for ‘Full data migration’, you need to ensure that you have enough free space on the local storage of the remaining nodes in the cluster to be able to accommodate all of the storage components of your virtual machines.
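As a rough sanity check, the space consideration boils down to simple arithmetic. The sketch below uses made-up figures and a hypothetical helper, `evacuation_headroom_gb`. It only compares totals, so treat a positive result as necessary rather than sufficient: it ignores how components are actually placed on individual disks and what the VM storage policies require.

```python
def evacuation_headroom_gb(used_on_host_gb, free_on_other_hosts_gb):
    """Free space left on the other hosts after absorbing this host's data.

    A negative result means the remaining hosts cannot hold everything.
    """
    return sum(free_on_other_hosts_gb.values()) - used_on_host_gb


# Hypothetical figures, in GB.
used_on_esx01 = 400
free_elsewhere = {"esx-02": 250, "esx-03": 300}

headroom = evacuation_headroom_gb(used_on_esx01, free_elsewhere)
if headroom >= 0:
    print(f"Enough space: {headroom} GB would remain free")
else:
    print(f"Short by {-headroom} GB")
```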
Lastly, an important question was raised on the community forums regarding maintenance mode, namely how to track the progress of the data migration. One way is through the Ruby vSphere Console (RVC). I won’t go into the details of how to use the RVC as my colleague Rawlinson already did a good job on it here. The RVC has a number of dedicated VSAN commands, one being vsan.resync_dashboard. This command displays the number of objects that are being synchronized in the cluster, and how many bytes remain before the sync is complete.
A few other commands are useful alongside vsan.resync_dashboard. First, you may want to see which disks are in your ESXi host. To do this, the command vsan.host_info is useful. Once the disks have been identified, vsan.disk_object_info can be used to check which virtual machine objects reside on a particular disk. This will report objects such as the virtual machine disk, namespace directory and swap.
Now you are ready to place your host into maintenance mode. If you choose the ‘Full data migration’ option, you will want to monitor the progress with the vsan.resync_dashboard command.
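If you note down the bytes-remaining figure from successive runs of the dashboard, you can turn those readings into a rough time estimate. The Python sketch below assumes you collect the readings yourself as (timestamp, bytes left) pairs. It does not parse RVC output, the function name `estimate_seconds_remaining` is hypothetical, and it assumes the sync rate stays constant, which it may well not.

```python
def estimate_seconds_remaining(samples):
    """Estimate the time left from (timestamp_seconds, bytes_left) readings.

    Uses the average rate between the first and last reading. Returns
    None when there are too few readings or no progress was observed.
    """
    if len(samples) < 2:
        return None
    (t0, b0), (t1, b1) = samples[0], samples[-1]
    elapsed = t1 - t0
    synced = b0 - b1
    if elapsed <= 0 or synced <= 0:
        return None
    rate = synced / elapsed  # bytes per second
    return b1 / rate


GIB = 2**30

# Hypothetical readings taken one minute apart.
readings = [(0, 100 * GIB), (60, 94 * GIB)]

eta = estimate_seconds_remaining(readings)
if eta is None:
    print("Not enough progress to estimate")
else:
    print(f"Roughly {eta / 60:.1f} minutes remaining")
```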
Eventually, when the host has entered maintenance mode, you can once again use vsan.disk_object_info to examine the contents of the disks on the host. If ‘Full data migration’ was chosen as the maintenance mode option, there should be no components belonging to virtual machines left on any of those disks.
Hopefully that has given you some idea of how to monitor the progress of a maintenance mode operation, and indeed of a rebuild operation in the event of a failure in the VSAN cluster.
Get notifications of these blog postings and more VMware Storage information by following me on Twitter: @VMwareStorage