VMware vSphere includes a feature called Maintenance Mode that is useful for planned downtime activities such as firmware updates, storage device replacement, and software patches. If VMware vSphere Distributed Resource Scheduler (DRS) is enabled and set to fully automated, virtual machines are migrated from the host entering maintenance mode to other hosts in the cluster.
VMware vSAN uses locally attached storage devices. Therefore, when a host that is part of a vSAN cluster goes into maintenance mode, the local storage devices in that host cannot contribute to vSAN raw capacity until the host exits maintenance mode.
Here is an example: Consider a vSAN cluster with 10 hosts. Each host has 10TB of raw capacity used for vSAN, for a total of 100TB raw capacity. When a host is placed into maintenance mode, the 10TB of local storage in that host is not available for use by vSAN. The raw capacity of the cluster is reduced to 90TB until the host exits maintenance mode.
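The capacity arithmetic in this example can be sketched in a few lines of Python. This is only an illustration of the numbers above; the host names and the `raw_capacity_tb` helper are hypothetical and not part of any VMware tool.

```python
def raw_capacity_tb(host_capacities_tb, hosts_in_maintenance=()):
    """Sum the raw capacity of hosts that are not in maintenance mode."""
    return sum(
        capacity
        for host, capacity in host_capacities_tb.items()
        if host not in hosts_in_maintenance
    )


# Ten hypothetical hosts, each contributing 10 TB of raw capacity to vSAN.
cluster = {f"host-{n:02d}": 10 for n in range(1, 11)}

print(raw_capacity_tb(cluster))               # 100
print(raw_capacity_tb(cluster, {"host-01"}))  # 90
```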
You might be wondering what happens to the data stored on the local drives of that host. The short answer is “it depends.” This article explains what it depends on. More importantly, you will learn why special consideration is necessary for the vSAN maintenance mode option named “No Data Evacuation.”
Let’s start with a brief explanation of vSAN maintenance mode options. This diagram shows the options in the vSphere Web Client.
Evacuate all data to other hosts
This option moves all of the vSAN components from the host entering maintenance mode to other hosts in the vSAN cluster. This option is commonly used when a host will be offline for an extended period of time or will be permanently decommissioned.
Note: If you need more information on the basics of vSAN objects and components, read parts one and two in this blog article series.
Ensure data accessibility from other hosts
vSAN will verify whether an object remains accessible even though one or more components will be absent due to the host entering maintenance mode. If the object will remain accessible, vSAN will not migrate the component(s). If the object would become inaccessible, vSAN will migrate the necessary number of components to other hosts ensuring that the object remains accessible. This option is the default and it is commonly used when the host will be offline for just a short amount of time, e.g., a host reboot. It minimizes the amount of data that is migrated while ensuring all objects remain accessible. However, the level of failure tolerance will likely be reduced for some objects until the host exits maintenance mode.
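The decision described above can be approximated with a deliberately simplified model. The sketch below assumes each object is just a list of hosts holding a full replica, and it ignores witness components and the vote-based quorum that vSAN actually uses. The function name, object names, and host names are hypothetical.

```python
def objects_needing_migration(replica_hosts_by_object, host):
    """Return objects that would have no replica left once `host` goes offline.

    Simplified model: an object stays accessible if at least one replica
    lives on a host other than the one entering maintenance mode.
    """
    return sorted(
        name
        for name, replica_hosts in replica_hosts_by_object.items()
        if host in replica_hosts and not (set(replica_hosts) - {host})
    )


# Hypothetical placement of three objects across three hosts.
placement = {
    "vm1-disk": ["host-1", "host-2"],  # mirrored: survives without host-1
    "vm2-disk": ["host-1"],            # single copy: must be migrated
    "vm3-disk": ["host-3"],            # not on host-1 at all
}

print(objects_needing_migration(placement, "host-1"))  # ['vm2-disk']
```

In this model only `vm2-disk` has to move, which mirrors the point that this option minimizes the amount of data migrated.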
No data evacuation
Data is not migrated from the host as it enters maintenance mode. This option can also be used when the host will be offline for a short period of time. All objects will remain accessible as long as they have a storage policy assigned where the Primary Level of Failures to Tolerate is set to one or higher and they are compliant with that policy.
This brings us to our words of caution: If an object has a storage policy where the Primary Level of Failures to Tolerate (PFTT) is set to zero, using the No Data Evacuation maintenance mode option might cause the object to become inaccessible. The diagrams below illustrate why. Object A has PFTT = 1 (mirroring). Object B has PFTT = 0.
If we put Host 1 into maintenance mode using the No Data Evacuation option, Object A remains accessible. Object B becomes inaccessible as the only copy of that data is on Host 1.
You might be wondering whether Object B suffered data loss when the host entered maintenance mode. The answer is no. The data is still on the host, but it will not be accessible until the host exits maintenance mode.
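The Object A and Object B scenario can be modeled with a short Python sketch. It assumes the second replica of Object A sits on a host called Host 2 (the text only states that Object B's single copy is on Host 1), and it treats an object as accessible when at least one replica is on an online host, which is a simplification of how vSAN determines accessibility.

```python
def accessible(replica_hosts, offline_hosts):
    """An object is reachable if at least one replica is on an online host."""
    return any(host not in offline_hosts for host in replica_hosts)


placement = {
    "Object A": ["Host 1", "Host 2"],  # PFTT = 1; second replica host is assumed
    "Object B": ["Host 1"],            # PFTT = 0; the only copy is on Host 1
}


def report(offline_hosts):
    """Map each object name to whether it is accessible right now."""
    return {name: accessible(hosts, offline_hosts) for name, hosts in placement.items()}


# Host 1 in maintenance mode with No Data Evacuation: nothing was moved.
print(report({"Host 1"}))  # {'Object A': True, 'Object B': False}

# Host 1 exits maintenance mode: the data was never lost, so both are back.
print(report(set()))       # {'Object A': True, 'Object B': True}
```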
Recommendation: Pay close attention to the “what-if” information displayed in the Maintenance Mode UI. It will tell you if objects will become inaccessible as a result of putting the host into maintenance mode. An example is shown below.
To avoid objects becoming inaccessible, select the Ensure Data Accessibility From Other Hosts option. vSAN will migrate the components required to keep the objects accessible to other hosts. Alternatively, assign the objects a storage policy where PFTT is set to one or higher, and verify that all objects are in compliance with their storage policies before continuing with the No Data Evacuation option.
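As a rough illustration of that final check, the sketch below flags objects that would make No Data Evacuation risky: anything with PFTT below one, or anything not yet compliant with its policy. The `ObjectStatus` record and its fields are hypothetical; in practice this information comes from the what-if output and the storage policy compliance status in the vSphere client.

```python
from dataclasses import dataclass


@dataclass
class ObjectStatus:
    name: str        # hypothetical object label
    pftt: int        # Primary Level of Failures to Tolerate from the policy
    compliant: bool  # whether the object currently complies with its policy


def blockers_for_no_data_evacuation(objects):
    """Return names of objects that could become inaccessible."""
    return [obj.name for obj in objects if obj.pftt < 1 or not obj.compliant]


inventory = [
    ObjectStatus("Object A", pftt=1, compliant=True),
    ObjectStatus("Object B", pftt=0, compliant=True),
    ObjectStatus("Object C", pftt=1, compliant=False),  # e.g. still resyncing
]

print(blockers_for_no_data_evacuation(inventory))  # ['Object B', 'Object C']
```

An empty list suggests No Data Evacuation is reasonable under these assumptions; otherwise, Ensure Data Accessibility From Other Hosts is the safer choice.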