Ensure your data stays available when a host is in maintenance mode
What happens when you place a host from a vSAN cluster in maintenance mode? Let’s dive in and give some insights on this matter, especially as there have been a few changes in the way vSAN handles this in recent versions.
First and foremost, I want to emphasize that placing a host in maintenance mode is the equivalent of powering off a host from a capacity and compute standpoint. All the resources the host contributes to the cluster will be temporarily unavailable. Therefore, you must pay close attention to the type of data migration you apply when you place a host in maintenance mode. There are specific use cases and guidelines for each type of data migration. We recommend reviewing these guidelines before undertaking any maintenance-mode-related operations.
One of the easiest ways to understand the data migration process when placing a host in maintenance mode is to look at the vSAN interactive infographic and explore the maintenance mode section. This section showcases the processes related to data migration services when a host is placed in maintenance mode. There are three drop-down menu options in this part of the infographic: type of data migration, type of resiliency, and number of hosts to be placed in maintenance mode. Combining these options gives you a broader view of the mechanisms behind the different types of data migration.
Let’s take a look at a few scenarios with regard to maintenance mode. First, suppose you have 6 hosts in your cluster, you have applied a policy with FTT=2/RAID-6, and you want to place a host in maintenance mode using the ‘Full data migration’ option. This option evacuates all the data residing on the host to other hosts within the cluster. With this configuration, the system will prompt you to add another host to the cluster in order to preserve policy compliance. The policy always requires that two failures can be tolerated, even while maintenance mode temporarily removes a host from the cluster, and a storage policy with RAID-6 (FTT=2) erasure coding requires a minimum of six hosts. With one of the six hosts in maintenance mode, only five remain, which is one too few. You can find more information about this configuration here.
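The host-count reasoning in this scenario can be sketched in a few lines of Python. This is only an illustrative model, not vSAN code: the function names are hypothetical, the six-host minimum for RAID-6 (FTT=2) comes from the text above, and the 2 x FTT + 1 rule for RAID-1 mirroring is an assumption added here for comparison.

```python
def min_hosts(raid: str, ftt: int) -> int:
    """Return the minimum host count for a policy (illustrative model only)."""
    if raid == "RAID-6" and ftt == 2:
        return 6  # stated in the article
    if raid == "RAID-1" and ftt >= 0:
        return 2 * ftt + 1  # assumption: replicas plus witnesses for mirroring
    raise ValueError(f"policy not covered by this sketch: {raid}, FTT={ftt}")


def can_fully_evacuate(total_hosts: int, hosts_in_maintenance: int,
                       raid: str, ftt: int) -> bool:
    """Check whether the remaining hosts can still satisfy the policy."""
    remaining = total_hosts - hosts_in_maintenance
    return remaining >= min_hosts(raid, ftt)


# Six hosts, RAID-6 (FTT=2), one host entering maintenance mode:
print(can_fully_evacuate(6, 1, "RAID-6", 2))  # False: only 5 hosts remain
# The same policy after adding a seventh host:
print(can_fully_evacuate(7, 1, "RAID-6", 2))  # True
```

The check deliberately ignores capacity; it only shows why a six-host cluster cannot stay compliant with RAID-6 during a full data migration.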
In rare cases, you might need to place more than one host from the cluster in maintenance mode for a short period of time. Although we do not recommend placing more than one host in maintenance mode at a time, the cluster can still support this configuration. For example, a 6-node cluster with storage policy FTT=1/RAID-1 and the Ensure accessibility type of data migration can support two hosts in maintenance mode. If both components of a given object reside on the hosts in maintenance mode, the Ensure accessibility data migration service will move one of the components to another available host.
The same outcome can be observed if you place a host in maintenance mode with the Ensure accessibility migration type and there is a data component that has no replica (RAID-0 policy). If the only available data component resides on the host placed in maintenance mode, Ensure accessibility will move the VM to another host, as well as its data. This way the data will still be accessible until the host exits maintenance mode. Avoid using the "No data migration" option for maintenance mode if a storage policy with no redundancy (RAID-0) is applied to your data objects. This helps ensure data with no redundancy remains accessible when a host is put into maintenance mode.
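The decision Ensure accessibility makes in the two scenarios above can be illustrated with a small Python sketch. It is a simplified model under stated assumptions: the function name, object names, and host names are hypothetical, and the model only tracks which hosts hold a data component of each object. It ignores witness components and the quorum rules that vSAN applies in practice.

```python
def objects_needing_move(placement: dict[str, set[str]],
                         maintenance: set[str]) -> list[str]:
    """Return objects whose data components all sit on hosts in maintenance.

    placement maps an object name to the hosts holding its data components.
    An object needs a component moved when none of them stays on an
    available host.
    """
    return sorted(
        name for name, hosts in placement.items()
        if hosts and hosts <= maintenance  # every component is on a leaving host
    )


placement = {
    "vm-a": {"host1", "host2"},  # RAID-1: both replicas on leaving hosts
    "vm-b": {"host1", "host3"},  # RAID-1: one replica stays available
    "vm-c": {"host2"},           # RAID-0: single component, no replica
}

print(objects_needing_move(placement, {"host1", "host2"}))
# ['vm-a', 'vm-c']
```

In this model, "No data migration" would leave vm-a and vm-c inaccessible, while Ensure accessibility moves one component of each to an available host.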
vSAN Data Pre-check Page helps you test your configuration
vSAN 6.7 Update 3 introduces a Data pre-check page that lets you verify the outcome of placing a host in maintenance mode before you do it. It is a detailed view that provides more visibility into maintenance mode operations. You can now test and explore the estimated impact on the overall capacity within the cluster while a host is in maintenance mode. This enhancement helps remove the risk of making objects temporarily inaccessible due to maintenance mode operations. Select the host, select the type of data migration you want to apply, and vSAN will give you various insights on a per-host basis:
- Capacity overview
- Data availability
- Estimated success status of the given operation
- The resources needed if the maintenance mode operation is not expected to succeed
Every time you need to place a host in maintenance mode, vSAN will suggest a fast pre-check operation. After you launch the test, vSAN simulates the data movement in order to determine whether it would succeed. Here you can find a guided click-through demo that reveals the benefits of using the Data pre-check page.
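To make the idea of a simulated data movement more concrete, here is a minimal Python sketch of a capacity-only pre-check for a full data migration. It is not how vSAN implements the pre-check: the Host class, the function name, and the report fields are hypothetical, and the model only compares the data on the leaving host with the free space on the remaining hosts, ignoring placement rules and policy compliance.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    capacity_gb: float
    used_gb: float


def precheck_full_migration(hosts: list[Host], target: str) -> dict:
    """Estimate whether the target host's data fits on the remaining hosts."""
    leaving = next(h for h in hosts if h.name == target)
    staying = [h for h in hosts if h.name != target]
    free_gb = sum(h.capacity_gb - h.used_gb for h in staying)
    shortfall_gb = max(0.0, leaving.used_gb - free_gb)
    return {
        "data_to_move_gb": leaving.used_gb,
        "free_on_remaining_gb": free_gb,
        "success": shortfall_gb == 0,
        # Extra capacity needed if the operation is not expected to succeed
        "additional_capacity_needed_gb": shortfall_gb,
    }


cluster = [
    Host("host1", 1000, 600),
    Host("host2", 1000, 800),
    Host("host3", 1000, 900),
]

report = precheck_full_migration(cluster, "host1")
print(report["success"])                        # False
print(report["additional_capacity_needed_gb"])  # 300.0
```

Here host1 holds 600 GB but the other two hosts have only 300 GB free between them, so the sketch reports a 300 GB shortfall, mirroring the kind of "resources needed" hint listed above.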
Summary
If you want to reduce the guesswork when placing a host in maintenance mode, we recommend you take advantage of the detailed reports on the Data pre-check page. They allow you to compare the impact on compute and storage capacity and determine the most suitable option for your vSAN configuration. Also keep in mind the following good practices that we always share with our customers: use a minimum of FTT=1 for workloads that must remain online during maintenance activities, and avoid using “No data migration” unless it is necessary. Being well informed and prepared while managing your HCI environment helps you avoid mistakes and saves the time you would otherwise spend fixing errors.