Virtual SAN Automatic “Add Disk to Storage Mode” Fails

The deployment and configuration of Virtual SAN, with its two-click configuration capability, is indeed “radically simple.” When you enable Virtual SAN and leave the default disk management setting (“Add disks to storage”) on automatic, Virtual SAN detects the solid state and magnetic disks that physically exist within the Virtual SAN cluster hosts. Virtual SAN then creates two partitions on each disk, places the disks in their respective disk groups, and pools those disk groups into a single datastore.

Recently, I was working with a customer who ran into a very interesting scenario in which, despite the disk management setting being on automatic, Virtual SAN would not form disk groups around the disks present in the hosts. Upon examination, we discovered that NTFS partitions already existed on the disks. Evidently, the customer’s server acquisition process asks the server vendor to pre-image all disks with NTFS prior to shipping. When Virtual SAN detects an existing partition, it does not automatically erase that partition and replace it with its own. Instead, you will notice the Virtual SAN cluster being enabled without disk groups. This behavior protects against accidental drive erasure. Since NTFS partitions already existed on the drives, Virtual SAN was awaiting manual intervention.

There are a few scenarios in which we may find pre-existing partitions on disks that are slated for Virtual SAN consumption (e.g. pre-imaged disks, repurposed hardware, a rebuilt Virtual SAN test environment, etc.). In order for Virtual SAN to manage these disks, their partitions must be removed. This removal can be performed with most disk format utilities. It is important to remember not to format the disk after you erase the partition; otherwise, a new partition will be created.

A quick way to erase pre-existing partitions and enable Virtual SAN to manage the disks is to use the partedUtil utility that is native to the ESXi kernel environment and available from the command line. Below you will find a step-by-step example of how to use the utility to delete the unwanted partitions:

Step 1) SSH to the ESXi host and log in as a privileged user

Step 2) Check to see if partitions already exist

Use the “ls” command to list the NAA identifier for each disk device present in the system. NAA stands for Network Addressing Authority; the NAA identifier is unique to each disk device.

If there are existing partitions, the output will show disk IDs followed by “:1”, “:2”, etc.

To explain further, let’s use disk ID “naa.600605b00896b6e01acf437ab22efc8e” for our example. This disk ID appears three times in the output of our command. The first listing is simply the disk ID. The subsequent listings indicate the individual partitions existing on the disk.
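As a minimal sketch of this check, the snippet below assumes the device nodes are listed from /vmfs/devices/disks (the usual location on ESXi) and uses a hypothetical helper, list_partition_entries, that keeps only the names ending in a partition number. The demo feeds it the example disk ID so it can be tried off-host.

```shell
# On an ESXi host the device nodes live under /vmfs/devices/disks:
#   ls /vmfs/devices/disks/

# Hypothetical helper: read device names on stdin and keep only the
# NAA partition entries, i.e. names of the form "naa.<id>:<number>".
list_partition_entries() {
    grep -E '^naa\.[0-9a-fA-F]+:[0-9]+$'
}

# Demo with the example disk ID; on a host you would instead run
#   ls /vmfs/devices/disks/ | list_partition_entries
printf '%s\n' \
    'naa.600605b00896b6e01acf437ab22efc8e' \
    'naa.600605b00896b6e01acf437ab22efc8e:1' \
    'naa.600605b00896b6e01acf437ab22efc8e:2' \
    | list_partition_entries
```

Any line printed by the helper is a partition that Virtual SAN will not remove on its own.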

Note: You can use the partedUtil.sh script (not to be confused with the partedUtil command located in /bin) to list the partitions of each disk and translate the partition number to a more familiar format. In the example below you will see for disk ID “naa.600605b00896b6e01acf437fb2784220” there are two partitions listed. One partition is labeled “vsan” and the other is labeled “vmfs”.

Step 3) Validate Disk Device IDs

If you are booting ESXi from something other than local disks, and no external storage is connected to the system, it is probable that the disks identified in step 2 are all local disks. In order to validate that this is true, the following command can be used:

This command will output approximately 28 lines of information for each storage device in the system. For the sake of brevity I will forgo listing all 28 lines in this article. This output can be used to identify the location of the disk IDs (local disk, shared storage, etc.) and whether the storage controller reported the disks as local or non-local. By validating the disk device IDs before we run partedUtil and remove the partitions, we can save ourselves from deleting partitions on devices that house data we would like to keep (e.g. the boot disk). Here are two ways in which you can filter the output of this command to provide only the information that we are seeking at the moment:

Or if you prefer CSV:

By using these commands you can determine the physical location of the disks, whether the controller has reported them accurately as local or non-local, and which disks are SSD and boot devices. This information will allow us to decide which disk IDs to use in the next step as we look to remove the existing disk partitions.
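One way to do this kind of filtering is sketched below. It assumes the per-device listing comes from esxcli storage core device list and that the field labels (“Is Local:”, “Is SSD:”, “Is Boot USB Device:”) match what your ESXi build prints; summarize_devices is a hypothetical helper name, not part of ESXi.

```shell
# Full listing (many lines per device) on an ESXi host:
#   esxcli storage core device list

# Hypothetical helper: keep each device ID line plus the flags of interest.
# The field labels are an assumption; adjust them to your host's output.
summarize_devices() {
    grep -E '^naa\.|Is Local:|Is SSD:|Is Boot USB Device:'
}

# Usage on a host:
#   esxcli storage core device list | summarize_devices
#
# CSV variant, using esxcli's built-in formatter instead of grep:
#   esxcli --formatter=csv storage core device list
```

The grep form is easier to read at a glance, while the CSV form is more convenient if the result is going to be processed by another tool.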

Note: If any disks that are actually local to the host were presented as non-local, this will also prevent Virtual SAN from automatically managing them. You can use the following command to reflag these as local and allow Virtual SAN to begin to manage them.
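A hedged sketch of the reflagging follows. It assumes the approach of adding a SATP claim rule with the enable_local option and then reclaiming the device, mirroring the enable_ssd procedure in VMware KB 2013188. The helper print_reflag_local is hypothetical and only prints the commands so they can be reviewed first; the default SATP name is also an assumption, so check which SATP currently claims the device before running anything.

```shell
# Hypothetical helper: print (do not run) the commands that tag a device
# as local and re-run claiming so the new rule takes effect.
#   $1 = disk ID
#   $2 = SATP that claims the device (assumed default: VMW_SATP_LOCAL)
print_reflag_local() {
    dev="$1"
    satp="${2:-VMW_SATP_LOCAL}"
    echo "esxcli storage nmp satp rule add --satp $satp --device $dev --option enable_local"
    echo "esxcli storage core claiming reclaim --device $dev"
}

# Example with the disk ID used in this article:
print_reflag_local naa.600605b00896b6e01acf437ab22efc8e
```

Printing instead of executing is a deliberate choice here, since a claim rule applied to the wrong device changes how the host treats that device.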

Step 4) Remove disk partitions

The partedUtil command provides a quick and easy method for managing disk partitions. In this case we will use partedUtil to remove the existing partitions so that Virtual SAN can manage the disks directly.

Here is the syntax for the partedUtil command:

Note: The “partedUtil” command must be typed in the proper case, with the upper case letter “U”, in order to be recognized. You can also type “parted” and then press Tab if you would like to autocomplete the word.

Here is the actual command we would use in our example:
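As a sketch, the delete form documented in VMware KB 1036609 is partedUtil delete "/vmfs/devices/disks/DeviceName" PartitionNumber. The hypothetical helper below, delete_command_for, turns a partition entry such as “naa.<id>:1” into the matching command and prints it rather than running it, so the result can be checked against the validated disk IDs first.

```shell
# Hypothetical helper: turn "naa.<id>:<n>" into the matching delete command.
delete_command_for() {
    entry="$1"
    disk="${entry%:*}"     # text before the last colon: the disk ID
    part="${entry##*:}"    # text after the last colon: the partition number
    echo "partedUtil delete /vmfs/devices/disks/$disk $part"
}

# The two partitions of the example disk:
delete_command_for naa.600605b00896b6e01acf437ab22efc8e:1
delete_command_for naa.600605b00896b6e01acf437ab22efc8e:2
```

Once the printed commands look right, they can be pasted into the ESXi shell one at a time.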

Locked Partitions
Some disk partitions may be in a locked (read-only) state and unable to be deleted. This can be caused by a number of scenarios (rebuilding a VSAN environment, ESXi core dump location, etc). Here is one method to remove these partitions:

1. Stop the related process
In the case of VSAN, disable VSAN and stop the vsantrace daemon

2. Unmount datastore

3. Delete partition
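The three steps above can be sketched as follows for the Virtual SAN case. The exact commands are assumptions: esxcli vsan cluster leave is used here as the way to disable Virtual SAN on a single host, /etc/init.d/vsantraced as the trace daemon's init script, and “datastore1” is only a placeholder label. The hypothetical helper print_unlock_steps prints the commands in order without executing them.

```shell
# Hypothetical helper: print, in order, the commands for freeing and
# deleting a locked partition in the Virtual SAN case. Nothing is executed.
#   $1 = datastore label (placeholder; needed only if one is mounted)
#   $2 = disk ID
#   $3 = partition number
print_unlock_steps() {
    label="$1"; disk="$2"; part="$3"
    # 1. Stop the related process (assumed commands for Virtual SAN).
    echo "esxcli vsan cluster leave"
    echo "/etc/init.d/vsantraced stop"
    # 2. Unmount the datastore.
    echo "esxcli storage filesystem unmount -l $label"
    # 3. Delete the partition.
    echo "partedUtil delete /vmfs/devices/disks/$disk $part"
}

# For a core dump partition, step 1 would instead be something like:
#   esxcli system coredump partition set --enable false

print_unlock_steps datastore1 naa.600605b00896b6e01acf437ab22efc8e 1
```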

See VMware KB 2004605: Unmounting a LUN or detaching a datastore/storage device from multiple VMware ESXi 5.x hosts for more information.

See also VMware KB 1020668: Changing the location of ESX core dumps.

After successfully removing the partitions, enable Virtual SAN and it will automatically begin managing the disks.

Next: Stay tuned for Part II of this article, where we will demonstrate how to automate this process via both esxcli and PowerShell scripts.

————————————————————————————————————————————-
References

Using the partedUtil command line utility on ESXi and ESX (VMware KB: 1036609)

Identifying disks when working with VMware ESX/ESXi (VMware KB: 1014953)

Unmounting a LUN or detaching a datastore/storage device from multiple VMware ESXi 5.x hosts (VMware KB: 2004605)

Enabling the SSD option on SSD based disks/LUNs that are not detected as SSD by default (VMware KB: 2013188)

Changing the location of ESX core dumps (VMware KB: 1020668)