
Virtual SAN Automatic “Add Disk to Storage Mode” Fails (Part II)

In part 1 of this article, we looked at an interesting scenario. The Virtual SAN disk management setting was set to automatic, yet Virtual SAN would not form disk groups from the disks present in the hosts. On closer examination, we discovered that the server vendor had pre-imaged the drives with NTFS before shipping. When Virtual SAN detects an existing partition, it does not automatically erase it and replace it with its own. This protects against accidental drive erasure. Because NTFS partitions already existed on the drives, Virtual SAN was waiting for manual intervention. In the previous article, we walked through the manual steps to remove the existing partitions and allow Virtual SAN to build the disk groups. In this article, we will look at how to speed up the process through scripting.

Warning: Removing disk partitions will render data irretrievable. This script is intended for educational purposes only. Please do not use it directly in a production environment.

As promised in part 1, today we will demonstrate how to create your own utility to remove unlocked/unmounted partitions from disks in your ESXi host. The script provides an example workflow for removing partitions that requires user validation before each partition is removed. You can adapt and build on this workflow to create your own production-ready utility.

The script is broken up into 3 major sections:
Section 1: Boot Device Identification
Section 2: Disk Partition Listing
Section 3: Disk Partition Removal

As you look at the code above, you will see that we are using two ESXi commands to identify our boot volume and device information.

esxcli system boot device get
esxcfg-scsidevs -f

We take the output from these commands and parse each with egrep and awk. We then store each parsed result in its own variable for later reference in the script. Finally, we display this parsed output on the screen so the user can make note of it. We most likely do not want to remove partitions from our boot device.
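The boot-device check can also be enforced in code rather than left entirely to the user's attention. The sketch below is a hypothetical helper, not part of the original script. It assumes a variable named BOOT_DEVICE already holds the boot device ID parsed in Section 1 (for example, an mpx ID for a USB boot device). It treats both that device and any of its partitions as off-limits.

```shell
# is_boot_device DEVICE_OR_PARTITION  (hypothetical helper)
# Assumes BOOT_DEVICE holds the boot device ID found in Section 1,
# e.g. "mpx.vmhba32:C0:T0:L0" for a USB boot device (example value).
# Returns 0 (protected) for the boot device itself or any "<boot device>:<n>"
# partition entry, and 1 (not protected) for everything else.
is_boot_device() {
  # Fail safe: if the boot device was not identified, protect everything.
  [ -n "$BOOT_DEVICE" ] || return 0
  case "$1" in
    "$BOOT_DEVICE" | "$BOOT_DEVICE":*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Inside the removal loop, a line such as is_boot_device "$entry" && continue would skip protected entries. Failing closed when BOOT_DEVICE is empty is a deliberate choice. If the boot device could not be identified, nothing is removed.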

Note: If we remove partitions from our boot device, we will no longer be able to boot from it until we reformat it and reinstall ESXi. We would most likely receive an error that the boot device partition is in use, so we could not remove it anyway. However, I believe it is better to be safe than sorry, so I take the cautious route. This boot device is a USB boot device, represented by an mpx designation. Later in our script we will look for devices with the naa designation. See "Understanding Storage Device Naming" in the vSphere 5.5 documentation for more information on vSphere storage device naming.

The resulting output should look similar to the following:

Next, we identify which disks have existing partitions. In our previous article, we showed how to do this manually with the "ls" command, which lists all of the devices in the directory. If a disk has existing partitions, the listing shows additional entries consisting of the disk ID followed by ":1", ":2", and so on. In our illustration below, disk naa.600508b1001037383941424344450800 has one partition.

To automate this step, we can use the colon ":" in the directory listing. For naa devices, only partition entries contain a colon in their name. (mpx device names, such as the USB boot device mentioned above, contain colons of their own.) This gives egrep and awk an easy pattern to filter on. If a line in the directory listing contains a colon, egrep passes it to awk, which parses out the NAA device ID. The result is a list of all disk devices within the host that have existing partitions.
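Here is a minimal sketch of this filter, written as two hypothetical shell functions that read a directory listing on stdin. The listing is assumed to come from ls /vmfs/devices/disks. The functions use grep -E, which is equivalent to egrep. They also anchor on the naa prefix so that mpx names, whose IDs contain colons even without partitions, are not mistaken for partition entries.

```shell
# list_partition_entries (hypothetical helper)
# Reads a directory listing on stdin, one name per line, and prints only
# naa partition entries such as "naa.600508b1001037383941424344450800:1".
# Anchoring on "naa." skips mpx.* names (which always contain colons) and
# the vml.* entries that also appear in the disks directory.
list_partition_entries() {
  grep -E '^naa\.[^:]+:[0-9]+$'
}

# list_partitioned_disks (hypothetical helper)
# Prints each partitioned naa disk ID once: awk drops the ":<partition>"
# suffix and sort -u removes duplicates for disks with several partitions.
list_partitioned_disks() {
  list_partition_entries | awk -F: '{ print $1 }' | sort -u
}
```

For example, ls /vmfs/devices/disks | list_partitioned_disks should include naa.600508b1001037383941424344450800 for the disk in the illustration above. By contrast, list_partition_entries keeps the ":1" suffix that partedUtil needs later.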

The resulting output should look similar to the following:

Our third section is where the disk partitions are actually removed. For this section we use my personal favorite control structure, the "for loop". The for loop is a fundamental programming construct that takes a list of items and runs a command or series of commands against each item. In this case we take a list of disk device IDs and, for each disk ID, run the partedUtil command to delete the partition.

The syntax for the for loop is:
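In a POSIX-compatible shell such as the ESXi shell, the general form is shown below. The disk IDs are hypothetical placeholders, not real devices.

```shell
# General form:
#   for VARIABLE in ITEM1 ITEM2 ...; do
#     COMMANDS
#   done
# The commands between "do" and "done" run once per item, with the loop
# variable set to that item. The IDs below are placeholder values.
for disk in naa.1111 naa.2222 naa.3333; do
  echo "Processing $disk"
done
```

In the real script, the item list would come from the partition listing built in Section 2 rather than being typed out by hand.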

The syntax for the partedUtil command to remove a partition is as follows. Note that the colon separating the device ID from the partition number must be replaced with a space, because partedUtil takes the device path and the partition number as separate arguments.
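As a sketch of that conversion, the hypothetical helper below splits an entry such as naa.600508b1001037383941424344450800:1 at its last colon. It then prints the command in the form documented in VMware KB 1036609: partedUtil delete "/vmfs/devices/disks/DeviceName" PartitionNumber. It only prints the command; nothing is executed.

```shell
# partedutil_delete_cmd ENTRY  (hypothetical helper)
# ENTRY is "<naa device ID>:<partition number>". Prints, but does not run,
# the matching partedUtil delete command with the colon replaced by a space.
partedutil_delete_cmd() {
  entry=$1
  device=${entry%:*}      # everything before the last colon
  partition=${entry##*:}  # everything after the last colon
  printf 'partedUtil delete "/vmfs/devices/disks/%s" %s\n' "$device" "$partition"
}
```

Splitting at the last colon, rather than the first, keeps the helper correct for naa IDs. Those IDs contain no colons of their own, so the only colon present is the partition separator.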

Here is our for loop successfully combined with partedUtil:
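The loop below sketches how this combination might look with the per-partition confirmation described at the start of this article; it is not the original script. The function name remove_partitions is hypothetical, and entries are passed as arguments in device:partition form. Prompts go to stderr and answers are read from stdin. Like the original, it runs in simulation mode: printf prints the partedUtil command and the real call is commented out.

```shell
# remove_partitions ENTRY...  (hypothetical helper)
# Each ENTRY is "<naa device ID>:<partition number>". The user must confirm
# each deletion; any answer other than y/yes skips that partition.
remove_partitions() {
  for entry in "$@"; do
    device=${entry%:*}
    partition=${entry##*:}
    # Prompt on stderr so stdout only carries the commands and skip notices.
    printf 'Delete partition %s on %s? [y/N] ' "$partition" "$device" >&2
    read -r answer || answer=""
    case "$answer" in
      y | Y | yes | YES)
        # Simulation mode: print the command instead of running it.
        printf 'partedUtil delete "/vmfs/devices/disks/%s" %s\n' "$device" "$partition"
        # To really delete, comment out the printf above and uncomment:
        # partedUtil delete "/vmfs/devices/disks/$device" "$partition"
        ;;
      *)
        printf 'Skipping %s\n' "$entry"
        ;;
    esac
  done
}
```

On a host, it might be invoked as remove_partitions $(ls /vmfs/devices/disks | grep -E '^naa\.[^:]+:[0-9]+$'). Passing the entries as arguments keeps stdin free for the confirmation prompts. The default answer is "no", so an accidental Enter never deletes anything.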

Note: The script section above currently runs in simulation mode. It prints the partedUtil command to the screen rather than running it on the system. No changes will be made to any environment this script runs on until the printf command is commented out and the partedUtil command is uncommented.

In our example script, the for loop contains only the partedUtil command, for the sake of simplicity and readability. Additional commands can easily be included to automate even more steps. For instance, we could:

  1. Set a disk claim rule to tag the disk as local: esxcli storage nmp satp rule add --satp VMW_SATP_LOCAL --device <device id> --option "enable_local"
  2. Set an SSD claim rule to tag the disk as both SSD and local: esxcli storage nmp satp rule add --satp VMW_SATP_LOCAL --device <device id> --option "enable_local enable_ssd"
  3. Leverage an additional for loop to run against multiple systems
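Items 1 and 2 could be folded into the same loop. The following hypothetical helper prints, in simulation mode, the claim-rule command from the list above for a given device. The yes/no second argument that selects the SSD variant is an assumption of this sketch. Nothing is executed, in keeping with the simulation approach used elsewhere in the script.

```shell
# print_claim_rule_cmds DEVICE [yes|no]  (hypothetical helper)
# Prints, but does not run, the SATP claim-rule command for DEVICE.
# Second argument "yes" adds enable_ssd; anything else tags local only.
print_claim_rule_cmds() {
  device=$1
  mark_ssd=$2
  if [ "$mark_ssd" = "yes" ]; then
    options="enable_local enable_ssd"
  else
    options="enable_local"
  fi
  printf 'esxcli storage nmp satp rule add --satp VMW_SATP_LOCAL --device %s --option "%s"\n' "$device" "$options"
}
```

For example, for disk in naa.1111 naa.2222; do print_claim_rule_cmds "$disk" yes; done would print one rule per (placeholder) disk for review before anything is run.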

Here is the example script in its entirety. Please feel free to use any or all of it as needed. Happy scripting, everyone!

_______________________________________________________________________________________________
References

Using the partedUtil command line utility on ESXi and ESX (VMware KB: 1036609)

Identifying disks when working with VMware ESX/ESXi (VMware KB: 1014953)

Enabling the SSD option on SSD based disks/LUNs that are not detected as SSD by default (VMware KB: 2013188)

vSphere 5.5 Documentation: Understanding Storage Device Naming