ESXi 7 Boot Media Considerations and VMware Technical Guidance

Historically, SD cards or USB devices have been chosen to free up device bays and lower the cost of installing ESXi hosts. Such devices, however, have lower endurance and exhibit reliability issues over time. SD cards and USB drives may also exhibit performance issues and may not tolerate high-frequency read-write operations. We are now seeing boot-related problems more frequently on ESXi 7.x hosts that use SD cards or USB drives as boot media. This blog post outlines these issues in detail and provides technical guidance to mitigate them.

Before getting into the details, it is crucial to understand the new system layout. Before vSphere 7, partition management was limited in that partition sizes were fixed and the partition numbers were static. There were constraints on using multiple solutions with the 6.x partition sizes, for example when combining NSX-T, vSAN, Tanzu, and vGPU. This restricted support for installing large modules, debugging functionality, and possible third-party components.

Looking forward, the need for ESXi hosts to support other VMware or third-party solutions is ever-increasing. ESXi 7.x system storage therefore requires a more reliable, flexible, and high-performing storage device.

With the new partition schema in vSphere 7.x, only the system boot partition is fixed at 100 MB. The rest of the partitions are dynamic, meaning partition size will be determined based on the boot media size.
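
To make the idea of dynamic sizing concrete, here is a minimal sketch of how a layout could be derived from the boot media size. Only the fixed 100 MB system boot partition comes from this post. The boot bank tiers, the ESX-OSData cap, and the function name plan_layout are assumptions recalled from the vSphere 7 system storage documentation and should be verified there before being relied on.

```python
SYSTEM_BOOT_MB = 100  # fixed size, as stated in the text above

# ASSUMPTION: tiers recalled from vSphere 7 documentation, not from this post.
# Each entry is (exclusive upper bound of media size in GB, boot bank size in MB).
BOOTBANK_TIERS = [(10, 500), (32, 1024), (float("inf"), 4096)]

# ASSUMPTION: ESX-OSData is capped on large media; the rest becomes a VMFS datastore.
OSDATA_CAP_MB = 128 * 1024


def plan_layout(media_gb: float) -> dict:
    """Return an illustrative partition plan (sizes in MB) for a boot device."""
    media_mb = int(media_gb * 1024)
    # Pick the first tier whose upper bound is above the media size.
    bootbank_mb = next(size for limit, size in BOOTBANK_TIERS if media_gb < limit)
    # Two boot banks plus the fixed system boot partition come off the top.
    remaining_mb = media_mb - SYSTEM_BOOT_MB - 2 * bootbank_mb
    if remaining_mb <= 0:
        raise ValueError("boot media too small for this illustrative layout")
    osdata_mb = min(remaining_mb, OSDATA_CAP_MB)
    return {
        "system_boot": SYSTEM_BOOT_MB,
        "bootbank_0": bootbank_mb,
        "bootbank_1": bootbank_mb,
        "esx_osdata": osdata_mb,
        "vmfs_datastore": remaining_mb - osdata_mb,
    }


if __name__ == "__main__":
    for size_gb in (8, 32, 256):
        print(size_gb, plan_layout(size_gb))
```

The point of the sketch is only that every partition except system boot scales with the device, which is why a small SD card leaves little room for ESX-OSData.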

Another significant change in the context of SD cards and USB devices is the ESX-OSData partition. All non-boot partitions, such as the core dump, locker, and scratch partitions, are now consolidated into a new partition called ESX-OSData (VMFS-L).

The ESX-OSData partition must be created on a high-endurance persistent storage device because more IO requests are now sent to it. The increase results from several factors introduced with ESXi 7.x, such as:

  • An increased number of probe requests sent to check device state, to make sure devices continue to service IO requests.
  • Scheduled scripts that back up system state and timestamps, which contribute slightly to the increased IO requests.
  • More features and solutions storing their configuration state on ESX-OSData, which requires it to be installed on a high-endurance, locally attached persistent storage device.

Potential issues with ESXi 7 and SD cards and USB drives

  1. Potential VMFS-L locker partition corruption

Low endurance devices such as SD cards or USB drives wear out quickly due to the frequent read operations. The most common reasons for frequent read operations are VMTools files accessed by virtual machines. One way to mitigate this is to offload the VMTools read IO operations to RAM Disk, significantly reducing the IO operations sent to SD cards or USB drives.

The current workaround is to move VMTools to RAMDisk by enabling the ToolsRAMDisk Option manually. The issue is resolved in vSphere 7 Update 2c. Please refer to KB Article 83376 for more details.

  2. ‘/bootbank’ missing

SD cards or USB devices usually have a small queue depth, so outstanding IO requests back up in the storage stack and eventually time out. The issue is resolved in vSphere 7 Update 2c. Please refer to KB Article 83963 for more details.
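
As a rough illustration of the ToolsRAMDisk workaround described for the first issue, the snippet below assembles the esxcli command line that is commonly used to enable it. The advanced option path /UserVars/ToolsRamdisk is recalled from KB Article 83376 rather than stated in this post, so treat it as an assumption and confirm it against the KB. The helper name is hypothetical, and the code only builds the command without running it.

```python
import shlex

# ASSUMPTION: option path as recalled from KB Article 83376; verify before use.
TOOLS_RAMDISK_OPTION = "/UserVars/ToolsRamdisk"


def tools_ramdisk_command(enable: bool = True) -> list[str]:
    """Return the esxcli argument list that sets the ToolsRamdisk option."""
    return [
        "esxcli", "system", "settings", "advanced", "set",
        "-o", TOOLS_RAMDISK_OPTION,
        "-i", "1" if enable else "0",  # integer value: 1 enables, 0 disables
    ]


if __name__ == "__main__":
    # Print the command so it can be reviewed before being run on a host.
    print(shlex.join(tools_ramdisk_command()))
```

Building the argument list separately from executing it keeps the sketch safe to run anywhere and makes the exact command easy to review.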

Remediation plan for the use of SD cards or USB drives as boot media

The use of SD cards or USB devices comes with some limitations:

  1. The use of standalone SD cards or USB drives (with no additional device available for the ESX-OSData partition) as boot media that also store the ESX-OSData partition is being deprecated in vSphere 7 Update 3 and will not be supported in future major releases.
  2. In the near future, the only supported configuration involving SD cards or USB drives as boot media will be an SD card or USB drive of at least 8 GB plus a locally attached persistent storage device for the ESX-OSData partition. Recommended locally attached storage devices are listed below.
  3. In any case, if an SD or USB device is used as boot media (only SD/USB, or SD/USB plus a local HDD/SSD disk), follow the instructions below to reduce the amount of IO sent to the SD or USB boot media:

A 250 MB /tmp partition is created on a RAM disk if no persistent storage is available. Please note that ESXi host performance is affected if the /tmp partition runs out of space.

VMware does not support the /scratch partition on SD or USB boot media. Always configure /scratch on a locally attached disk (HDD or SSD). If a local disk is not available, configure it on a SAN. Please follow the steps mentioned in KB Article 1033696.

Note: Please be aware that a RAM disk is volatile and subject to data loss, so storing portions of the boot partitions on a RAM disk results in a degraded mode of operation.

  4. Customers should also upgrade to vSphere 7 Update 2c or higher to overcome the missing /bootbank issue. Please refer to the release notes at https://docs.vmware.com/en/VMware-vSphere/7.0/rn/vsphere-esxi-702-release-notes.html
  5. Dual SD cards are not a solution customers should rely on. Read disturbance and performance issues can occur in a dual SD card setup as well. It is also important to note that degradation of even a single SD card in a mirrored setup can cause degradation on the other SD card.
  6. If your ESXi host is already upgraded to 7.x, you can add a locally attached storage device and set autoPartition=True. ESXi will then partition the first locally attached storage device for use as the ESX-OSData partition upon the next reboot. Please see VMware KB Article 77009.
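
The /scratch relocation and the autoPartition setting from the list above can be sketched as command lines. The vim-cmd and esxcli forms below are recalled from KB Articles 1033696 and 77009 and are not quoted in this post, so they are assumptions to verify against those KBs. The helper names and the example datastore and host names are hypothetical, the target directory is assumed to exist already, and nothing is executed.

```python
import shlex


def scratch_command(datastore: str, hostname: str) -> list[str]:
    """Build the command that points /scratch at a persistent datastore folder."""
    # ASSUMPTION: option name and ".locker-<host>" convention as recalled from KB 1033696.
    location = f"/vmfs/volumes/{datastore}/.locker-{hostname}"
    return [
        "vim-cmd", "hostsvc/advopt/update",
        "ScratchConfig.ConfiguredScratchLocation", "string", location,
    ]


def auto_partition_command() -> list[str]:
    """Build the command that enables autoPartition for the next reboot."""
    # ASSUMPTION: kernel setting name and value as recalled from KB 77009.
    return [
        "esxcli", "system", "settings", "kernel", "set",
        "-s", "autoPartition", "-v", "TRUE",
    ]


if __name__ == "__main__":
    # "datastore1" and "esx01" are placeholder names for illustration only.
    print(shlex.join(scratch_command("datastore1", "esx01")))
    print(shlex.join(auto_partition_command()))
```

Both changes are expected to take effect only after a reboot, so reviewing the printed commands against the KB articles first costs nothing.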

ESXi Boot Considerations

As of today, the best practice is that customers should have, and in the future must have, a locally attached persistent storage device. Please check the details below for more information.

Booting ESXi host from a locally attached storage in a vSAN environment

Booting the ESXi host from the same storage controllers shared by the vSAN disks is not recommended. Customers may consider getting an additional controller to support the boot device, or use an M.2 flash, PCIe NVMe, or SATA SSD device.

Conclusion

VMware is moving away from the support of SD cards and USB drives as boot media. An ESXi boot configuration with only an SD card or USB drive, without any persistent device, is deprecated with vSphere 7 Update 3. In future vSphere releases, it will be an unsupported configuration. Customers are advised to move away from SD cards or USB drives completely. If that is not currently feasible, please ensure that the SD card or USB drive is at least 8 GB and that an additional locally attached high-endurance device of at least 32 GB is available for the ESX-OSData partition.
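
The guidance above can be expressed as a small check. This is a sketch only: the 8 GB and 32 GB thresholds come from this post, while the class, field, and status names are hypothetical and carry no official meaning.

```python
from dataclasses import dataclass
from typing import Optional

MIN_SD_USB_BOOT_GB = 8     # minimum SD/USB boot media size, from the text
MIN_OSDATA_DEVICE_GB = 32  # minimum local device size for ESX-OSData, from the text


@dataclass
class BootConfig:
    boot_on_sd_or_usb: bool
    boot_media_gb: float
    # Size of a separate locally attached persistent device, or None if absent.
    local_persistent_gb: Optional[float] = None


def classify(cfg: BootConfig) -> str:
    """Map a boot configuration onto the guidance in this post."""
    if not cfg.boot_on_sd_or_usb:
        return "recommended"              # no SD/USB boot media at all
    if cfg.boot_media_gb < MIN_SD_USB_BOOT_GB:
        return "boot media too small"
    if cfg.local_persistent_gb is None:
        return "deprecated"               # standalone SD/USB, no device for ESX-OSData
    if cfg.local_persistent_gb < MIN_OSDATA_DEVICE_GB:
        return "osdata device too small"
    return "supported"                    # SD/USB boot plus local ESX-OSData device


if __name__ == "__main__":
    print(classify(BootConfig(True, 16)))        # deprecated
    print(classify(BootConfig(True, 16, 240)))   # supported
    print(classify(BootConfig(False, 240)))      # recommended
```

A check like this could be run against a host inventory to find which hosts need a local device added before a future major release.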

As explained earlier, customers should take every action that reduces the IO requests sent to the SD card or USB device, such as configuring the /scratch partition on a persistent device, configuring network core dump, and enabling the ToolsRAMDisk option. In addition, customers should upgrade to vSphere 7 Update 2c or higher, which contains fixes for USB and storage stack issues.

Comments

5 comments have been added so far

  1. „Low endurance devices such as SD cards or USB drives wear out quickly due to the frequent read operations.“
    Really?! Reading wears out? Never heard that before.

  2. Thank you for all the details. Can you please elaborate on why booting ESXi from the same controller as vSAN disks isn’t recommended? For instance, has it led to many support cases where it was determined to be a problem? It *is* supported in v7 per https://kb.vmware.com/s/article/2129050 as long as all drives are either in RAID mode or HBA mode but not mixed-mode. Will this one day be depreciated as well?

  3. Hi,
    Does the additonal device for the ESX-OSData partition have to be locally attached as you state multiple times or can it also be a iSCSI LUN as the summary table states?

  4. We upgraded about a few weeks ago, Dell and Cisco servers on VMware HCL. As you know they come with ESXi loaded already on the internal USB drives, very typical. We have nothing but trouble since with bootbank not available. Issue is the server becomes un-responsive and you have to shut down the VM’s via the OS then power cycle the host. When the box is on the HCL you can’t just change something as significant as this with no planning? Where is your testing internal, why isn’t the HCL reflective of this change? We had known this we would not have upgraded UNTIL we addressed this topic.

  5. We purchased several HP ESXi hosts on February as upgrade from ver 5.5. On 5.5 ESXi it was installed on internal SD card and worked fine for 4 years without any issue, but servers was old so decision to purchese new hardware was taken. After deployment on internal SD RAID cards , after over a week one host in cluster became unresponsive (red icon on it) and was not able to manage VMs. Only way to move VMs to other host (vMotion) was to shut down them (via RDP because was not managable via vCS) and copy them to other host. Later only downgrade to 7.0 was able. What a poor QA….
    And just now what – announcment that newer updated will depricate SD card ? So what – throw away new serwers and paid hardware…..
