Historically, SD cards or USB devices have been chosen to free up device bays and lower the cost of installing ESXi hosts. Such devices, however, have lower endurance and exhibit reliability issues over time. SD cards and USB drives may also exhibit performance issues and may not tolerate high-frequency read-write operations. We are now seeing boot-related problems more frequently on ESXi 7.x hosts that use SD cards or USB drives as boot media. This blog post outlines these issues in detail and provides technical guidance to mitigate them.
Before getting into the details, it is crucial to understand the new system layout. Before vSphere 7, partition management was limited: partition sizes were fixed and partition numbers were static. The 6.x partition sizes constrained the use of multiple solutions together, for example when combining NSX-T, vSAN, Tanzu, and vGPU. This restricted support for installing large modules, debugging functionality, and possible third-party components.
Looking forward, the need for ESXi hosts to support other VMware or third-party solutions is ever-increasing. ESXi 7.x system storage therefore requires a more reliable, flexible, and high-performing storage device.
With the new partition schema in vSphere 7.x, only the system boot partition is fixed at 100 MB. The rest of the partitions are dynamic, meaning partition size will be determined based on the boot media size.
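To make the dynamic sizing concrete, here is a minimal sketch of how the layout might be estimated for a given boot media size. Only the fixed 100 MB system boot partition comes from this post. The tier boundaries, the boot-bank sizes, and the 128 GB cap on ESX-OSData are assumptions recalled from VMware's published ESXi 7 system storage table, and the function and class names are hypothetical. Verify the numbers against the official documentation before relying on them.

```python
from dataclasses import dataclass

SYSTEM_BOOT_MB = 100  # fixed size, as stated in the text above

# ASSUMPTION: tiers recalled from VMware's ESXi 7 system storage table,
# not taken from this post. Each entry is
# (upper bound of the tier in GB, size of each boot bank in MB).
BOOT_BANK_TIERS = [(10, 500), (32, 1024), (128, 4096)]
LARGE_MEDIA_BOOT_BANK_MB = 4096   # assumed boot-bank size above 128 GB
OSDATA_CAP_MB = 128 * 1024        # assumed ESX-OSData cap above 128 GB


@dataclass
class Layout:
    system_boot_mb: int
    boot_bank_mb: int        # size of each of the two boot banks
    osdata_mb: int
    vmfs_datastore_mb: int   # 0 unless the media exceeds the assumed cap


def estimate_layout(media_gb: float) -> Layout:
    """Estimate partition sizes for a boot device of media_gb gigabytes."""
    total_mb = int(media_gb * 1024)
    boot_bank_mb = LARGE_MEDIA_BOOT_BANK_MB
    for upper_gb, size_mb in BOOT_BANK_TIERS:
        if media_gb <= upper_gb:
            boot_bank_mb = size_mb
            break
    remaining = total_mb - SYSTEM_BOOT_MB - 2 * boot_bank_mb
    if remaining <= 0:
        raise ValueError("boot media too small for this layout")
    # Above 128 GB, ESX-OSData is capped and the rest becomes a datastore.
    osdata_mb = min(remaining, OSDATA_CAP_MB) if media_gb > 128 else remaining
    return Layout(SYSTEM_BOOT_MB, boot_bank_mb, osdata_mb, remaining - osdata_mb)
```

Under these assumptions, `estimate_layout(8)` leaves roughly 7 GB for ESX-OSData on an 8 GB device, which illustrates how little headroom small boot media provide.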
Another significant change in the context of SD cards and USB devices is the ESX-OSData partition. All non-boot partitions, such as the core dump, locker, and scratch partitions, are now consolidated into a new partition called ESX-OSData (VMFS-L).
The ESX-OSData partition must be created on a high-endurance persistent storage device because ESXi 7.x sends more IO requests to it. The increase results from several factors introduced with ESXi 7.x, such as:
- An increased number of probe requests sent to check the device state and make sure the device continues to service IO requests.
- Scheduled scripts that back up the system state and timestamps, which contribute slightly to the increased IO requests.
- Also, more features and solutions store their configuration state on ESX-OSData, thus requiring it to be installed on a high-endurance, locally attached persistent storage device.
Potential issues with ESXi 7 and SD cards and USB drives
Potential VMFS-L locker partition corruption
Low-endurance devices such as SD cards or USB drives wear out quickly under frequent read operations. The most common source of these reads is VMTools files accessed by virtual machines. One way to mitigate this is to offload the VMTools read IO operations to a RAM disk, significantly reducing the IO operations sent to SD cards or USB drives.
The current workaround is to move VMTools to a RAM disk by manually enabling the ToolsRamdisk option. The issue is resolved in vSphere 7 Update 2c. Please refer to KB Article 83376 for more details.
SD cards or USB devices usually have a small queue depth, so outstanding IO requests pile up in the storage stack and eventually time out. This can surface as the missing /bootbank issue, which is resolved in vSphere 7 Update 2c. Please refer to KB Article 83963 for more details.
Remediation plan on the usage of SD cards or USB drives as boot media
The use of SD cards or USB devices comes with some limitations:
- The use of standalone SD cards or USB drives (with no additional device available for the ESX-OSData partition) as boot media that also stores the ESX-OSData partition is deprecated in vSphere 7 Update 3 and will not be supported in future major releases.
- In the near future, the only supported configuration involving SD cards or USB drives as boot media will be an SD card or USB drive of at least 8 GB plus a locally attached persistent storage device for the ESX-OSData partition. Recommended locally attached storage devices are listed below.
- In any case, if an SD or USB device is used as boot media (SD/USB only, or SD/USB plus a local HDD/SSD), follow the instructions below to reduce the amount of IO sent to the SD or USB boot media:
- Enable the ToolsRamdisk option to offload VMTools install/upgrade IO requests to the RAM disk.
- Ensure the /scratch partition is configured on persistent storage, such as a local HDD/SSD or the boot-from-SAN device. The ESXi 7.0 installer will not create the /scratch partition on an SD card or USB drive. Instead, it looks for persistent storage and attempts to create /scratch there.
A 250 MB /tmp partition is created on a RAM disk if no persistent storage is available. Please note that ESXi host performance suffers if the /tmp partition runs out of space.
VMware does not support the /scratch partition on SD or USB boot media. Always configure /scratch on a locally attached disk (HDD or SSD). If no local disk is available, configure it on a SAN. Please follow the steps in KB Article 1033696.
Note: A RAM disk is volatile and subject to data loss, so storing portions of the boot partitions on a RAM disk results in a degraded mode of operation.
- It is always a good practice to configure ESXi Dump Collector to offload core dumps.
- Upgrade to vSphere 7 Update 2c or later to resolve the missing /bootbank issue. Please refer to the release notes at https://docs.vmware.com/en/VMware-vSphere/7.0/rn/vsphere-esxi-702-release-notes.html
- Dual SD cards are not a solution customers should rely on. Read disturbance and performance issues can occur in a dual SD card setup as well. It is also important to note that degradation of a single SD card in a mirrored setup can cause degradation of the other SD card.
- If your ESXi host is already upgraded to 7.x, you can add a locally attached storage device and set autoPartition=True. Upon the next reboot, the first locally attached storage device will be partitioned and used for the ESX-OSData partition. Please see VMware KB Article 77009.
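Two of the IO-reducing steps in the list above, enabling the ToolsRamdisk option and configuring the ESXi Dump Collector, can be sketched as ESXCLI invocations. The helper below only assembles the command lines and does not run anything. The advanced option path `/UserVars/ToolsRamdisk` and the `esxcli system coredump network` arguments reflect my reading of the referenced KB articles and ESXCLI documentation, so treat them as assumptions to confirm on your build. The function names, `vmk0`, and `192.0.2.10` are hypothetical placeholders.

```python
import shlex
from typing import List


def tools_ramdisk_command() -> List[str]:
    """Command that enables the ToolsRamdisk advanced option (assumed path)."""
    return ["esxcli", "system", "settings", "advanced", "set",
            "-o", "/UserVars/ToolsRamdisk", "-i", "1"]


def dump_collector_commands(vmk: str, server_ipv4: str,
                            port: int = 6500) -> List[List[str]]:
    """Commands that point network core dumps at a collector, then enable them."""
    base = ["esxcli", "system", "coredump", "network"]
    return [
        base + ["set", "--interface-name", vmk,
                "--server-ipv4", server_ipv4, "--server-port", str(port)],
        base + ["set", "--enable", "true"],
    ]


if __name__ == "__main__":
    # vmk0 and 192.0.2.10 are placeholders, not values from this post.
    commands = [tools_ramdisk_command(),
                *dump_collector_commands("vmk0", "192.0.2.10")]
    for argv in commands:
        print(shlex.join(argv))
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere. Review the output and run it in an ESXi shell only after checking it against the KB articles for your release.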
ESXi Boot Considerations
As of today, the best practice is for customers to have a locally attached persistent storage device. In the future, this will be mandatory. Please check the details below for more information.
Booting ESXi host from a locally attached storage in a vSAN environment
Booting the ESXi host from the same storage controllers shared by the vSAN disks is not recommended. Customers may consider getting an additional controller to support the boot device, or use an M.2 flash, PCIe NVMe, or SATA SSD device.
Boot from SAN
Boot from SAN is fully supported. In this case, customers can use a LUN (FC/FCoE/iSCSI) of at least 32 GB. All the partitions, including the ESX-OSData partition, will be stored entirely on the SAN LUN.
vSphere Auto Deploy Stateless Installations
vSphere Auto Deploy stateless installation is supported in vSphere 7. However, stateless deployments are not compatible with vSphere Lifecycle Manager.
VMware is moving away from supporting SD cards and USB drives as boot media. An ESXi boot configuration with only an SD card or USB drive, without any persistent device, is deprecated with vSphere 7 Update 3. In future vSphere releases, it will be an unsupported configuration. Customers are advised to move away from SD cards or USB drives completely. If that is not currently feasible, please ensure that the SD card or USB drive is at least 8 GB and that an additional locally attached high-endurance device of at least 32 GB is available for the ESX-OSData partition.
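The sizing guidance above can be expressed as a small check. This is a minimal sketch that uses only the two minimums stated in this post (8 GB for the SD card or USB drive, 32 GB for the local ESX-OSData device). The class, function, and status labels are hypothetical names chosen for illustration, not VMware terminology.

```python
from dataclasses import dataclass
from typing import Optional

MIN_SD_USB_GB = 8          # minimum SD card / USB drive size from the text
MIN_OSDATA_DEVICE_GB = 32  # minimum local high-endurance device size from the text


@dataclass
class BootConfig:
    sd_usb_gb: float                           # capacity of the SD/USB boot device
    osdata_device_gb: Optional[float] = None   # separate local device, if present


def assess(cfg: BootConfig) -> str:
    """Classify an SD/USB boot configuration against the stated minimums."""
    if cfg.osdata_device_gb is None:
        # SD/USB only: deprecated with vSphere 7 Update 3.
        return "deprecated"
    if (cfg.sd_usb_gb < MIN_SD_USB_GB
            or cfg.osdata_device_gb < MIN_OSDATA_DEVICE_GB):
        return "undersized"
    return "meets-minimums"
```

For example, `assess(BootConfig(16))` reports a standalone 16 GB SD card as deprecated, while `assess(BootConfig(8, 32))` meets the stated minimums.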
As explained earlier, customers should take every action that reduces the IO requests sent to the SD card or USB device, such as configuring the /scratch partition on a persistent device, configuring network core dumps, and enabling the ToolsRamdisk option. In addition, customers should upgrade to vSphere 7 Update 2c or later, which contains fixes for the USB and storage stack issues.
- ESXi System Storage Changes
- ESXi System Storage While Upgrading
- ESXi System Storage FAQs
- ESXi 7 Storage Requirements
- ESXi 7 System Storage warnings – VMware KB Article 85615
- VMFS-L locker partition corruption
- Bootbank cannot be found at path ‘/bootbank’ errors being seen after upgrading to ESXi 7.0 U2
- Creating a persistent scratch location for ESXi 7.x/6.x/5.x/4.x
- Configure ESXi Dump Collector with ESXCLI