We have several Virtual SAN customers who are interested in using a flash device not only for ESXi boot but also for persisting logs, traces and core dumps. Virtual SAN supports a number of flash options for boot, namely USB, SD card, SATADOM and SSD. Each of these devices has its own pros and cons.
In this blog, I will go over the factors you need to take into account before deciding on the right flash boot device for your Virtual SAN, the pros and cons of each device, and the alternative tools and utilities you need to store logs and dumps when you boot from some of these devices.
So where do I start?
First off, ask yourself the following questions that will help you make the right choice of boot device:
- What size of device would fit my needs?
- What endurance specs should I consider to ensure that my flash device does not wear out quickly?
- Can I (or should I) persist logs, traces & core dumps on this flash device?
- How much will this device cost me?
- Based on the above, should I use a USB/SD card, SATADOM or SSD for boot?
What flash device options are available for Virtual SAN?
The following flash devices are certified for use as boot with Virtual SAN:
1. USB
An external flash drive that can be used with any x86 server that has a USB port. It typically comes in sizes such as 4GB, 8GB, 16GB, 32GB and 64GB, and has low endurance, so it is not a good fit for writing large amounts of logs and traces.
2. SD Card
A small flash storage device (in a card form factor) very similar to USB in terms of size and endurance.
3. SATADOM
A small SATA3 (6Gb/s) flash memory module designed to be inserted directly into a SATA connector on the server board to provide high-performance solid-state storage. The endurance of SATADOMs is usually much higher than that of USB/SD cards. From a capacity and endurance standpoint, they typically meet the requirements to store logs, traces and dumps.
4. SSD
A nonvolatile flash storage device, available with SATA or SAS interfaces, that is used in place of a hard disk because of its much greater speed. From a capacity and endurance standpoint, it meets the requirements to store logs, traces and dumps, but you lose a drive slot if you use an SSD for boot (in which case it might be better to just use a SATA HDD).
So what’s this big deal about persisting logs, traces, dumps etc?
While booting an ESXi host from a flash device, there are three main considerations to take into account:
- How to persist vmkernel logs?
- How to persist Virtual SAN trace files?
- How to capture core dumps on PSOD?
It is important to persist vmkernel logs, Virtual SAN trace files and core dumps as these tend to be very valuable for support and engineering teams to troubleshoot issues.
1. Persisting vmkernel logs:
Before getting into how the above are persisted, it is important to note that vmkernel logs are saved by default in a RAMdisk for USB/SD card configurations, but are written directly to the boot device for SATADOM and SSD configurations.
What is a RAMdisk?
A RAMdisk is a partition or storage space that resides in memory, so its contents do not survive a reboot. You therefore lose the logs on reboot unless they are persisted somewhere else.
- SD/USB cards do not have enough space or endurance to persist logs, so it is recommended that the logs be redirected to a remote syslog server if you are using SD/USB for boot. Keep in mind that when booting from SD/USB, these logs are NOT persisted anywhere on a shutdown or core dump if you don’t use a remote syslog server.
- SATADOM devices with the right endurance and capacity can be used to persist vmkernel logs. The recommended specs are:
- Size: >=16GB
- Generic recommendation: Endurance of 512-1024 TBW sequential for Virtual SAN 6.1 or earlier
- Exception for Virtual SAN 6.2 or later: Endurance of 384 TBW sequential may suffice, because we have optimized and reduced the write workload in Virtual SAN 6.2. However, devices in this category have to be qualified on a case-by-case basis. Please email email@example.com with detailed specs if you want to get a SATADOM device in this endurance category certified.
- The downside of using a SATADOM is that it tends to be expensive and drives up the overall cost of the solution.
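The SATADOM recommendations above can be sketched as a small check. This is only an illustration of the thresholds listed in this post: the `Satadom` class, the `check_satadom` function and the three result strings are hypothetical names, not part of any VMware tool, and the certified device list remains the authoritative source.

```python
from dataclasses import dataclass

# Thresholds taken from the recommendations in this post.
MIN_SIZE_GB = 16
MIN_TBW_GENERIC = 512   # generic recommendation (Virtual SAN 6.1 or earlier)
MIN_TBW_VSAN_62 = 384   # Virtual SAN 6.2 or later, qualified case by case


@dataclass
class Satadom:
    """Hypothetical description of a SATADOM device."""
    size_gb: float
    endurance_tbw: float  # sequential endurance, in the units used above


def check_satadom(dev: Satadom, vsan_version: tuple) -> str:
    """Classify a device against the size and endurance guidance.

    vsan_version is a (major, minor) tuple such as (6, 2).
    Returns 'ok', 'needs-qualification' or 'not-recommended'.
    """
    if dev.size_gb < MIN_SIZE_GB:
        return "not-recommended"
    if dev.endurance_tbw >= MIN_TBW_GENERIC:
        return "ok"
    # The lower endurance tier only applies to 6.2 or later and still
    # requires case-by-case qualification.
    if vsan_version >= (6, 2) and dev.endurance_tbw >= MIN_TBW_VSAN_62:
        return "needs-qualification"
    return "not-recommended"
```

For example, `check_satadom(Satadom(size_gb=16, endurance_tbw=384), (6, 2))` falls into the case-by-case category, while the same device on Virtual SAN 6.1 does not meet the generic recommendation.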
2. Persisting Virtual SAN trace files:
Virtual SAN traces help VMware support and engineering understand what is going on internally with Virtual SAN. Note that these traces are *not* part of syslog, so if you set up a syslog server to capture vmkernel logs, you will not capture Virtual SAN traces. Virtual SAN trace files are not persisted with syslog because the bandwidth requirements are too high. (With Virtual SAN 6.2, however, we now persist just the “most important” traces along with syslog.)
- Virtual SAN traces require ~500MB of disk space.
- The traces are stored in a RAMdisk and are persisted to a partition on the SD card ONLY on system shutdown or panic. ESXi does this automatically. You cannot write all Virtual SAN traces directly to an SD card today.
- You can persist traces directly to a SATADOM or SSD device, provided you follow the release-specific size and endurance recommendations mentioned above.
3. Capturing Core dumps on PSOD:
If you run into a core dump on PSOD, it is important to capture it so that support and engineering can perform detailed troubleshooting. There is also an important consideration related to the amount of memory on the ESXi host:
- We support a minimum SD card size of 4GB for Virtual SAN boot. Of this 4GB, 2.2GB is set aside for the core dump. Core dumps can be successfully collected for hosts with memory size <=512GB. If memory on the ESXi host exceeds 512GB, a PSOD produces much larger core dumps and the 2.2GB partition is not enough.
- For such configurations, core dumps need to be redirected to a remote server using the network dump collector utility. Follow standard ESXi best practices to redirect core dumps to a remote server.
- There is no such restriction if you use SATADOM or SSD for boot as long as you use one which adheres to the capacity and endurance specs mentioned above.
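The three considerations above (logs, traces and core dumps) can be summarized as a lookup on the boot device type and host memory. The sketch below is a hedged illustration of the guidance in this post only: `persistence_plan` and its return strings are hypothetical, and it assumes that a USB boot device behaves like an SD card for trace handling and that the SATADOM or SSD meets the specs listed earlier.

```python
# Hosts above this memory size produce core dumps too large for the
# 2.2GB partition on a USB/SD boot device (figure from this post).
LOCAL_DUMP_MEMORY_LIMIT_GB = 512

RAMDISK_BOOT = {"usb", "sd"}          # logs and traces live in a RAMdisk
PERSISTENT_BOOT = {"satadom", "ssd"}  # assumed to meet size/endurance specs


def persistence_plan(boot_device: str, host_memory_gb: int) -> dict:
    """Return where logs, traces and core dumps should go."""
    device = boot_device.lower()
    if device in PERSISTENT_BOOT:
        return {
            "logs": "boot device",
            "traces": "boot device",
            "core_dumps": "boot device",
        }
    if device in RAMDISK_BOOT:
        if host_memory_gb <= LOCAL_DUMP_MEMORY_LIMIT_GB:
            dumps = "boot device partition"
        else:
            dumps = "network dump collector"
        return {
            "logs": "remote syslog server",
            "traces": "RAMdisk, saved to boot device on shutdown or panic",
            "core_dumps": dumps,
        }
    raise ValueError(f"unknown boot device type: {boot_device!r}")
```

Calling `persistence_plan("sd", 768)` points core dumps at the network dump collector, whereas `persistence_plan("satadom", 768)` keeps everything on the boot device.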
I’ve summarized below the key factors in terms of deciding which flash device to use for Virtual SAN boot.
Comparison of Boot Devices for Virtual SAN
- USB/SD Cards are a viable and cheap option for boot. However, they have very low endurance and are not a good fit to store logs and traces. Therefore, we recommend you use a syslog server to save logs and a network dump collector to save core dumps while using USB/SD for boot.
- SATADOMs are becoming popular in the market. They are a good fit for boot and typically have the endurance specs required for storing logs, traces and dumps. The only downside is that they tend to be more expensive and drive up the overall cost of the solution. That said, a SATADOM is a great option if you want to boot and store traces, logs and dumps all in one place within your ESXi host.
- A cheap SATA SSD, typically 100GB or so in size, may be a viable option for boot, and it is comparable to a SATADOM in terms of endurance specs for storing traces, logs and dumps. However, you lose a drive slot that could otherwise be used for caching or capacity for your datastore, in which case you may be better off using a cheaper magnetic drive for boot.
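For the USB/SD case, the redirection to a remote syslog server and a network dump collector is usually configured with `esxcli`. The sketch below only builds the command lines as argument lists; it does not run them. The helper name `redirect_commands`, the example addresses, the `vmk0` interface and the default ports (514 and 6500) are assumptions for illustration, so please verify the exact `esxcli` options against the documentation for your ESXi release before using them.

```python
def redirect_commands(syslog_host: str, dump_server_ipv4: str,
                      vmk: str = "vmk0", syslog_port: int = 514,
                      dump_port: int = 6500) -> list:
    """Build esxcli argument lists for remote syslog and network core dumps.

    All parameter values are placeholders supplied by the caller; nothing
    is executed here.
    """
    return [
        # Send vmkernel logs to a remote syslog server, then reload syslog.
        ["esxcli", "system", "syslog", "config", "set",
         f"--loghost=udp://{syslog_host}:{syslog_port}"],
        ["esxcli", "system", "syslog", "reload"],
        # Point core dumps at a network dump collector, then enable it.
        ["esxcli", "system", "coredump", "network", "set",
         "--interface-name", vmk,
         "--server-ipv4", dump_server_ipv4,
         "--server-port", str(dump_port)],
        ["esxcli", "system", "coredump", "network", "set",
         "--enable", "true"],
    ]


if __name__ == "__main__":
    # Print the commands for review; the addresses are made-up examples.
    for cmd in redirect_commands("192.0.2.10", "192.0.2.20"):
        print(" ".join(cmd))
```

Keeping the commands as data makes it easy to review them, or to hand them to whatever remote execution tooling you already use for your hosts.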
Keep watching this space for technical, product and strategy updates on Virtual SAN!