PCIe Switch and SAS Expander support for VMware vSAN ReadyNodes

I’ve received several questions from customers about using PCIe switches for VMware vSAN ReadyNodes™. PCIe switches are supported only on vSAN ReadyNodes and are not supported on build-your-own designs at this time.
I would like to provide some clarity and also bring up a close relative of the internal PCIe switch, the SAS expander. What do I mean when I say PCIe switch? I’m referring to adapters such as Lenovo’s 1610-8 that connect to a PCIe riser and allow the available PCIe lanes to be oversubscribed. I am not referring to a top-of-rack switch that allows server-to-server PCIe sharing of devices (sometimes seen as a backplane or crossbar device on high-end disk arrays).

Why would you want to use an internal PCIe Switch?

Current generation Intel processors are limited to 48 PCIe lanes per CPU. Because some of these lanes are needed for non-storage uses (internal LOM connection, network adapters, GPUs), this often leaves at most 40 to 44 PCIe lanes for storage devices. The most common U.2 form factor NVMe drives connect to an x4 PCIe backplane connector to talk to the CPU. This presents a problem on dense NVMe configurations, where a large number of drives can require more PCIe lanes than physically exist. Even with 2 Intel CPU sockets (which adds PCIe lanes as well as memory bandwidth), you will need PCIe switches for dense NVMe drive configurations. When looking to deploy vSAN hosts with more than 10 drives per Intel CPU, be aware that you will need to use certified vSAN ReadyNode designs to accommodate the PCIe switches that these configurations require.
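The lane arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name and the default of 8 lanes reserved per CPU for non-storage devices are my assumptions (chosen to match the "40 lanes left for storage" end of the range), not figures from any specific server.

```python
LANES_PER_CPU = 48        # per-socket Intel lane count quoted above
LANES_PER_U2_DRIVE = 4    # a U.2 NVMe drive uses an x4 link


def nvme_lane_budget(drives, cpus=1, reserved_lanes_per_cpu=8):
    """Return (lanes_needed, lanes_available, switch_required).

    reserved_lanes_per_cpu is an assumed figure for LOM, NICs, GPUs, etc.
    """
    lanes_needed = drives * LANES_PER_U2_DRIVE
    lanes_available = cpus * (LANES_PER_CPU - reserved_lanes_per_cpu)
    return lanes_needed, lanes_available, lanes_needed > lanes_available


if __name__ == "__main__":
    for drives, cpus in [(10, 1), (12, 1), (24, 2)]:
        needed, available, switch = nvme_lane_budget(drives, cpus)
        print(f"{drives} drives, {cpus} CPU(s): need {needed} lanes, "
              f"have {available}, PCIe switch required: {switch}")
```

With these assumptions, 10 drives on one CPU just fit, while 12 drives on one CPU or 24 drives across two CPUs need more lanes than are available, which is where a PCIe switch comes in.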

Be sure to ask your VAR or server OEM whether a PCIe switch will be required for dense NVMe configurations. Note that less dense configurations (for example, 2 NVMe cache devices and 8 SAS capacity devices) would not need a PCIe switch, as the 8 lanes used by the 2 NVMe drives would not necessitate oversubscription. Also keep this in mind when choosing a chassis for future expandability.

PCIe switches are less common on AMD systems, as the current EPYC 7002 (Rome) generation processors support 128 PCIe 4.0 lanes, which is generally enough for dense NVMe configurations.

What about SAS Expanders?

SAS expanders are supported only on vSAN ReadyNodes. A vSAN ReadyNode is a validated server configuration in a tested, certified hardware form factor for vSAN deployment. Ready node configurations can be found here. In this example, we can see that 24 drives are supported on this Dell R730XD configuration.

SAS expander support is limited due to the significant amount of extended validation that is required to make sure they do not inhibit performance or availability. A typical mini-SAS (SFF-8087) physical connection has 4 SAS channels. Most internal HBAs and controllers have only 2 mini-SAS physical connections, so in a directly connected configuration they support only 8 drives. SAS expanders switch the connection, allowing up to 254 devices per connection. The SAS expander must work tightly with the RAID controller (both are often made by the same manufacturer), and firmware and driver versions for both must be kept in “sync” to prevent issues. SAS expanders also support SATA Tunneling Protocol (STP), which allows a SATA drive to emulate a SCSI device.

The vSAN Compatibility Guide (VCG) team works to identify and validate configurations. They work with the vendors to establish supported, stable driver/firmware combinations. Flash devices and mixed SATA/SAS environments put a lot of stress on the expanders and controllers, and extensive testing is done to identify stable, performant configurations. For best results, consider SAS over SATA when using more than 8 devices.
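To make the channel math concrete, here is a rough sketch in Python. The function names are mine, and the 24-drive figure simply reuses the R730XD example above; the sketch assumes every drive shares the HBA's channels evenly, which real expander topologies may not do.

```python
import math

CHANNELS_PER_MINI_SAS = 4   # one mini-SAS (SFF-8087) connector carries 4 SAS channels
CONNECTORS_PER_HBA = 2      # typical internal HBA, per the text above


def direct_attach_drives(connectors=CONNECTORS_PER_HBA):
    """Drives one HBA can reach with no expander (one drive per channel)."""
    return connectors * CHANNELS_PER_MINI_SAS


def expander_oversubscription(drives, connectors=CONNECTORS_PER_HBA):
    """Average number of drives sharing each SAS channel behind an expander."""
    return drives / direct_attach_drives(connectors)


def hbas_needed_without_expander(drives, connectors=CONNECTORS_PER_HBA):
    """How many HBAs it takes to direct-attach every drive."""
    return math.ceil(drives / direct_attach_drives(connectors))


if __name__ == "__main__":
    print(direct_attach_drives())            # 8
    print(expander_oversubscription(24))     # 3.0
    print(hbas_needed_without_expander(24))  # 3
```

In other words, a 24-drive chassis behind a single 8-channel HBA puts roughly three drives on each channel, while direct-attaching the same drives would take three HBAs.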
So what if you want to “build your own” vSAN server today and want to use more than 8 devices? The simple way is to purchase a SAS pass-through HBA for each disk group. This has a number of advantages over SAS expanders.

  • Dedicating a SAS HBA to each disk group breaks up internal fault domains, and reduces the impact of a failed controller.
  • Dedicating a SAS HBA to each disk group increases the available queue depth and throughput for performance and faster rebuilds. We previously covered on the blog that this is a great way to linearly “scale up” performance inside each host. For all-flash configurations, this is especially important.
  • Dedicating a SAS HBA to each disk group reduces the amount of firmware/drivers that must be tracked and accounted for in updates. The vSAN health service can help with this task.
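The first two points can be illustrated with a small Python sketch. Everything here is hypothetical: the function name is mine, the per-HBA queue depth of 1024 is a placeholder rather than a figure for any real controller, and the model assumes equally sized disk groups spread evenly across the HBAs.

```python
import math


def hba_layout(disk_groups, hbas, queue_depth_per_hba=1024):
    """Compare controller layouts for a host.

    queue_depth_per_hba is a placeholder value; check the actual
    controller's specification before using numbers like this.
    """
    # Worst case: the failed controller is the one carrying the most groups.
    groups_lost = math.ceil(disk_groups / hbas)
    return {
        "groups_lost_on_controller_failure": groups_lost,
        "fraction_of_disk_groups_lost": groups_lost / disk_groups,
        "aggregate_queue_depth": hbas * queue_depth_per_hba,
    }


if __name__ == "__main__":
    print("shared controller:", hba_layout(disk_groups=4, hbas=1))
    print("one HBA per group:", hba_layout(disk_groups=4, hbas=4))
```

Under these assumptions, a host with four disk groups on one shared controller loses all four if that controller fails, while one HBA per disk group limits the loss to a single group and quadruples the total queue depth available.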

Another question that comes up is using external storage expansion. In these designs, the compute module and an external storage module are certified together as a single VMware vSAN ReadyNode. This is primarily for blade or modular systems that lack enough internal drive bays. As with the SAS expanders, pay attention to the number of drives supported.

Examples of this include:

  • AF-8: Dell FX2 FC630 – A modular server/blade hybrid solution with incredible density that can leverage FD332 systems for expansion.
  • HPE Synergy
  • Dell MX: the MX5016s Storage Sled provides up to 16 extra SAS drives.

vSAN offers extensive flexibility and performance across a huge number of server and storage device vendors. SAS expanders and pass-through HBAs allow for flexibility in capacity and performance when designing. The vSAN ReadyNodes are a simple, quick place to start when designing a vSAN solution.