The vSphere Distributed Switch (VDS) is a powerful but often misunderstood technology that is included with VMware vSAN. This post reviews some of my favorite settings on the VDS and how you can use them to get better control, performance, and visibility into your vSAN cluster. While it is best known for the ability to create port groups that exist on all hosts with a simple click, it also has many lesser-known but incredibly powerful functions that will aid the vSAN administrator.
LLDP/CDP - There's a lot to like in the VDS when deploying a new vSAN cluster or trying to identify issues in an existing one. My favorite underrated setting is the ability to configure Link Layer Discovery Protocol (LLDP) and Cisco Discovery Protocol (CDP) for both send and receive. The standard switch supports only CDP and, by default, only listens. This means that not only can a host pull information from the physical switches and identify which switch, port, and VLAN it is plugged into, but the network administrators can also identify which hosts are plugged into which physical switches. Having watched hours wasted on "fixing" the wrong port, I find this small setting makes sure that everyone is on the same page about where things have been plugged in. It also helps find documentation errors and incorrectly cabled hosts quickly and easily.
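As a rough illustration of the documentation check described above, here is a minimal Python sketch that compares a cabling plan against what hosts report from LLDP/CDP. The host names, switch names, port names, and the `find_cabling_mismatches` helper are all hypothetical; how the neighbor data is collected from your hosts is left out and assumed to already be in a dictionary.

```python
def find_cabling_mismatches(documented, discovered):
    """Return (nic, expected, actual) for every NIC whose discovered
    switch/port differs from the documented one. 'actual' is None when
    no LLDP/CDP neighbor was reported for that NIC."""
    mismatches = []
    for nic, expected in sorted(documented.items()):
        actual = discovered.get(nic)
        if actual != expected:
            mismatches.append((nic, expected, actual))
    return mismatches


# Hypothetical cabling plan: (host, NIC) -> (switch, port)
documented = {
    ("esx01", "vmnic0"): ("tor-a", "Ethernet1/1"),
    ("esx01", "vmnic1"): ("tor-b", "Ethernet1/1"),
    ("esx02", "vmnic0"): ("tor-a", "Ethernet1/2"),
    ("esx02", "vmnic1"): ("tor-b", "Ethernet1/2"),
}

# Hypothetical neighbor data as reported by the hosts
discovered = dict(documented)
discovered[("esx02", "vmnic1")] = ("tor-b", "Ethernet1/3")  # wrong port

for nic, expected, actual in find_cabling_mismatches(documented, discovered):
    print(f"{nic[0]} {nic[1]}: documented {expected}, discovered {actual}")
```

Running a check like this before chasing a fault narrows the conversation with the network team to the ports that actually disagree with the plan.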
NetFlow and IPFIX allow the inspection of IP traffic information by sending it to a remote collector, such as VMware vRealize Network Insight or third-party tools from SolarWinds, NetFlow Logic, and CA. From an out-of-band perspective, these tools let you peer into the traffic entering and leaving a virtual machine for monitoring, management, and security purposes. This helps you understand not only how much traffic there is, but also where it is going. For a quick video on how to set it up click here.
vSphere Network I/O Control (NIOC) can be used to set quality of service (QoS) for Virtual SAN traffic over the same NIC uplink in a VDS shared by other vSphere traffic types including iSCSI traffic, vMotion traffic, management traffic, vSphere Replication (VR) traffic, NFS traffic, Fault Tolerance (FT) traffic, and virtual machine traffic. General NIOC best practices apply with Virtual SAN traffic in the mix:
- For bandwidth allocation, use “shares” instead of “limits” as the former has greater flexibility for unused capacity redistribution.
- Always assign a reasonably high relative share for the Fault Tolerance resource pool because FT is a very latency-sensitive traffic type.
- Use NIOC together with NIC teaming to maximize network capacity utilization.
- Leverage the VDS Port Group and Traffic Shaping Policy features for additional bandwidth control on different resource pools.
Specifically, for Virtual SAN, we make the following recommendations:
- Do not set a limit on the Virtual SAN traffic; by default, it is unlimited.
- Set a relative share for the Virtual SAN resource pool based on the storage performance requirements of your applications, while also taking into account other workloads such as bursty vMotion traffic that is required for business mobility and availability.
- Avoid reservations, as unused reserved bandwidth is shared only with other system traffic types (vMotion, storage, etc.) and not with virtual machine networking.
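To make the "shares instead of limits" advice concrete, here is a small worked example in Python. It assumes the usual proportional model: under contention, each active traffic type receives a slice of the uplink equal to its shares divided by the total shares of all active types. The share values, the 10 Gbps uplink, and the `allocate_by_shares` helper are hypothetical and chosen only to keep the arithmetic easy; they are not a recommendation.

```python
def allocate_by_shares(link_gbps, shares, active):
    """Split link bandwidth among the active traffic types in
    proportion to their share values."""
    active = list(active)
    total = sum(shares[t] for t in active)
    return {t: link_gbps * shares[t] / total for t in active}


# Hypothetical share values on a single 10 Gbps uplink
shares = {"vsan": 100, "vm": 50, "vmotion": 25, "management": 25}

# Every traffic type is busy: vSAN gets 100/200 of the link
all_busy = allocate_by_shares(10, shares, shares)

# vMotion and management are idle: their slice is redistributed
# to vSAN and VM traffic, which is what a hard limit would prevent
two_busy = allocate_by_shares(10, shares, ["vsan", "vm"])

for name, alloc in (("all busy", all_busy), ("vsan + vm only", two_busy)):
    print(name, {t: round(gbps, 2) for t, gbps in alloc.items()})
```

With everything active, vSAN is held to 5 Gbps; once vMotion and management go quiet, vSAN's slice grows to roughly 6.67 Gbps without any configuration change.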
LAG - As I discuss in the "using multiple interfaces" blog post, there are several options for using multiple interfaces. The VDS supports LAG with a wide variety of advanced hashes, as well as dynamic LACP, which will fail cleanly to a working state if a switch has been misconfigured. For a video of how to configure LAG see this link. While not everyone needs LAG, if you are going to use it, the added functionality of the VDS makes it the better option.
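For readers unfamiliar with how a hash spreads traffic across a LAG, the following Python sketch shows the general idea: a value derived from the packet headers selects one uplink, so a given flow always stays on the same link while different flows spread out. The XOR of source and destination IP used here and the `pick_uplink` helper are simplified, hypothetical stand-ins, not the algorithm the VDS or any physical switch actually implements.

```python
import ipaddress


def pick_uplink(src_ip, dst_ip, uplink_count):
    """Map a source/destination IP pair to an uplink index using a
    simplified, illustrative hash (XOR of the two addresses)."""
    key = int(ipaddress.ip_address(src_ip)) ^ int(ipaddress.ip_address(dst_ip))
    return key % uplink_count


# Hypothetical vSAN VMkernel addresses on a two-uplink LAG
flows = [
    ("10.0.0.1", "10.0.0.2"),
    ("10.0.0.1", "10.0.0.3"),
    ("10.0.0.2", "10.0.0.3"),
]

for src, dst in flows:
    print(f"{src} -> {dst}: uplink {pick_uplink(src, dst, 2)}")
```

The takeaway is that a single pair of endpoints never exceeds one link's bandwidth under a scheme like this; hashes that include more fields, such as TCP/UDP ports, give the LAG more distinct flows to distribute.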
Misconceptions about the VDS
A common concern is how the VDS will behave if vCenter fails. Virtual machines using a VDS will continue to run uninterrupted if vCenter is unavailable. You will lose the ability to reconfigure the switch or add virtual machines to it, but existing virtual machines will continue to operate. If you foresee needing to add a virtual machine to a management VLAN/port group during such a failure, you can pre-stage an ephemeral port group. This allows you to rebuild a dependency stored within the environment (such as the vCenter database) without a chicken-and-egg problem or the need to keep a standard switch for management functions.
Another concern is often tied to licensing, as historically the VDS required Enterprise Plus licensing. All editions of vSAN include the VDS, so there is no excuse not to use it with vSAN!