
vSAN can leverage a number of core ESXi networking features to increase performance and availability. vSAN VMkernel ports should always be configured to use more than one interface to preserve availability in the event of a link failure. There are a number of ways to configure this failover.

Which will you choose?

While there are quite a few options, it is best to work with a decision framework. Identify what your organization can support, and whether the benefits of a more complicated solution outweigh the operational complexity and risk of deploying it. The following options are organized from least to most complex in terms of configuration on both the host and the switch.

Active/Standby configuration


A basic active/standby configuration allows vSAN to use one primary uplink and fail over to an additional uplink in the event of that uplink's failure.

This configuration has the benefit of simplicity: no additional configuration is needed on the physical switch. You can stagger vMotion and vSAN by giving each a different active uplink, with the other traffic class's active uplink as its standby, to partially balance traffic by hand while still sharing a port in the event of a failure. This is a safe "default" choice unless other throughput considerations push you towards some form of link balancing. An often-overlooked benefit of this configuration is that deterministic pathing makes it easier to understand which traffic is saturating a port (although you can also easily find this with vSphere performance graphs, vROps, and other tools like NetFlow).
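To make the staggered layout concrete, here is a minimal Python sketch of how an explicit failover order picks an uplink. FailoverOrder and select_uplink are hypothetical names for illustration, not a vSphere API, and the model ignores settings such as failback and unused uplinks; vmnic0/vmnic1 simply follow ESXi's usual uplink naming.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FailoverOrder:
    """Hypothetical model of a port group's explicit failover order."""
    active: List[str]
    standby: List[str]


def select_uplink(order: FailoverOrder, link_up: Dict[str, bool]) -> Optional[str]:
    """Return the first healthy uplink, trying active uplinks before standby ones."""
    for uplink in order.active + order.standby:
        if link_up.get(uplink, False):
            return uplink
    return None  # every listed uplink is down, so the port group has no path


# Stagger the two traffic classes: each one's standby is the other's active uplink.
vsan = FailoverOrder(active=["vmnic0"], standby=["vmnic1"])
vmotion = FailoverOrder(active=["vmnic1"], standby=["vmnic0"])

both_up = {"vmnic0": True, "vmnic1": True}
print(select_uplink(vsan, both_up), select_uplink(vmotion, both_up))  # vmnic0 vmnic1

# After vmnic0 fails, vSAN and vMotion share vmnic1.
vmnic0_down = {"vmnic0": False, "vmnic1": True}
print(select_uplink(vsan, vmnic0_down), select_uplink(vmotion, vmnic0_down))  # vmnic1 vmnic1
```

With both links healthy each traffic class has its own uplink, which is what makes the pathing deterministic; only during a failure do the two classes contend for the same port.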

 

Load Based Teaming


Route based on Physical NIC Load works by monitoring vmnics for 75% saturation or greater and, every 30 seconds, reassigning ports to less-loaded uplinks to try to prevent saturation. Note that this is purely reactive to load, and any single connection is still limited to one uplink's speed. Also note that moving ports can be disruptive if the physical switchport is not configured with PortFast or similar settings.
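The reactive behavior of Load Based Teaming can be sketched as a single rebalancing pass. This is a simplified model, assuming uplink capacity and per-port demand are expressed in Gbps; which port the real distributed switch chooses to move is not described here, so moving the lightest port is an assumption, and utilization/rebalance are hypothetical helpers. Each port stays on exactly one uplink, which is why a single connection never exceeds one uplink's speed.

```python
from typing import Dict

SATURATION = 0.75  # threshold described above (75% of an uplink's capacity)


def utilization(assign: Dict[str, str], load: Dict[str, float],
                capacity: Dict[str, float]) -> Dict[str, float]:
    """Fraction of each uplink's capacity used by the ports assigned to it."""
    used = {uplink: 0.0 for uplink in capacity}
    for port, uplink in assign.items():
        used[uplink] += load[port]
    return {uplink: used[uplink] / capacity[uplink] for uplink in capacity}


def rebalance(assign: Dict[str, str], load: Dict[str, float],
              capacity: Dict[str, float]) -> Dict[str, str]:
    """One LBT-style check (meant to run every 30 s): move a port off each saturated uplink."""
    new = dict(assign)
    util = utilization(new, load, capacity)
    for uplink in capacity:
        if util[uplink] < SATURATION:
            continue
        target = min(capacity, key=util.get)  # least-utilized uplink
        if target == uplink:
            continue
        # Assumption: move the lightest port on the saturated uplink; the real heuristic may differ.
        ports = sorted((p for p, u in new.items() if u == uplink), key=load.get)
        if ports:
            new[ports[0]] = target
            util = utilization(new, load, capacity)
    return new


capacity = {"vmnic0": 10.0, "vmnic1": 10.0}  # Gbps per uplink
load = {"vsan-vmk": 6.0, "vmotion-vmk": 3.0, "vm-net": 1.0}  # Gbps demanded per port
assign = {"vsan-vmk": "vmnic0", "vmotion-vmk": "vmnic0", "vm-net": "vmnic1"}

after = rebalance(assign, load, capacity)
print(after)  # {'vsan-vmk': 'vmnic0', 'vmotion-vmk': 'vmnic1', 'vm-net': 'vmnic1'}
```

Here vmnic0 starts at 90% utilization, so one port moves to vmnic1, leaving the uplinks at 60% and 40%. Nothing happens until the threshold is crossed, which is the "purely reactive" behavior described above.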

 

IP Hash

Route based on IP Hash works by taking the source and destination IP addresses of each packet and performing a mathematical calculation on them to determine which uplink in the team to use. Because the load balancing is based on the source/destination IP addresses, a single host communicating with multiple IP addresses can balance its load across all of the network adapters in the team and make better use of the available bandwidth. Note that there is limited intelligence to this balancing: if a link is saturated, existing communications will not be rebalanced to less active links. In this way 1+1 does not equal 2, and a single 10Gbps link is preferred over 10 x 1Gbps links bonded together. IP Hash also depends on matching switch configuration (a static EtherChannel on the physical switch), and provides fundamentally little performance benefit for vSAN unless you are actively saturating 10Gbps ports with other traffic classes. Modern distributed switches do have basic configuration validation, but a misconfiguration on either the host or the switch can result in a non-functioning link.
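The following Python sketch uses a simplified IP hash, XORing the source and destination addresses and taking the result modulo the number of uplinks, to show why IP Hash helps a host that talks to many peers but not a single pair of hosts. The exact inputs ESXi feeds into its hash may differ; ip_hash_uplink is a hypothetical helper and the addresses are illustrative.

```python
import ipaddress


def ip_hash_uplink(src: str, dst: str, num_uplinks: int) -> int:
    """Simplified IP hash: XOR the two addresses, then take the result modulo the uplink count."""
    s = int(ipaddress.ip_address(src))
    d = int(ipaddress.ip_address(dst))
    return (s ^ d) % num_uplinks


# One source/destination pair always maps to the same uplink.
print(ip_hash_uplink("192.168.10.11", "192.168.10.12", 2))  # 1

# One host talking to several peers spreads across both uplinks.
print([ip_hash_uplink("192.168.10.11", f"192.168.10.{n}", 2) for n in range(12, 16)])  # [1, 0, 1, 0]
```

Assuming one vSAN VMkernel port per host, vSAN traffic between any two hosts is a single source/destination pair, so every packet in that exchange lands on the same uplink regardless of how many uplinks are in the team.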

 

 

LAG

LACP enables more advanced hashes that mix in things like source and destination port. These hashes can potentially balance traffic that is split across multiple connection sessions between the same two hosts. Note that you will need to check what your switch supports; even within the same vendor or product family, different ASICs may only support specific options. Unlike static LAG (EtherChannel) configurations, dynamic LACP negotiates with the switch, so the LAG will only form if the switch configuration has been properly set.
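Extending the same simplified model with TCP/UDP ports shows why port-aware hashes can spread traffic between the same two hosts. This is a sketch, assuming an XOR of addresses and ports modulo the link count; real switches and ESXi use vendor- and ASIC-specific hash functions, l4_hash_link is a hypothetical helper, and the port numbers are illustrative.

```python
import ipaddress


def l4_hash_link(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
                 num_links: int) -> int:
    """Simplified L3+L4 hash: fold addresses and ports together with XOR, modulo link count."""
    s = int(ipaddress.ip_address(src_ip))
    d = int(ipaddress.ip_address(dst_ip))
    return (s ^ d ^ src_port ^ dst_port) % num_links


A, B = "192.168.10.11", "192.168.10.12"  # the same two hosts for every session
DST_PORT = 2233  # illustrative destination port for the storage traffic
sessions = [50000, 50001, 50002, 50003]  # illustrative ephemeral source ports

# Different sessions between the same two hosts can land on different links.
print([l4_hash_link(A, B, sp, DST_PORT, 2) for sp in sessions])  # [0, 1, 0, 1]
```

Each session still sticks to one link, so a single TCP connection remains limited to one member link's speed; the benefit only appears when traffic between the two hosts is split across several sessions.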