vSAN

Goodbye Multicast

I’ll start off with a simple statement that will likely make a lot of customers happy. VMware vSAN in 6.6 will no longer use Multicast. All vSAN traffic will instead use Unicast.

For a history of why Multicast was used, as well as what changed to prompt the shift, read the history of vSAN multicast on StorageHub. As vSAN technology and hardware have advanced, we no longer need Multicast for efficiency or for cluster discovery.

Upgrade Process

For existing customers who upgrade to vSAN 6.6, the cluster will continue to use Multicast until the last host in the cluster is on 6.6. Unicast supports 64 node clusters, and no exceptions or corner cases require continuing to use Multicast going forward.
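The upgrade rule above can be sketched as a small shell function. This is only an illustration of the logic, not a VMware tool: the function name and its input (one "major.minor" vSAN release string per host, however you collect it) are assumptions made for the example.

```shell
# Sketch of the upgrade rule: a cluster is expected to be in unicast mode
# only once every host runs vSAN 6.6 or later.
# Arguments: one hypothetical "major.minor" vSAN release string per host.
expected_vsan_mode() {
    [ "$#" -gt 0 ] || { echo "unknown"; return 0; }
    for v in "$@"; do
        major=${v%%.*}      # text before the first dot
        rest=${v#*.}        # text after the first dot
        minor=${rest%%.*}   # second version component
        if [ "$major" -lt 6 ] || { [ "$major" -eq 6 ] && [ "$minor" -lt 6 ]; }; then
            # At least one host is still older than 6.6.
            echo "multicast"
            return 0
        fi
    done
    echo "unicast"
}

# Example: a three host cluster with one host still on 6.5
#   expected_vsan_mode 6.6 6.6 6.5
```

A single host that has not yet been upgraded is enough to keep the whole cluster on Multicast, which is why the function returns as soon as it finds one.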

Expected Overhead Changes

Added networking overhead is minimal under normal operation (~2Kbps per host). During cluster repair operations unicast traffic will be higher, but in this situation storage traffic will be reduced, so no net change in traffic is expected. Additional CPU overhead for metadata checks during a cluster partition merge has been mitigated through other improvements to vSAN.

Cluster Discovery Changes

Since cluster discovery was previously done by multicast, a new method is required. Initial cluster discovery can be handled in one of two ways.

By vCenter. The vCenter Server will update all hosts in the event a new host is added to the cluster.  The new Easy Install Wizard will also allow the bootstrapping of a single node vSAN.

By CLI. If the vCenter Server is offline and you wish to change the IP addresses of the vSAN VMkernel ports, the following command can be used to make the hosts within a cluster aware of each other.

esxcli vsan cluster unicastagent add -a <ip> -u <uuid> -U 1 -t <witness/node>

You would run this command on each host, once per peer, to make it aware of all of the other hosts. This is not needed if vCenter is involved in changing the host IP addresses, nor if a cluster was previously connected and was simply powered off. It is only necessary when you must add a host, or change its vSAN IP address, without vCenter.
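Because the command has to be repeated for every peer on every host, it can help to generate the lines rather than type them. The following is a minimal dry-run sketch: it only prints commands and executes nothing. The function name and the input format (one "<uuid> <ip> <type>" line per cluster member) are assumptions for the example; the flags are copied from the command shown above.

```shell
# Dry-run sketch: print the unicastagent commands one host needs so that
# it learns about every other member of its cluster.
# stdin: one member per line, "<uuid> <ip> <type>" (type: node or witness).
# $1:    UUID of the host the commands are intended for (its own row is skipped).
gen_unicast_add() {
    local_uuid="$1"
    while read -r uuid ip type; do
        # Skip blank lines and comment lines.
        case "$uuid" in ''|'#'*) continue ;; esac
        # A host does not need an entry for itself.
        if [ "$uuid" = "$local_uuid" ]; then
            continue
        fi
        printf 'esxcli vsan cluster unicastagent add -a %s -u %s -U 1 -t %s\n' \
            "$ip" "$uuid" "${type:-node}"
    done
}

# Example (placeholder UUIDs and addresses):
#   gen_unicast_add uuid-a < members.txt
```

Review the printed lines, then run them on the host identified by the first argument. Repeat with each host's own UUID to cover the whole cluster.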

DHCP Support Considerations

As updates to IP addresses must be managed either by vCenter or directly on the host, some changes to DHCP support are included with the move to Unicast.

Unsupported

At this time DHCP without reservations is not supported for vSAN VMkernel ports when the vCenter is stored on top of the vSAN cluster. This could create a cluster partition that would need to be manually recovered from.

Supported

DHCP with reservations (so IP addresses never change, as they are bound to the MAC addresses of the VMkernel ports) is supported.

DHCP without reservations and vCenter outside of the cluster is also supported, but note that an extended vCenter outage could impact cluster availability.

vCenter Recovery Steps

During normal operation, the vCenter state is now regarded as the primary source of truth. ESXi hosts are automatically updated with the latest host membership list coming from the vCenter Server. When a vCenter is replaced or recovered from backup, however, the vCenter host membership list may be out of date. It might also be out of date if changes to vSAN cluster membership were made using manual “esxcli” commands on the vSAN hosts while vCenter was offline.

During a vCenter replacement users typically add hosts back one by one. If vCenter were to push such an intermediate state (e.g. 3 hosts added) to a running (e.g. 8 node) cluster, the vSAN data path would create a network partition, leading to interruptions to storage IO.

To resolve this issue, a means of suppressing the vCenter Server as the source of truth was created for this corner case recovery situation. This option needs to be set on every vSAN host before recovering a vCenter (especially if vCenter is hosted on vSAN). It needs to be unset once the user is sure that the vCenter state is up to date.

The config option to set during the recovery process is as follows:

esxcfg-advcfg -s 1 /VSAN/IgnoreClusterMemberListUpdates

To reset the config option, run the following command on all vSAN hosts:

esxcfg-advcfg -s 0 /VSAN/IgnoreClusterMemberListUpdates
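Since the option has to be set on every host before the recovery and cleared on every host afterwards, a small helper that prints the per-host commands can reduce the chance of missing one. This is a dry-run sketch only: the host names in HOSTS are placeholders, the function name is invented for the example, and running the command over ssh as root is an assumption about how you reach your hosts.

```shell
# Dry-run sketch: print the command to run on each vSAN host for the
# vCenter recovery window. Nothing is executed.
# HOSTS is a hypothetical, space-separated list of vSAN hosts.
HOSTS="${HOSTS:-esx01 esx02 esx03}"

print_member_list_toggle() {
    value="$1"   # 1 = ignore member list updates from vCenter, 0 = accept them again
    case "$value" in
        0|1) ;;
        *) echo "usage: print_member_list_toggle 0|1" >&2; return 1 ;;
    esac
    for host in $HOSTS; do
        printf 'ssh root@%s esxcfg-advcfg -s %s /VSAN/IgnoreClusterMemberListUpdates\n' \
            "$host" "$value"
    done
}

# Before recovering vCenter:            print_member_list_toggle 1
# After vCenter state is confirmed:     print_member_list_toggle 0
```

Printing rather than executing keeps the step reviewable, which matters here because a host left with the option set to 1 will keep ignoring membership updates from vCenter.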

In the event of a vCenter failure, hosts will remember their cluster members, even following a host reboot. While vCenter acts as the primary source of truth, each host retains survivability in its absence, as well as manageability using the CLI.

 

Frequently Asked Questions

 

What if I want to remove a host without the use of vCenter, or list vSAN nodes that a host is aware of?

You can remove a host using the following command:

esxcli vsan cluster unicastagent remove -t node -u <uuid> -a <ip>

You can perform discovery of local neighbors with the following command.

esxcli vsan cluster unicastagent list

 

How much extra network overhead will be needed?

No meaningful additional network bandwidth will be required (roughly 2Kbps per host under normal operation). During cluster repair operations unicast traffic will be higher, but in this situation storage traffic will be reduced, so no net change in traffic is expected.

Are there any new ports I need to open?

No.

How do I know if I am using Multicast or Unicast Mode?

Within the UI, select your vSAN cluster, then the Manage tab. As the picture below shows, it reports either Unicast or Multicast.

From the CLI:

esxcli vsan cluster get
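If you want to check the mode from a script, the command output can be filtered. The sketch below assumes the output contains a line of the form "Unicast Mode Enabled: true" (or false); that field name is an assumption to verify against your own hosts, and the function name is invented for the example. It reads the text from stdin, so it can be tried on saved output without touching a host.

```shell
# Sketch: classify `esxcli vsan cluster get` output read from stdin.
# ASSUMPTION: the output has a line like "Unicast Mode Enabled: true".
# Prints "unicast", "multicast", or "unknown" if the field is absent.
vsan_network_mode() {
    awk -F': *' '
        /Unicast Mode Enabled/ {
            found = 1
            mode = (tolower($2) ~ /^true/) ? "unicast" : "multicast"
        }
        END { print (found ? mode : "unknown") }
    '
}

# On a host:   esxcli vsan cluster get | vsan_network_mode
```

Returning "unknown" when the field is missing avoids guessing on older releases whose output may not include it.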

Can I choose to remain on Multicast if I wish?

No. Once a cluster has moved to 6.6, it will remain in Unicast mode.

Will Unicast support 64 node clusters? 

Yes, 64 node clusters are supported in Unicast mode.

Do I need to do anything specific to move to Unicast? 

No, simply upgrade all hosts in the cluster to vSAN 6.6.

What happens to the Multicast tests in the health UI?

They are no longer needed and will be removed once a cluster moves to Unicast.