VMware

December 10, 2008

Beaconing Demystified: Using Beaconing to Detect Link Failures

Beaconing is one of those features that often confuses even the most experienced networking admin.

 

Shudong Zhou, one of our senior engineers, recently posted an entry on the internal blog explaining how it works and how you might use it. He gave me permission to cut and paste his entry. Here it is …

 

Beaconing is a software solution for detecting link failures downstream from the physical switch. ESX provides a simple and elegant teaming solution. All uplinks connected to a vswitch are assumed to connect to the same physical network (same broadcast domain), so they are all equivalent. Users can configure a list of active and standby uplinks for traffic to go out of the ESX host. If a link fails, the adapter driver detects it, marks the uplink as failed, and stops using it. Existing traffic will fail over to a standby uplink or be redistributed to the remaining team members.

If a downstream link beyond the immediate physical port fails, the adapter driver cannot detect it. This causes existing VMs using the uplink to lose network connectivity. The proper way to solve this problem is to enable Link State Tracking on the physical switch so that the adapter driver can see the failure. If the physical switch does not support Link State Tracking, beaconing provides a software alternative. Beaconing works as follows:

ESX periodically broadcasts beacon packets out of all uplinks in a team. The physical switch is expected to forward all packets to other ports on the same broadcast domain. Hence, a team member is expected to see beacon packets from other team members. If an uplink fails to receive beacon packets (specifically, if it misses 3 consecutive beacons), we mark it bad. The failure can be due to the immediate link or a downstream link. With 3 or more uplinks in a team, we can pinpoint the failure of a single uplink. With 2 uplinks in a team, we can detect a downstream link failure, but we don't know which uplink is good and which is bad.
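To make the detection rule concrete, here is a minimal Python sketch. It is not ESX code: the `segments` model (sets of uplinks that can still reach one another), the function names, and the `vmnic` labels are all assumptions made for illustration. Only the threshold of 3 consecutive missed beacons comes from the description above.

```python
from typing import Dict, List, Set

# Per the post: an uplink is marked bad after missing 3 consecutive beacons.
MISS_THRESHOLD = 3


def hears_beacons(segments: List[Set[str]]) -> Dict[str, bool]:
    """Map each uplink to whether it hears a beacon from at least one teammate.

    `segments` is a hypothetical model of the physical network: each set
    holds the uplinks that can still reach one another.
    """
    return {name: len(seg) > 1 for seg in segments for name in seg}


def update_misses(misses: Dict[str, int], heard: Dict[str, bool]) -> List[str]:
    """Advance one beacon interval and return the uplinks now marked bad."""
    for name, ok in heard.items():
        # A received beacon resets the counter; a miss increments it.
        misses[name] = 0 if ok else misses.get(name, 0) + 1
    return sorted(n for n, m in misses.items() if m >= MISS_THRESHOLD)


# Three uplinks, vmnic2 cut off downstream: only vmnic2 goes quiet,
# so the failure can be pinned on it.
three = hears_beacons([{"vmnic0", "vmnic1"}, {"vmnic2"}])

# Two uplinks, one cut off: both go quiet, so the bad one is ambiguous.
two = hears_beacons([{"vmnic0"}, {"vmnic1"}])

misses: Dict[str, int] = {}
for _ in range(MISS_THRESHOLD):
    bad = update_misses(misses, three)
print(bad)
```

In the two-uplink case every entry of `two` is `False`, which is why a team of two can tell that something failed but not which side.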

ESX behavior when a beaconing failure is detected is as follows:

  1. If two or more uplinks receive beacons from each other, those uplinks are considered good. We stop using uplinks which do not receive any beacon packets.
  2. On ESX 3.5, if no uplink receives beacon packets, traffic is sent to all uplinks (shotgun mode). If a team has two uplinks, any link failure will result in all packets being sent to both uplinks.
  3. On a future edition of ESX, we intend to make an additional improvement. If no uplink receives beacon packets, traffic is only sent to uplinks whose link status is “up”. If a team has two uplinks and one uplink experiences a failure in its immediate link, traffic will be sent out to the other uplink. This saves some CPU cycles.
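The three rules above can be sketched as a single selection function. This is an illustrative model only, not the ESX implementation: `select_uplinks`, the `link_aware` flag, and the `(link_up, beacon_ok)` tuple layout are hypothetical names chosen for the example.

```python
from typing import Dict, List, Tuple


def select_uplinks(
    status: Dict[str, Tuple[bool, bool]], link_aware: bool = False
) -> List[str]:
    """Pick the uplinks to send traffic on.

    `status` maps an uplink name to (link_up, beacon_ok).
    `link_aware=False` models the ESX 3.5 behavior (rule 2);
    `link_aware=True` models the planned improvement (rule 3).
    """
    # Rule 1: uplinks that still receive beacons are considered good.
    receiving = [name for name, (_, beacon_ok) in status.items() if beacon_ok]
    if receiving:
        return receiving
    # Rule 3: no beacons anywhere, so use only uplinks whose link is up.
    if link_aware:
        return [name for name, (link_up, _) in status.items() if link_up]
    # Rule 2: no beacons anywhere, so send on every uplink (shotgun mode).
    return list(status)


# Two-uplink team where vmnic0 has lost its immediate link. vmnic1 is up
# but hears no beacons, because its only teammate is down.
team = {"vmnic0": (False, False), "vmnic1": (True, False)}
print(select_uplinks(team))                   # shotgun: both uplinks
print(select_uplinks(team, link_aware=True))  # only the uplink that is up
```

The sketch assumes that an uplink receiving beacons also has its link up, so rule 1 does not check link status separately.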

When should one enable beaconing? When you are concerned that downstream link failures may impact availability and there is no Link State Tracking on the physical switch. Ideally, you should have 3 or more uplinks in the team (active + standby), but you can enable beaconing with 2 uplinks. Some customers don't like the shotgun mode on failure (see #2 above); that's a trade-off you should weigh against some VMs losing their connection right away.

 

Comments

Thanks.

Can you elaborate on what kind of broadcast packets are being used as beacons (which protocol, etc.)?

The beacon packets use a private format containing the host ID and adapter name of the packet origin. We faked the Ethernet type in different ways in the past, but we have obtained a standard type recently. Details will be available when the standard type is incorporated into the product.

great clarification of an often misunderstood setting!

http://www.vmwarescripting.com/

Great write up. Any chance the standard method you mentioned will make it into vSphere GA?

Carl, I'm not sure what is meant by the standard method. If it's about #3, only sending packets to links that are "up" when beaconing fails, the improvement is in vSphere 4.0 GA.

If it's about the standard ethernet packet type mentioned in my comment, it's not in vSphere 4.0 GA.

Yes, that was a question about how vSphere handles beaconing, so the third example you gave. Thanks again for the clarification!

What occurs in the case with multiple VLANs on a vSwitch? Is only one VLAN used for probing or is the packet sent out on every vlan?

Beacon packets are sent for each VLAN configured. A failure on one VLAN only impacts portgroups with that VLAN. We don't probe VLANs for guest VLAN tagging because probing 4k VLANs results in too much broadcast traffic.
