Beaconing Demystified: Using Beaconing to Detect Link Failures
Beaconing is one of those features that often confuses even
the most experienced networking admin.
Shudong Zhou, one of our senior engineers, recently posted
an entry on the internal blog explaining how it works and how you might use it.
He gave me permission to cut and paste his entry. Here it is …
Beaconing is a software solution for detecting link failures downstream from the physical switch.
ESX provides a simple and elegant teaming solution. All uplinks connected to a vswitch are assumed to connect to the same physical network (same broadcast domain), so they are all equivalent. Users can configure a list of active and standby uplinks for traffic to go out of the ESX host. If a link fails, the adapter driver detects it, marks the uplink as failed, and stops using it. Existing traffic will fail over to a standby uplink or be redistributed to the remaining team members.
If a downstream link beyond the immediate physical port fails, the adapter driver cannot detect it. This causes existing VMs using the uplink to lose network connectivity. The proper way to solve this problem is to enable Link State Tracking on the physical switch so that the adapter driver can see the failure. If the physical switch does not support Link State Tracking, beaconing provides a software alternative. Beaconing works as follows:
ESX periodically broadcasts beacon packets out of all uplinks in a team. The physical switch is expected to forward all packets to other ports on the same broadcast domain. Hence, a team member is expected to see beacon packets from other team members. If an uplink fails to receive any beacon packets (actually, if it misses 3 consecutive packets), we mark it bad. The failure can be due to the immediate link or a downstream link. With 3 or more uplinks in a team, we can pinpoint failures of a single uplink. With 2 uplinks in a team, we can detect a downstream link failure, but we don't know which uplink is good and which is bad.
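The detection rule above can be sketched in a few lines of Python. This is only an illustration of the logic described in the post, not ESX code: the Uplink class, the record_interval helper, and the vmnic names are hypothetical, and the only number taken from the text is the limit of 3 consecutive missed beacons.

```python
from dataclasses import dataclass

MISSED_BEACON_LIMIT = 3  # from the post: 3 consecutive missed beacons


@dataclass
class Uplink:
    """Hypothetical model of one team member (not an ESX structure)."""
    name: str
    missed: int = 0  # consecutive beacon intervals with nothing received

    @property
    def beacon_ok(self) -> bool:
        return self.missed < MISSED_BEACON_LIMIT


def record_interval(uplinks, received):
    """Update miss counters after one beacon interval.

    `received` is the set of uplink names that saw at least one beacon
    from another team member during the interval.
    """
    for up in uplinks:
        up.missed = 0 if up.name in received else up.missed + 1


# Three uplinks: vmnic2 loses its downstream path and stops hearing
# beacons, while vmnic0 and vmnic1 still hear each other.
team = [Uplink("vmnic0"), Uplink("vmnic1"), Uplink("vmnic2")]
for _ in range(MISSED_BEACON_LIMIT):
    record_interval(team, received={"vmnic0", "vmnic1"})
bad = [up.name for up in team if not up.beacon_ok]
print(bad)  # ['vmnic2']: the failed uplink is pinpointed

# Two uplinks: whichever path fails, neither uplink hears the other,
# so both are marked bad and the failed one cannot be identified.
pair = [Uplink("vmnic0"), Uplink("vmnic1")]
for _ in range(MISSED_BEACON_LIMIT):
    record_interval(pair, received=set())
both_bad = [up.name for up in pair if not up.beacon_ok]
print(both_bad)  # ['vmnic0', 'vmnic1']
```

The second case shows why 3 or more uplinks are preferable: with only two team members, a single failure silences both of them.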
ESX behavior when a beaconing failure is detected is as follows:
1. If two or more uplinks receive beacons from each other, those uplinks are considered good. We stop using uplinks that do not receive any beacon packets.
2. On ESX 3.5, if no uplink receives beacon packets, traffic is sent to all uplinks (shotgun mode). If a team has two uplinks, any link failure will result in all packets being sent to both uplinks.
3. On a future edition of ESX, we intend to make an additional improvement. If no uplink receives beacon packets, traffic is only sent to uplinks whose link status is “up”. If a team has two uplinks and one uplink experiences a failure in its immediate link, traffic will be sent out to the other uplink only. This saves some CPU cycles.
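As a rough sketch, the three cases above amount to a small selection function. The function name, the dictionary layout, and the link_aware flag are assumptions made for illustration. They do not correspond to any ESX API or setting.

```python
def uplinks_for_traffic(team, link_aware=False):
    """Pick the uplinks that should carry traffic.

    `team` is a list of dicts with hypothetical keys:
      name      - uplink name
      link_up   - link status reported by the adapter driver
      beacon_ok - True if the uplink still receives beacons

    link_aware=False models the ESX 3.5 behavior (shotgun mode);
    link_aware=True models the later improvement described above.
    """
    good = [u["name"] for u in team if u["link_up"] and u["beacon_ok"]]
    if good:
        # Uplinks that still hear beacons are good; the rest are dropped.
        return good
    if link_aware:
        # No uplink hears beacons: use only those whose link is "up".
        return [u["name"] for u in team if u["link_up"]]
    # No uplink hears beacons: shotgun mode, send on every uplink.
    return [u["name"] for u in team]


# Two-uplink team where vmnic1 has lost its immediate link, so neither
# uplink can receive the other's beacons.
two_nic_team = [
    {"name": "vmnic0", "link_up": True, "beacon_ok": False},
    {"name": "vmnic1", "link_up": False, "beacon_ok": False},
]
print(uplinks_for_traffic(two_nic_team))                   # ['vmnic0', 'vmnic1']
print(uplinks_for_traffic(two_nic_team, link_aware=True))  # ['vmnic0']
```

In the two-uplink example, the link-aware variant avoids copying every packet to an uplink that is already known to be down, which is where the CPU savings come from.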
When should one enable beaconing? When you are concerned that downstream link failures may impact availability and there is no Link State Tracking on the physical switch. Ideally, you should have 3 or more uplinks in the team (active + standby), but you can enable beaconing with 2 uplinks. Some customers don't like the shotgun mode on failure (see #2 above); that's a trade-off you have to weigh against some VMs losing their connection right away.
Thanks.
Can you elaborate on what kind of broadcast packets are being used as beacons (Protocol being used? etc.)
Posted by: Evan C. | December 19, 2008 at 02:37 AM
The beacon packets use a private format containing the hostid and adapter name of the packet origin. We faked the ethernet type in different ways in the past, but we have obtained a standard type recently. Details will be available when the standard type is incorporated into the product.
Posted by: Shudong Zhou | January 08, 2009 at 09:47 AM
great clarification of an often misunderstood setting!
http://www.vmwarescripting.com/
Posted by: Eric | January 13, 2009 at 07:34 AM
Great write up. Any chance the standard method you mentioned will make it into vSphere GA?
Posted by: Carl Skow | May 14, 2009 at 07:59 AM
Carl, not sure what is meant by the standard method. If it's about #3, only sending packets to links that are "up" when beaconing fails, the improvement is in vSphere 4.0 GA.
Posted by: Shudong Zhou | May 14, 2009 at 04:34 PM
If it's about the standard ethernet packet type mentioned in my comment, it's not in vSphere 4.0 GA.
Posted by: Shudong Zhou | May 14, 2009 at 04:37 PM
Yes that was a question as to how vSphere handles beaconing, so the third example you gave. Thanks again for the clarification!
Posted by: Carl Skow | May 28, 2009 at 02:25 PM
What occurs in the case with multiple VLANs on a vSwitch? Is only one VLAN used for probing or is the packet sent out on every vlan?
Posted by: Lou | July 04, 2009 at 02:04 AM
Beacon packets are sent for each vlan configured. A failure on one vlan only impacts portgroups with that vlan. We don't probe vlans for guest vlan tagging because probing 4k vlans results in too much broadcast traffic.
Posted by: Shudong Zhou | July 06, 2009 at 02:59 PM
Great article thanks! We're using this to great effect in ESXi4, but have one small problem...
It's nearly impossible to actually tell when beaconing has detected a failure because it's so seamless. The perfect command would be vicfg-vswitch, but unfortunately, this doesn't report beacon failures - just configuration.
Has anyone got any magical suggestions?
Posted by: Daern | August 24, 2009 at 09:26 AM
Yes, observability is a hole in the beaconing story. The only way to figure it out is by looking at the vmkernel log file for beacon failure messages. Will look at potential enhancements.
Posted by: Shudong Zhou | August 24, 2009 at 02:54 PM
I have a problem with beacon probing setting.
When I monitor packets on the physical switch downlink port connected to the vmnic, I can't capture any beacon packets, so traffic is sent to all uplinks (shotgun mode). Would you please explain why the vswitch does not send beacon packets?
Posted by: Jaieun Chu | December 16, 2009 at 01:13 AM
The only thing I can think of is that certain adapters do not like the beacon packet format and decide to drop them. Please provide your ESX version, network adapter model, and physical switch info to VMware support for further investigation.
Posted by: Shudong Zhou | December 16, 2009 at 12:17 PM
Recently I've seen a case where two different ESX hosts were sourcing ethernet frames with the same MAC address. The result was "address flapping between ports" messages from the switch.
The mac address is not associated with any vm guest system.
The frames were not IP (according to the switch).
The mac address in question appeared on all vlans.
There was not a misconfiguration related to pNIC load balancing. It was set to "source port hash" or somesuch.
Could these have been beacon probes?
What MAC address are beacon probes sourced from?
Thanks!
Posted by: chris | January 20, 2010 at 09:11 AM
The beacon probes are sent on all vlans configured and use the mac address of the physical nic as the source address. The frames use the ethertype 0x05ff
Posted by: Guy Brunsdon | January 20, 2010 at 09:36 AM
Is there a more updated post or document on beacon probing?
Posted by: mark | October 27, 2010 at 11:24 AM
I came across this blog while looking for a solution to address upstream traffic issues affecting our VMs. Here's a detailed walk-through of how we solved it. Hope it helps!
VMware & Link-State Tracking
http://www.bctechnet.com/?p=10
Posted by: Chris | February 10, 2011 at 10:44 AM