
Beaconing Demystified: Using Beaconing to Detect Link Failures

Beaconing is one of those features that often confuses even
the most experienced networking admin.


Shudong Zhou, one of our senior engineers, recently posted
an entry on the internal blog explaining how it works and how you might use it.
He gave me permission to cut and paste his entry. Here it is …


Beaconing is a software solution for detecting link failures downstream from
the physical switch.

ESX provides a simple and elegant teaming solution. All uplinks
connected to a vswitch are assumed to connect to the same physical network
(same broadcast domain), so they are all equivalent. Users can configure a list
of active and standby uplinks for traffic to go out of the ESX host. If a link
fails, the adapter driver detects it, marks the uplink as failed, and stops
using it. Existing traffic will fail over to a standby uplink or be
redistributed to the remaining team members.
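To make the active/standby idea concrete, here is a minimal sketch in Python. It is not ESX code: the function name `uplinks_in_use`, the uplink names, and the policy of promoting one standby per failed active uplink are all assumptions made for illustration.

```python
def uplinks_in_use(active, standby, failed):
    """Return the uplinks that should carry traffic.

    active, standby: ordered lists of uplink names (hypothetical names).
    failed: set of uplinks whose adapter driver reported link down.
    Assumed policy: promote one healthy standby per failed active uplink.
    """
    healthy_active = [u for u in active if u not in failed]
    healthy_standby = [u for u in standby if u not in failed]
    # Number of active uplinks lost to link failures.
    missing = len(active) - len(healthy_active)
    # If no standby is available, traffic is simply redistributed
    # across whatever active uplinks remain.
    return healthy_active + healthy_standby[:missing]


if __name__ == "__main__":
    print(uplinks_in_use(["vmnic0", "vmnic1"], ["vmnic2"], {"vmnic1"}))
```

With no failures the standby uplink stays idle; once `vmnic1` is reported down, `vmnic2` takes its place in the sketch above.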

If a downstream
link beyond the immediate physical port fails, the adapter driver obviously cannot
detect it. This causes existing VMs using the uplink to lose network
connectivity. The proper way to solve this problem is to enable Link State Tracking on the physical switch so that
the adapter driver can see the failure. If the physical switch does not support
Link State Tracking, beaconing provides a software alternative. Beaconing
works as follows:

We periodically broadcast beacon packets out of all uplinks in a team. The
physical switch is expected to forward all packets to other ports on the same
broadcast domain. Hence, a team member is expected to see beacon packets from
other team members. If an uplink fails to receive any beacon packets (actually
missing 3 consecutive packets), we mark it bad. The failure can be due to the
immediate link or a downstream link. With 3 or more uplinks in a team, we can
pinpoint failures of a single uplink. With 2 uplinks in a team, we can detect
a downstream link failure, but we don't know which uplink is good and which is bad.
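The detection rule can be sketched as a small simulation. This is only an illustration under stated assumptions, not the ESX implementation: the names `Uplink`, `record_interval`, and `simulate` are hypothetical, and the model simply assumes that an uplink hears a beacon when its own path to the broadcast domain is intact and at least one other team member's path is intact too. The threshold of 3 consecutive misses comes from the description above.

```python
from dataclasses import dataclass

MISS_THRESHOLD = 3  # consecutive missed beacons before an uplink is marked bad


@dataclass
class Uplink:
    name: str
    missed: int = 0  # consecutive beacon intervals with nothing received

    @property
    def beacon_ok(self) -> bool:
        return self.missed < MISS_THRESHOLD


def record_interval(uplinks, received):
    """Update miss counters after one beacon interval.

    received: set of uplink names that saw a beacon from another team member.
    """
    for u in uplinks:
        u.missed = 0 if u.name in received else u.missed + 1


def simulate(names, reachable, intervals=MISS_THRESHOLD):
    """Run several beacon intervals and report which uplinks still look good.

    reachable: set of uplink names whose path to the shared broadcast
    domain is intact (immediate link and downstream links).
    """
    uplinks = [Uplink(n) for n in names]
    for _ in range(intervals):
        # An uplink hears a beacon only if it is reachable itself and
        # some other reachable team member is sending.
        received = {n for n in names
                    if n in reachable and len(reachable - {n}) >= 1}
        record_interval(uplinks, received)
    return {u.name: u.beacon_ok for u in uplinks}


if __name__ == "__main__":
    # Three uplinks: the isolated one is pinpointed.
    print(simulate(["vmnic0", "vmnic1", "vmnic2"], {"vmnic0", "vmnic1"}))
    # Two uplinks: neither hears the other, so both look bad.
    print(simulate(["vmnic0", "vmnic1"], {"vmnic0"}))
```

The two-uplink run shows the ambiguity described above: the healthy uplink has no peer left to hear from, so beaconing alone cannot tell which side failed.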

The behavior when a beaconing failure is detected is as follows:

  1. If two or more uplinks receive beacons from each other, those uplinks
    are considered good. We stop using uplinks which do not receive any
    beacon packets.
  2. On ESX 3.5, if no uplink receives beacon packets, traffic is sent to
    all uplinks (shotgun mode). If a team has two uplinks, any link failure
    will result in all packets being sent to both uplinks.
  3. In a future release of ESX, we intend to make an additional
    improvement. If no uplink receives beacon packets, traffic is only sent
    to uplinks whose link status is “up”. If a team has two uplinks and one
    uplink experiences a failure in its immediate link, traffic will be sent
    out to the other uplink. This saves some CPU cycles.
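The three rules above can be summarized as one selection function. This is a sketch for illustration only: `select_uplinks` and its parameter `shotgun_only_link_up` are hypothetical names, with the flag standing in for the planned future behavior in rule 3.

```python
def select_uplinks(beacon_ok, link_up, shotgun_only_link_up=False):
    """Pick the uplinks that should carry traffic.

    beacon_ok: dict of uplink name -> True if it is still receiving beacons.
    link_up:   dict of uplink name -> True if the adapter reports link up.
    shotgun_only_link_up: False models the ESX 3.5 behavior (rule 2),
                          True models the planned improvement (rule 3).
    """
    good = [name for name, ok in beacon_ok.items() if ok]
    if good:
        # Rule 1: use only the uplinks that hear each other's beacons.
        return good
    if shotgun_only_link_up:
        # Rule 3: no beacons anywhere, so send only on uplinks whose
        # immediate link is up.
        return [name for name in beacon_ok if link_up.get(name, False)]
    # Rule 2: no beacons anywhere, so send on every uplink (shotgun mode).
    return list(beacon_ok)


if __name__ == "__main__":
    no_beacons = {"vmnic0": False, "vmnic1": False}
    links = {"vmnic0": True, "vmnic1": False}
    print(select_uplinks(no_beacons, links))
    print(select_uplinks(no_beacons, links, shotgun_only_link_up=True))
```

For a two-uplink team where `vmnic1` has lost its immediate link, the first call sends on both uplinks while the second sends only on `vmnic0`, which is where the CPU saving in rule 3 comes from.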

When should one enable beaconing? When you are concerned that downstream link
failures may impact availability and there is no Link State Tracking on the
physical switch. Ideally, you should have 3 or more uplinks in the team
(active + standby), but you can enable beaconing with 2 uplinks. Some
customers don't like the shotgun mode on failure (see #2 above); that is a
trade-off you have to weigh against some VMs losing their connection right away.