Adding a Load Balancing Rule for PXE to Prevent Timeouts and Failures

by: VMware Senior Cloud Infrastructure Administrator Ashu Saxena; VMware Linux Systems Architect Sachin Sharma; and VMware Senior Network Systems Administrator Sukhjit Singh

Load balancing in a global R&D environment is a complex task. On an average day, VMware IT operations receives almost 6,000 ESXi build requests from different R&D teams for performance testing and quality assurance (QA). During peak periods, we can see thousands of TFTP (Trivial File Transfer Protocol) connection requests for simple file transfers and hundreds of build requests. It’s not unusual to process 200 simultaneous user requests.

To remotely load ESXi images onto hosts and partition them into VMs, IT uses the Preboot eXecution Environment (PXE). PXE relies on TFTP, which runs over UDP (User Datagram Protocol): the server listens on well-known port 69, but each transfer then continues from a randomly chosen port above 1024. Because the data port is selected at random for every transfer, a conventional load balancer cannot predict the port number or track these connections.
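The port behavior that defeats ordinary load balancing can be sketched in a few lines of Python. This is a minimal local simulation, not a real TFTP implementation: the "server" receives a read request on its well-known socket (standing in for port 69) and then replies from a brand-new ephemeral socket, just as a TFTP server does for the data exchange.

```python
import socket

# "Well-known" server socket (stands in for TFTP port 69; we bind an
# ephemeral port here so the demo needs no root privileges).
ctrl = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
ctrl.bind(("127.0.0.1", 0))

# Client sends a read request (RRQ) to the well-known port.
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.bind(("127.0.0.1", 0))
client.sendto(b"\x00\x01pxelinux.0\x00octet\x00", ctrl.getsockname())

# Server receives the RRQ, then answers from a NEW ephemeral socket --
# this is the TFTP behavior a port-based load balancer cannot predict.
_, client_addr = ctrl.recvfrom(512)
data_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
data_sock.bind(("127.0.0.1", 0))
data_sock.sendto(b"\x00\x03\x00\x01" + b"first data block", client_addr)

reply, server_data_addr = client.recvfrom(512)
# The reply arrives from a port different from the one the client contacted.
print(server_data_addr[1] != ctrl.getsockname()[1])  # True
```

Every subsequent datagram of the transfer flows between the client's port and that new server port, which is exactly the traffic a naive load balancer has no rule for.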

The Challenge: Eliminate Service Timeouts

With only a single node available to serve a growing number of requests, TFTP service timeouts began occurring more frequently. In one extreme case, a user reported that 60 out of 64 builds timed out. The volume of ESXi build requests going through the PXE booting process was overwhelming our available capacity.

We knew we had to re-architect our environment. First, we upgraded from 32-bit to 64-bit PXE servers, which helped performance. Our next step was to test different solutions that could scale.

We considered segmenting the PXE workload. Each site has a number of subnets, and each site’s global configuration points to a single TFTP instance. Each site’s subnets could potentially be configured to connect to different TFTP servers, since each subnet can carry DHCP option 66 specifying a TFTP server. However, this solution involved more administrative work: whenever anything changes, an administrator has to review all the network ranges or subnets, then manually decide which TFTP server should be mapped to which subnet.
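This per-subnet pinning could be expressed with DHCP option 66 (the TFTP server name). A hypothetical ISC dhcpd.conf fragment, with made-up subnets and server names, might look like:

```
# Hypothetical ISC DHCP config: each subnet pinned to its own TFTP server.
subnet 10.10.1.0 netmask 255.255.255.0 {
  option tftp-server-name "tftp-a.example.com";  # DHCP option 66
  next-server 10.10.1.5;
}
subnet 10.10.2.0 netmask 255.255.255.0 {
  option tftp-server-name "tftp-b.example.com";
  next-server 10.10.2.5;
}
```

Every stanza like this is another mapping an administrator must maintain by hand, which is the administrative overhead described above.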

Another solution we tested was to segment PXE workloads by specifying multiple TFTP servers for option 66. But this approach depended on whether the DHCP clients could interpret multiple servers. Many clients failed to boot while parsing multiple TFTP servers returned by DHCP option 66.
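Part of the problem is that option 66 is defined (RFC 2132) as a single TFTP server name, so "multiple servers" really means packing several names into one string that each client's firmware must know how to split. A small sketch of the raw option encoding (server names are made up):

```python
import struct

def encode_option_66(value: str) -> bytes:
    """Encode DHCP option 66 (TFTP server name) as a type-length-value triple."""
    data = value.encode("ascii")
    return struct.pack("BB", 66, len(data)) + data

# One server: unambiguous for every client.
single = encode_option_66("tftp-a.example.com")

# "Multiple servers" are just one comma-separated string; whether a PXE
# client splits it, uses only the first entry, or fails to boot is
# firmware-specific -- which is the failure mode we observed.
multiple = encode_option_66("tftp-a.example.com,tftp-b.example.com")
print(multiple[1])  # option length: still a single opaque value
```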

The Solution: Add New Load Balancing Rule

After our evaluation, we decided that keeping the single site-to-TFTP mapping, fronted by a load balancer, was the simplest solution. After several attempts, we wrote a new load balancing rule for the server network that helped us scale, a feat some consider impossible with the TFTP protocol.

The new rule maps the server-side session to the client-side session in the load balancer’s session table. When the server switches to a port above 1024 and the connection is established, the server’s local port is identified, and the client is connected to that new port for the TFTP data exchange. This rule binds the client and server connections together, preventing timeouts and boot failures even when the server is overloaded.
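The effect of the rule can be illustrated with a toy session table. This is our own sketch, not the load balancer's actual implementation: the first datagram from a client picks a backend and records the pairing, and a later reply from that backend's ephemeral data port is bound back to the same client session.

```python
import itertools

class TftpSessionTable:
    """Toy model of load-balancer session state for TFTP (illustration only)."""

    def __init__(self, backends):
        self._backends = itertools.cycle(backends)   # round-robin backend pick
        self._by_client = {}                         # (client_ip, port) -> backend
        self._by_server = {}                         # (backend, data_port) -> client

    def client_request(self, client):
        # First datagram from a client (RRQ to port 69): choose a backend
        # and remember the pairing in the session table.
        backend = next(self._backends)
        self._by_client[client] = backend
        return backend

    def server_reply(self, backend, data_port, client):
        # The backend answers from a fresh port above 1024; bind that port
        # to the session so every later datagram maps to the same client.
        self._by_server[(backend, data_port)] = client
        return self._by_client[client] == backend

lb = TftpSessionTable(["tftp-1", "tftp-2"])          # hypothetical pool
client = ("10.0.0.42", 2070)
backend = lb.client_request(client)                  # RRQ arrives
ok = lb.server_reply(backend, 49152, client)         # DATA from ephemeral port
print(ok)  # True: the reply is tied back to the original client session
```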

By default, the load balancer rejects UDP datagrams that carry no data block. As a result, TFTP acknowledgments were being rejected by the load balancer and TFTP connections failed. To solve this, we enabled the “UDP datagrams with no payload” option in the UDP settings for the TFTP virtual server.
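The datagrams that tripped the default policy are TFTP acknowledgments. Per the TFTP packet format, an ACK is just a 4-byte header (opcode 4 plus a block number) with no data block after it. For illustration:

```python
import struct

ACK_OPCODE = 4  # TFTP opcode for an acknowledgment

def build_ack(block_number: int) -> bytes:
    """Build a TFTP ACK: 2-byte opcode + 2-byte block number, no data block."""
    return struct.pack("!HH", ACK_OPCODE, block_number)

ack = build_ack(1)
print(len(ack))  # 4 -- header only; no data follows
opcode, block = struct.unpack("!HH", ack)
```

Since every data block of a transfer must be acknowledged with one of these, dropping them stalls the transfer until the client times out.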

These changes are currently implemented as part of a load balancing suite for daily QA testing. Timeout occurrences have been reduced to near zero, which means that IT delivery of services for performance and QA testing is much more reliable. We have also been able to increase the number of builds handled by 4x, adding three servers to accommodate the increased volume. We’ve received positive feedback from our users, whose work is no longer disrupted by timeouts.

One of the advantages of this solution is that it is both dynamic and easy to scale. When we add a new TFTP server into the load balancer server pool, it’s ready to accept new connections without changing any client settings.

There is also no limit on the number of servers we can add to the pool, which makes the design highly scalable. Our biggest bottleneck is resource utilization, or bandwidth; if utilization increases in the future, we can upgrade from our current bandwidth. We can also create multiple virtual servers (VIPs) on the same or different load balancers. By creating different pools containing any number of TFTP servers, we can support multiple major networks.

Now that we have successfully implemented this at VMware’s main data center, we will also roll this out to other VMware data centers and labs to further improve their operations.

VMware on VMware blogs are written by IT subject matter experts sharing stories about our digital transformation using VMware products and services in a global production environment. Contact your sales rep to schedule a briefing on this topic. Visit the VMware on VMware microsite and follow us on Twitter.


2 comments have been added so far

  1. What about iPXE, and using something different than TFTP, for example HTTP?
    ESXi is a bit tricky to boot with iPXE, if I recall correctly, but load balancing HTTP(S) is quite easy – even more so if you consider that iPXE can do DNS requests, so you can even balance at the DNS level.

  2. Yes, HTTP is easy to load balance and works well. However, standard PXE clients use TFTP. To enable iPXE we have two options.

    1. Reflash the NICs with an iPXE image. With a large number of clients, like we have, that is very difficult.
    2. Chainload iPXE, i.e., keep the iPXE binary itself on the TFTP server; when a client boots, it downloads iPXE from the TFTP server first.

    Hence, with a large number of requests, we need TFTP load balancing for iPXE as well, since iPXE itself is still served from the TFTP server.
