When you bootstrap a Kubernetes cluster in a non-cloud environment, one of the first hurdles to overcome is how to provision the kube-apiserver load balancer. If you are running a non-HA cluster with a single control plane node, the load balancer is unnecessary because all API requests are routed directly to that one node. In highly available configurations, a load balancer must sit in front of the kube-apiservers to correctly route requests to healthy API servers.
With cloud platforms such as AWS, it’s trivial to click a few buttons and launch an elastic load balancer. Outside of these platforms, the solution is less obvious. There are, however, load balancing options that can be deployed in non-cloud environments.
First, let’s review why the kube-apiserver load balancer is necessary.
As seen above, the load balancer routes traffic to the kube-apiservers. If a kube-apiserver goes down, the load balancer routes traffic around this failure.
Worker nodes communicate with the control plane through a single API endpoint. Using a load balancer for that endpoint ensures that API requests are distributed only to healthy kube-apiservers. Without a load balancer, each worker would have to be pointed at one specific kube-apiserver, and if that kube-apiserver failed, every worker node bound to it would fail along with it, which is the opposite of high availability.
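As a concrete illustration (and only a sketch): if you bootstrap the cluster with kubeadm, this single endpoint is what you would set as controlPlaneEndpoint, pointing it at the load balancer rather than at any individual kube-apiserver. The DNS name below is a placeholder, and the exact API version depends on your Kubernetes release.

# kubeadm ClusterConfiguration fragment; kube-api.example.internal is a placeholder
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
# kubelets and kubeconfigs will use the load balancer, not an individual API server
controlPlaneEndpoint: "kube-api.example.internal:6443"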
In the rest of this blog post, we’ll discuss several options for implementing a kube-apiserver load balancer for an on-premises cluster, including an option for those running Kubernetes on VMware vSphere.
DNS for Load Balancing
A common approach is to use round-robin DNS in place of a load balancer. This method carries several disadvantages: the lack of health checks prevents routing around failed servers, and unpredictable caching, both in the DNS hierarchy and on the client side, makes management and updates difficult. Because of these drawbacks, there are better options to explore.
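For reference, round-robin DNS amounts to nothing more than publishing multiple A records for the same name; the zone fragment below is purely illustrative.

; hypothetical BIND-style zone fragment: one name, three kube-apiserver addresses
kube-api.example.internal.    60    IN    A    10.10.10.10
kube-api.example.internal.    60    IN    A    10.10.10.11
kube-api.example.internal.    60    IN    A    10.10.10.12

Resolvers hand the records back in varying order, but nothing removes a record when its server goes down, which is exactly the health-check gap described above.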
Option One: Standalone HAProxy
HAProxy is a quick and easy option. After installing the package on your load balancing server, you configure the list of kube-apiservers along with their health checks. Here’s an example configuration that balances three kube-apiservers at 10.10.10.10, 10.10.10.11, and 10.10.10.12.
global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
    stats timeout 30s
    user haproxy
    group haproxy
    daemon

    # Default SSL material locations
    ca-base /etc/ssl/certs
    crt-base /etc/ssl/private

    # Default ciphers to use on SSL-enabled listening sockets.
    # For more information, see ciphers(1SSL). This list is from:
    # https://hynek.me/articles/hardening-your-web-servers-ssl-ciphers/
    # An alternative list with additional directives can be obtained from
    # https://mozilla.github.io/server-side-tls/ssl-config-generator/?server=haproxy
    ssl-default-bind-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:RSA+AESGCM:RSA+AES:!aNULL:!MD5:!DSS
    ssl-default-bind-options no-sslv3

defaults
    log global
    mode http
    option httplog
    option dontlognull
    timeout connect 5000
    timeout client 50000
    timeout server 50000
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http

frontend k8s-api
    bind 0.0.0.0:6443
    mode tcp
    option tcplog
    default_backend k8s-api

backend k8s-api
    mode tcp
    option tcp-check
    balance roundrobin
    default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 250 maxqueue 256 weight 100
    server apiserver1 10.10.10.10:6443 check
    server apiserver2 10.10.10.11:6443 check
    server apiserver3 10.10.10.12:6443 check
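Once HAProxy is running, a quick way to sanity-check the setup is to query the kube-apiserver health endpoint through the load balancer. The address below is a placeholder for the HAProxy server’s IP, and the unauthenticated check assumes the cluster’s default RBAC rules, which expose /healthz to anonymous requests.

# replace 10.10.10.100 with the HAProxy server's address; -k skips certificate verification
curl -k https://10.10.10.100:6443/healthz
# a healthy kube-apiserver behind the load balancer returns: ok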
A drawback of this option is that it makes the instance running HAProxy a single point of failure. The whole point of high availability is to remove the single point of failure, not introduce one, so outside of quick lab or proof-of-concept clusters, this approach is not recommended.
Option Two: The Keepalived Package with HAProxy
Keepalived is a powerful package that uses the Virtual Router Redundancy Protocol (VRRP) to float an IP address between Linux hosts. Two HAProxy instances are launched: a primary and a standby. If the primary fails, Keepalived moves, or “floats,” the virtual IP address to the standby, and no service disruption occurs.
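A minimal keepalived.conf for the primary instance might look like the sketch below; the interface name, router ID, password, and the 10.10.10.100 virtual IP are all placeholders. The standby uses the same configuration with state BACKUP and a lower priority, and clients point at the virtual IP instead of either HAProxy server.

# /etc/keepalived/keepalived.conf on the primary; all values are illustrative
vrrp_script chk_haproxy {
    script "pidof haproxy"      # treat the node as failed if HAProxy is not running
    interval 2
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0              # NIC that should carry the floating IP
    virtual_router_id 51
    priority 101                # the standby gets a lower value, e.g. 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass changeme
    }
    virtual_ipaddress {
        10.10.10.100/24         # the address kubelets and kubeconfigs use
    }
    track_script {
        chk_haproxy
    }
}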
Another interesting way to leverage VRRP is to run Keepalived inside a Docker container. These containers can be run as static pods paired with HAProxy, even directly on the control plane nodes themselves. By using static pods, you get the benefit of maintaining your load balancing solution with Kubernetes manifests, just as you do with your workloads.
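A rough sketch of that static pod approach is shown below: a manifest dropped into /etc/kubernetes/manifests runs HAProxy on the host network, reading its configuration file from the node. The image tag and host path are assumptions, and if the pod lands on a control plane node the frontend must bind a port other than 6443, since the local kube-apiserver already owns it.

# /etc/kubernetes/manifests/haproxy.yaml; image tag and paths are placeholders
apiVersion: v1
kind: Pod
metadata:
  name: haproxy
  namespace: kube-system
spec:
  hostNetwork: true                          # bind directly on the node's interfaces
  containers:
  - name: haproxy
    image: haproxy:2.0
    volumeMounts:
    - name: haproxy-config
      mountPath: /usr/local/etc/haproxy/haproxy.cfg
      readOnly: true
  volumes:
  - name: haproxy-config
    hostPath:
      path: /etc/haproxy/haproxy.cfg         # the configuration shown earlier
      type: File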
The drawback of Keepalived is that you still need to choose and maintain the actual load balancing software; Keepalived provides only the floating IP address.
Option Three: vSphere HA
Yet another option is to leverage VMware vSphere High Availability (HA). The HA setting can be enabled at the cluster level, as shown below.
If a physical ESXi host fails, vSphere HA automatically restarts the HAProxy VM on a healthy ESXi host within the cluster, while maintaining the same IP address.
A great thing about vSphere HA is that it works at the VM level, so it’s agnostic to the software running inside the VM. There is also no need for specialized configuration in the guest because vSphere handles the failover for us.
Note that you must use a shared datastore for VMs to successfully float between hosts. VMware vSAN is a great choice here, but external options such as iSCSI will also work.
A downside is the speed of the failover. If the host running the HAProxy VM fails outright, there is a delay while vSphere detects the failure and boots the VM on a healthy host, and the HAProxy endpoint will be unresponsive until the operation completes.
It’s possible to eliminate this downtime by enabling vSphere Fault Tolerance (FT) on the HAProxy VM. In this case, a secondary “shadow” VM runs on a separate ESXi host and is continuously replicated from the primary over the network. If the primary VM or its host fails, the IP address instantly floats to the secondary VM, and no downtime is observed during the failover.
Bonus Option: VMware NSX-T
NSX-T is an extremely powerful, fully featured network virtualization platform that includes built-in load balancing. VMware Enterprise PKS leverages it out of the box, but it’s possible to install and configure it for VMware Essential PKS as well. An added benefit of NSX-T load balancers is that they can be deployed in server pools that distribute requests across multiple ESXi hosts, so there is no downtime if an individual host fails.
Conclusion
As shown above, there are multiple load balancing options for deploying a Kubernetes cluster on premises. Here’s a quick summary of each option’s main advantage:
- For a quick POC, the simplicity of HAProxy can’t be beat.
- For a highly available setup on bare metal, using HAProxy with Keepalived is a reliable option.
- On vSphere, you can take advantage of vSphere HA and combine it with a shared datastore to run your load balancing VMs.
- If NSX-T is available in your cluster, load balancers can easily be created with the click of a button.