Weathervane 2.0 lets you evaluate and compare the performance characteristics of on-premises and cloud-based Kubernetes clusters.
Kubernetes simplifies the deployment and management of applications on a wide range of infrastructure solutions. One cost of this simplification is a new set of decisions that you must make when acquiring or configuring Kubernetes clusters, such as selecting the underlying infrastructure, configuring the network and storage layers, and sizing the compute nodes. Each of these choices can affect the performance of applications deployed on the cluster. Weathervane 2.0 helps you understand this impact by:
- Comparing the performance of Kubernetes clusters
- Evaluating the impact of configuration decisions on cluster performance
- Validating a new cluster before putting it into production
When using Weathervane, you only need to build a set of container images, edit a configuration file, and start the benchmark. Weathervane manages the deployment, execution, and tear-down of all benchmark components on your Kubernetes cluster.
Weathervane 2.0 is available at https://github.com/vmware/weathervane.
How Weathervane Works
Weathervane measures the performance capabilities of a Kubernetes cluster by deploying one or more instances of a benchmark application on the cluster and then driving a load against those applications. The load is generated by the Weathervane workload driver, which also runs on a Kubernetes cluster. The workload driver can run on the same cluster as the application, or on a separate cluster. If you run the workload driver on the same cluster, you can use Kubernetes node labels to isolate the driver and applications onto separate compute nodes.
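For example, when the driver and the application instances share a cluster, you can label the two groups of nodes and let the scheduler keep the pods apart. The sketch below uses the Kubernetes Python client to apply labels to two hypothetical groups of nodes; the node names and the label key/values (wvrole=driver and wvrole=sut) are placeholders for illustration, and the labels Weathervane actually expects are documented in the user's guide.

```python
# Minimal sketch: label two groups of worker nodes so that workload driver
# pods and application pods can be scheduled onto disjoint sets of nodes.
# Node names and the "wvrole" label are illustrative placeholders.
from kubernetes import client, config

config.load_kube_config()          # uses your current kubeconfig context
v1 = client.CoreV1Api()

driver_nodes = ["worker-1", "worker-2"]
application_nodes = ["worker-3", "worker-4", "worker-5", "worker-6"]

for name in driver_nodes:
    v1.patch_node(name, {"metadata": {"labels": {"wvrole": "driver"}}})
for name in application_nodes:
    v1.patch_node(name, {"metadata": {"labels": {"wvrole": "sut"}}})
```

The same labels can be applied directly with kubectl (for example, kubectl label node worker-1 wvrole=driver).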
You can configure Weathervane to generate a steady load using a fixed number of simulated users, or to automatically find the maximum number of users that can be supported on the cluster without violating the workload’s quality-of-service (QoS) requirements.
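As a rough illustration of what the configuration file controls, the snippet below generates a minimal JSON-style run configuration covering the deployment size, the number of application instances, and the choice between a fixed load and a find-maximum run. The parameter names and file name used here are assumptions for illustration only; the authoritative names, values, and defaults are defined in the Weathervane user's guide.

```python
# Illustrative only: write a minimal JSON-style run configuration.
# Every key below is a placeholder; consult the Weathervane user's guide
# for the real parameter names, values, and defaults.
import json

run_config = {
    "configurationSize": "micro",   # one of the supported deployment sizes
    "numAppInstances": 4,           # how many application instances to deploy
    "runStrategy": "fixed",         # fixed load vs. a find-maximum strategy
    "users": 1000,                  # simulated users for a fixed-load run
}

with open("weathervane.config.k8s", "w") as f:   # hypothetical file name
    json.dump(run_config, f, indent=2)
```

With a find-maximum strategy, Weathervane adjusts the user count itself and reports the highest load that still satisfies the QoS requirements.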
A run of Weathervane produces a score called WvUsers, which represents the maximum number of simulated users that could interact with the application instances without violating the QoS requirements. When you compare two Kubernetes clusters, the cluster that supports the higher WvUsers is the higher-performing cluster.
The figure below shows a high-level view of the Weathervane 2.0 components, including the application, the workload driver, and the run harness. The run harness automates the process of executing runs and collecting results. The run harness runs on a client system that supports Docker containers and has connectivity to the Kubernetes clusters.
Weathervane uses a realistic multi-tier web application that includes both stateless and stateful services. You can select from two application deployment sizes, which lets you match the deployment to the size of the cluster under test or to its expected usage model. Weathervane also supports running multiple instances of the application in a single run, so you can scale up the load for large clusters.
The Weathervane application consists of several service tiers. The application logic is implemented as stateless Java services running in Tomcat, which communicate using REST APIs and RabbitMQ messaging and use ZooKeeper for coordination. The back-end data stores are implemented with PostgreSQL and Cassandra, and the front-end web servers and proxy cache are implemented with Nginx.
Weathervane Examples
To illustrate the use of Weathervane 2.0, we present two simple usage scenarios. The first compares the performance of Kubernetes clusters running on two generations of server hardware. The second explores the performance impact of decisions made when configuring a Kubernetes cluster.
In the first example, we quantify the performance improvement obtained by moving a Kubernetes cluster running on older hardware to servers with a newer generation of processors from the same vendor. The newer generation had twice as many CPU cores, but the cores ran at a lower frequency (2.4GHz versus 3.1GHz). The Kubernetes clusters on the two sets of servers were configured similarly, except the newer servers could run twice as many Kubernetes nodes. We give the complete details of the two clusters at the end of this article.
We completed multiple Weathervane runs on each cluster, using from 1 to 48 application instances on the cluster with the older servers and from 1 to 96 instances on the upgraded cluster. At the largest instance counts, the CPU utilization of each cluster's servers was near 100%. The application instances used the micro configuration size, which has six pods per instance, so the largest configuration ran 96 separate application instances with a total of 576 pods. We ran at a range of instance counts in order to understand the performance differences at different utilization levels; for example, we wanted to know whether the higher CPU frequency of the older servers would yield better performance at low utilization.
The chart below shows the performance results. The cluster with the newer servers outperformed the older servers by 15% to 29% on equivalent configurations and was able to achieve more than twice the peak performance.
In the second example, we compared the performance of two Kubernetes pod network configurations. (You can use this same process to compare any cluster configuration choice.) For this comparison, we used the upgraded cluster discussed above with two cluster configurations: one using the Flannel network fabric with the VXLAN backend, and the other using Flannel with the host-gw backend. The primary difference between the Flannel backends lies in how traffic is routed between pods on different Kubernetes nodes. More information is available in the Flannel repository (https://github.com/coreos/flannel).
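If you want to reproduce this kind of comparison, the backend type is set in Flannel's net-conf.json, which the stock Flannel manifest of that era stored in the kube-flannel-cfg ConfigMap in the kube-system namespace (verify the names against your own deployment). The sketch below uses the Kubernetes Python client to switch the backend type in place; in practice you would usually just edit the manifest before deploying.

```python
# Sketch: switch Flannel's backend from vxlan to host-gw by editing the
# net-conf.json entry in its ConfigMap. The ConfigMap name and namespace
# follow the stock kube-flannel manifest; confirm them for your cluster.
import json
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

cm = v1.read_namespaced_config_map("kube-flannel-cfg", "kube-system")
net_conf = json.loads(cm.data["net-conf.json"])
net_conf["Backend"]["Type"] = "host-gw"        # the default backend is "vxlan"
cm.data["net-conf.json"] = json.dumps(net_conf, indent=2)
v1.replace_namespaced_config_map("kube-flannel-cfg", "kube-system", cm)

# The flannel DaemonSet pods read this file at startup, so they must be
# restarted (for example, by deleting them) before the change takes effect.
```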
For these tests, we used the small configuration size and completed multiple Weathervane runs with up to 16 application instances. The chart below shows the results of the comparison. There was little difference at lower loads, but at higher loads, Flannel/host-gw outperformed Flannel/VXLAN by about 10%. Further investigation uncovered uneven CPU utilization in the VXLAN case, caused by the large number of interrupts generated by packets arriving on a single UDP port.
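As a rough idea of how you might spot this kind of imbalance on your own nodes, the sketch below sums the hardware interrupt counters in /proc/interrupts per CPU; the device-name filter is only a heuristic and needs to be adjusted to match your NIC driver's interrupt names.

```python
# Diagnostic sketch: summarize how NIC interrupts are spread across CPUs by
# parsing /proc/interrupts on a node. A heavily skewed distribution points to
# the kind of uneven CPU utilization described above.
from collections import defaultdict

per_cpu = defaultdict(int)
with open("/proc/interrupts") as f:
    cpus = f.readline().split()                 # header row: CPU0 CPU1 ...
    for line in f:
        # Keep only NIC queue interrupts; adjust the filter for your driver.
        if not any(tag in line for tag in ("eth", "ens", "TxRx", "mlx", "vmxnet")):
            continue
        counts = line.split()[1:1 + len(cpus)]
        for cpu, count in zip(cpus, counts):
            if count.isdigit():
                per_cpu[cpu] += int(count)

# Print the busiest CPUs first.
for cpu in sorted(per_cpu, key=per_cpu.get, reverse=True)[:8]:
    print(f"{cpu}: {per_cpu[cpu]}")
```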
These examples show just two of the many ways you can use Weathervane 2.0 to understand and compare the performance of your Kubernetes clusters.
Get Weathervane 2.0
You can download Weathervane 2.0 from the GitHub repository at https://github.com/vmware/weathervane/. There you will also find the user's guide and other documentation, as well as instructions for contacting the Weathervane team. We welcome feedback, questions, feature requests, and community participation.
Cluster Configuration Details
Server Generation Comparison
| | Cluster 1 (Old Generation) | Cluster 2 (New Generation) |
|---|---|---|
| Number of Servers | 2 | 2 |
| CPU Sockets per Server | 2 | 2 |
| CPU Cores/Threads per Server | 20/40 | 40/80 |
| Processor Speed | 3.1GHz (Turbo mode enabled) | 2.4GHz (Turbo mode enabled) |
| Total Cores/Threads | 40/80 | 80/160 |
| BIOS Profile | Performance | Performance |
| Number of Kubernetes Nodes (Virtual Machines) | 4 | 8 |
| vCPUs per Kubernetes Node | 10 | 10 |
| Memory per Kubernetes Node | 112GB | 180GB |
| Kubernetes Version | 1.17.0 | 1.17.0 |
| Pod Network Provider | Flannel with host-gw backend | Flannel with host-gw backend |
| Kubernetes Node OS | Ubuntu 18.04 | Ubuntu 18.04 |
| Docker Version | 18.06.3-ce | 18.06.3-ce |
| Virtual Infrastructure | vSphere 6.7 Update 3 | vSphere 6.7 Update 3 |
| ESXi Host Advanced Tuning | Numa.LocalityWeightActionAffinity=0 (see https://kb.vmware.com/s/article/2097369) | Numa.LocalityWeightActionAffinity=0 (see https://kb.vmware.com/s/article/2097369) |
We used additional servers for the workload driver pods.
Flannel Backend Comparison
| | Configuration 1 | Configuration 2 |
|---|---|---|
| Number of Servers | 2 | 2 |
| CPU Sockets per Server | 2 | 2 |
| CPU Cores/Threads per Server | 40/80 | 40/80 |
| Processor Speed | 2.4GHz (Turbo mode enabled) | 2.4GHz (Turbo mode enabled) |
| Total Cores/Threads | 80/160 | 80/160 |
| BIOS Profile | Performance | Performance |
| Number of Kubernetes Nodes (Virtual Machines) | 20 | 20 |
| vCPUs per Kubernetes Node | 4 | 4 |
| Memory per Kubernetes Node | 72GB | 72GB |
| Kubernetes Version | 1.17.3 | 1.17.3 |
| Pod Network Provider | Flannel/VXLAN backend | Flannel/host-gw backend |
| Kubernetes Node OS | Ubuntu 18.04 | Ubuntu 18.04 |
| Docker Version | 19.03.6-ce | 19.03.6-ce |
| Virtual Infrastructure | vSphere 6.7 Update 3 | vSphere 6.7 Update 3 |
| ESXi Host Advanced Tuning | Numa.LocalityWeightActionAffinity=0 (see https://kb.vmware.com/s/article/2097369) | Numa.LocalityWeightActionAffinity=0 (see https://kb.vmware.com/s/article/2097369) |
We used additional servers for the workload driver pods.