
Measuring Cluster Reconfiguration with VMmark

In my previous blog entry about running VMmark within a four-server cluster managed by VMware Infrastructure 3 version 3.5 (VI3.5), the results showed that running 17 VMmark tiles (102 total virtual servers) exceeded the available CPU resources of the four servers, and scaling then plateaued due to CPU saturation. I suppose that if one of VMware’s customers were in this situation it could be a good thing, since it means their business is successful and growing beyond its current computing infrastructure. The real issue becomes how to add capacity as quickly and painlessly as possible to meet the needs of the business. This is an arena where VI3.5 shines with its ability to add physical hosts on the fly without interrupting the virtual servers.

Experimental Setup

We can easily demonstrate the benefit of adding physical resources by using the experimental setup described in the previous blog posting, with the addition of a second HP DL380G5 configured identically to the one already in use.

We can also take advantage of the underlying VMmark scoring methodology. A VMmark run is three hours long and consists of a half-hour ramp-up period, a two-hour measurement interval, and a half-hour ramp-down period. The two-hour measurement interval is further divided into three 40-minute periods (just like a marathon hockey game). Each benchmark workload generates its throughput metric during each 40-minute period, and those workload metrics are aggregated into an overall throughput measure for that period. The final VMmark score is defined as the median of the three period scores.
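To make the scoring arithmetic concrete, here is a minimal sketch in Python. The per-period aggregation shown (a geometric mean of each tile's normalized workload throughputs, summed across tiles) is an assumption for illustration only; the official aggregation is done by the VMmark harness. The point that matters for the experiments below is that the final score is the median of the three period scores, so each period can also be examined on its own.

```python
from statistics import geometric_mean, median

def period_score(tiles):
    """Score one 40-minute period.

    `tiles` is a list of dicts mapping workload name to throughput already
    normalized against a reference system.  The per-tile geometric mean and
    the sum across tiles are illustrative assumptions, not the official
    VMmark aggregation.
    """
    return sum(geometric_mean(tile.values()) for tile in tiles)

def vmmark_score(p0, p1, p2):
    """Final metric: the median of the three 40-minute period scores."""
    return median([period_score(p0), period_score(p1), period_score(p2)])

# Hypothetical example with two identical tiles of normalized throughputs:
tile = {"mail": 1.0, "db": 0.9, "file": 1.1, "web": 1.0, "java": 0.95}
print(vmmark_score([tile, tile], [tile, tile], [tile, tile]))
```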

For the purpose of these experiments, we can compare the throughput scaling while varying the configuration of the cluster during the three 40-minute scoring periods of the benchmark. Specifically, the original four-node cluster configuration is used during the first 40-minute period. The additional HP DL380G5 server is added to the cluster at the transition between the first and second periods. During the second period, VMware’s Distributed Resource Scheduler (DRS) rebalances the cluster. The third period should then exhibit the full benefit of the dynamically added fifth server.
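Adding the host mid-run requires nothing more than joining it to the cluster in VirtualCenter; DRS takes care of the rest. As a rough illustration, here is a minimal sketch using the pyVmomi Python bindings for the vSphere/VI API (a modern stand-in for the VirtualCenter tooling we actually used). The host names, credentials, and cluster name are placeholders, not the systems in this test.

```python
# Minimal sketch: join an additional physical host to an existing DRS cluster.
# All names and credentials below are placeholders.
import ssl
from pyVim.connect import SmartConnect
from pyVmomi import vim

si = SmartConnect(host="virtualcenter.example.com", user="administrator",
                  pwd="password", sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()

# Locate the target cluster by name.
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.ClusterComputeResource], True)
cluster = next(c for c in view.view if c.name == "vmmark-cluster")

# Connect the new host; a real script would typically also supply the
# host's sslThumbprint in the connect spec.
spec = vim.host.ConnectSpec(hostName="new-dl380g5.example.com",
                            userName="root", password="password", force=True)
task = cluster.AddHost_Task(spec=spec, asConnected=True)
```

With DRS handling placement, no further action is needed once the host joins; the rebalancing during the second period happens on its own.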

Experimental Results

Figure 1 compares the 17-tile throughput scaling achieved both with the default four-node cluster and when augmenting the cluster with a fifth server during the second period (this configuration is labeled “4node + 1” to distinguish it from a configuration that starts with five nodes). As expected, both configurations exhibit similar scaling during Period 0 when four servers are in use and the CPU resources are fully utilized. However, cluster performance improves during Period 1 with the addition of a fifth server. By Period 2, DRS has rebalanced the benchmark workloads and given them some breathing room to achieve a perfect 17x scaling for 17 tiles from the baseline performance of a single tile. In comparison, the CPU-saturated four-node configuration achieved roughly 16x scaling.

[Figure 1: 17-tile throughput scaling for the four-node cluster and the “4node + 1” configuration]

Figure 2 shows the results for both 18-tile and 19-tile tests. This data follows the same pattern as the initial 17-tile experiment. With the addition of the fifth server, near-linear scaling has been achieved in both cases by Period 2 in contrast to the flat profile measured in the CPU-saturated four-node cluster.

[Figure 2: 18-tile and 19-tile throughput scaling]

Figure 3 contains results from three different 20-tile experiments. As usual, the first experiment utilizes the default four-node cluster and exhibits throughput scaling similar to the other CPU-saturated tests. The second experiment shows the behavior when adding a fifth server during the transition between Period 0 and Period 1. It displays the rising performance characteristic of the relieved CPU bottleneck. The final experiment in this series was run using the five-node cluster from start to finish and demonstrates that dynamically relieving the CPU bottleneck produces the same throughput performance as if the bottleneck had never existed. In other words, DRS functions equally well both when balancing resources in a dynamic and heavily-utilized scenario and when beginning from a clean slate.

[Figure 3: 20-tile throughput scaling for the four-node, four-node-plus-one, and five-node configurations]

Swapping Resources Dynamically

VI3.5’s ability to dynamically reallocate physical resources without downtime or interruption can also be used to remove or replace physical hosts. This greatly simplifies routine maintenance, physical host upgrades, and other tasks. We can easily measure the performance implications of swapping physical hosts with the same methodology used above. In this case, we place the Sun x4150 into Maintenance Mode after the fifth physical host (an HP DL380G5) is added. This evacuates the virtual machines from the Sun x4150, making it available for either hardware or software upgrades.
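For reference, the same pyVmomi-style sketch extends to the host swap; again, the names and credentials are placeholders. With DRS in fully automated mode, the running virtual machines are evacuated via VMotion as part of entering Maintenance Mode.

```python
# Minimal sketch: retire a host from the cluster by placing it into
# Maintenance Mode.  DRS evacuates its running VMs via VMotion before
# the task completes.  Names and credentials are placeholders.
import ssl
from pyVim.connect import SmartConnect
from pyVmomi import vim

si = SmartConnect(host="virtualcenter.example.com", user="administrator",
                  pwd="password", sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()

view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.HostSystem], True)
old_host = next(h for h in view.view if h.name == "sun-x4150.example.com")

# timeout=0 waits indefinitely; once the task finishes the host is free
# for hardware or software upgrades, or for removal from the cluster.
task = old_host.EnterMaintenanceMode_Task(timeout=0)
```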

Figure 4 shows the scaling results of swapping hosts while running both 17-tile and 18-tile VMmark tests on the cluster. In both cases, the performance improves by a small amount. The HP DL380G5 in this experiment happens to contain faster CPUs than the Sun x4150 we had in our lab (Intel Xeon X5460 vs. Intel Xeon X5355), though the Sun x4150 is also available with Intel Xeon X5460 CPUs. These results clearly demonstrate that the liberating flexibility of VMware’s VMotion and DRS comes without a performance penalty.

[Figure 4: 17-tile and 18-tile throughput scaling when swapping hosts]

The Big Picture

I think some of the context provided in my previous blog bears repeating. Let’s take a step back and talk about what has been accomplished on this relatively modest cluster by running 17 to 20 VMmark tiles (102 to 120 server VMs). That translates into simultaneously:

  • Supporting 17,000 to 20,000 Exchange 2003 mail users
  • Sustaining more than 35,000 database transactions per minute using MySQL/SysBench
  • Driving more than 350 MB/s of disk I/O
  • Serving more than 30,000 web pages each minute
  • Running 17 to 20 Java middle-tier servers

For all of these load levels, we dynamically added CPU resources and relieved CPU resource bottlenecks transparently to the virtual machines. They just ran faster. We also transparently swapped physical hosts while the CPU resources were fully saturated, without affecting the performance of the workload virtual machines. VI3.5 lets you easily add and remove physical hosts while it takes care of managing the load. You can run any mix of applications within the virtual machines and VI3.5 will transparently balance the resources to achieve near-optimal performance. Our experiments ran these systems past the point where they were completely maxed out, and I suspect that living this close to the edge is more than most customers will attempt. But I am certain that they will find it reassuring to know that VI3.5 is up to the task.