
Monthly Archives: January 2007

Trying VMmark on Some New Hardware

I recently received a new HP DL380G5 server sporting the new Intel Woodcrest dual-core processors. I was finally ready to try it out on the Friday before Christmas, and I didn't want to be a Grinch by asking the lab support folks to connect a disk array before the long weekend, so I quickly set up a test using the built-in SAS controller, which supports up to eight SAS drives. Two of the drives were configured in RAID 1 and held the ESX installation. I configured the six remaining drives into a single RAID 0 LUN and then created three equally sized partitions for three VMFS filesystems, one for each VMmark tile I expected to run. I then ran the benchmark with 1, 2, and 3 tiles, which consumed roughly 40%, 80%, and 100% of the CPU resources, respectively. The resulting throughput scores are shown in Figure 1.

[Figure 1. Overall VMmark throughput scores for 1, 2, and 3 tiles]

VMmark was designed to mimic the uneven demands across workloads typical of multi-VM consolidation environments. In general, the way in which each workload performs and scales depends upon the capabilities of the subsystems on which it depends, e.g., disk and CPU for the database server. Analyzing the variations in performance as the underlying hardware components become saturated helps to validate the system configuration as well as the proper behavior of the virtualization layer. Although the scaling at 3 tiles looked decent, I wanted to see why it wasn’t better, so I examined the scores of the individual workloads. The normalized workload scores are shown in Figure 2.
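To make that comparison concrete, here is a minimal Python sketch of one way to produce a Figure 2-style view: divide each workload's multi-tile result by its own 1-tile result. The raw numbers below are illustrative placeholders chosen only to mirror the trends described below, not actual VMmark data.

# Minimal sketch: normalize each workload's aggregate throughput against its
# own 1-tile run to see how it scales as tiles are added. The values are
# illustrative placeholders, not measured VMmark results.
raw_scores = {
    # workload: [1-tile, 2-tile, 3-tile] aggregate throughput (arbitrary units)
    "mail":     [100.0, 200.0, 250.0],
    "java":     [100.0, 200.0, 300.0],
    "database": [100.0, 195.0, 240.0],
    "web":      [100.0, 190.0, 180.0],
    "file":     [100.0,  85.0,  65.0],
}

def normalize(per_tile):
    """Express each result relative to the workload's 1-tile baseline."""
    baseline = per_tile[0]
    return [score / baseline for score in per_tile]

for workload, scores in raw_scores.items():
    print(workload, [round(s, 2) for s in normalize(scores)])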

[Figure 2. Normalized workload scores for 1, 2, and 3 tiles]

The mail server scales perfectly from 1 to 2 tiles and then improves by another 25%, equal to the 25% increase in CPU utilization, when a third tile is added to fully saturate the CPU on the system. The java server scales perfectly up to 3 tiles. In this case, each java server VM is allocated sufficient CPU shares to achieve full performance and excellent scaling even with 3 tiles due to the think-time constraints built into VMmark. Both the database server and the web server exhibit roughly linear scaling from 1 to 2 tiles. In a less than fully utilized system, both workloads tend to consume more than their guaranteed share of the CPU resources. As the system becomes saturated, it is likely that they will not receive as many CPU resources as they could consume, leading to poorer individual scaling. In this test, the database server gets a reasonable but not perfect boost going to 3 tiles. The web server, which is typically the greedier of the two, actually sees a drop in its overall score with 3 tiles.

So far, everything is behaving more or less as expected. The poor fileserver scaling is less intuitive. The aggregate throughput peaks at roughly 35MB/s running a single tile. With two and three VMs, the aggregate throughput drops to 29MB/s and 23MB/s, respectively. I have seen this behavior before on other systems and even discussed it during my VMworld session on VMmark. Examining the esxtop disk statistics shows that the SCSI queue for the LUN is a source of contention among multiple fileserver VMs. I know from experience that simply increasing the queue depth will improve the situation somewhat, but a better solution is to put each tile on a separate LUN.
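For anyone who wants to check for the same symptom, here is a rough Python sketch of how one might scan esxtop batch-mode output (e.g., "esxtop -b -n 60 > esxtop.csv") for LUNs whose queues are consistently non-empty. The counter names vary by ESX version, so the "Physical Disk"/"Queued" substring match below is an assumption to adjust against the headers in your own capture:

# Rough sketch: average the per-LUN queued-commands counters in an esxtop
# batch-mode CSV and report LUNs where commands are piling up in the queue.
# The substring match on column names is an assumption; check your headers.
import csv

def find_queued_luns(path, threshold=1.0):
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        # Columns that look like per-LUN queued-command counters.
        cols = [i for i, name in enumerate(header)
                if "Physical Disk" in name and "Queued" in name]
        totals = {i: 0.0 for i in cols}
        samples = 0
        for row in reader:
            samples += 1
            for i in cols:
                try:
                    totals[i] += float(row[i])
                except (ValueError, IndexError):
                    pass
    # Report LUNs whose average queue occupancy meets the threshold.
    return {header[i]: totals[i] / samples
            for i in cols if samples and totals[i] / samples >= threshold}

for lun, avg in find_queued_luns("esxtop.csv").items():
    print(f"{lun}: averaging {avg:.1f} queued commands")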

[Figure 3. Workload scores using the multi-LUN disk layout]

In this scheme, the file server VMs are segregated and cannot negatively affect each other when they fill a SCSI queue. I didn't have extra disks available, so I deleted the existing 6-disk LUN, replaced it with three 2-disk LUNs, and then recreated the tiles, one on each LUN. Re-running the tests provided some interesting insight. Figure 3 shows the workload scores using the new disk layout. The comparison of the workload scores obtained using the two different disk layouts is then shown in Figure 4. In the 1-tile case, all of the workload scores except the file server's were about equal across the two layouts. However, the fileserver on a 2-disk LUN only managed 12.5 MB/s, roughly one-third of the throughput of the 6-disk LUN. This should be no surprise since we are using one-third as many disks.

[Figure 4. Workload scores for the single-LUN and multi-LUN disk layouts]

For the two-tile case, using separate LUNs yields a slightly lower overall score, again largely due to the lower fileserver throughput of 26MB/s (vs. 29MB/s with a single LUN). The important thing to note is that the scaling is much better due to the segregation of the fileservers. The payoff comes when 3 tiles are run and the fileserver VMs continue to achieve excellent scaling. With 3 separate LUNs, the aggregate fileserver throughput is 38MB/s (vs. 23MB/s using a single LUN). The improved fileserver throughput does come at the cost of slightly lower throughputs for the database and web server workloads, since the fileserver VMs are now able to fully utilize their allocated shares of the system. Figure 5 compares the overall VMmark scores of the single-LUN and multi-LUN configurations. We see that the multi-LUN layout is slower for one tile, roughly equal for two tiles, and better for three tiles due to the improved fileserver results.

[Figure 5. Overall VMmark scores for the single-LUN and multi-LUN configurations]
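Putting rough numbers on that tradeoff, a quick calculation over the aggregate fileserver throughputs quoted above shows just how differently the two layouts scale (all of the figures come straight from the runs described in this post):

# Aggregate fileserver throughput (MB/s) quoted above for each disk layout.
single_lun = {1: 35.0, 2: 29.0, 3: 23.0}   # one 6-disk RAID 0 LUN shared by all tiles
multi_lun  = {1: 12.5, 2: 26.0, 3: 38.0}   # three 2-disk LUNs, one per tile

def scaling(results):
    """Throughput at N tiles relative to the same layout's 1-tile run."""
    base = results[1]
    return {tiles: round(mb_s / base, 2) for tiles, mb_s in results.items()}

print("single LUN:", scaling(single_lun))   # {1: 1.0, 2: 0.83, 3: 0.66}
print("multi LUN: ", scaling(multi_lun))    # {1: 1.0, 2: 2.08, 3: 3.04}

The single-LUN layout actually loses aggregate fileserver throughput as tiles are added, while the per-tile LUNs scale essentially linearly.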

These two different disk configurations highlight some interesting tradeoffs and tuning opportunities exposed by VMmark. The single-LUN configuration utilizing six disks has the benefit of providing high disk throughput for a single VM at the expense of scalability when multiple disk-intensive VMs are running. On the other hand, creating multiple LUNs provides both good predictability and excellent scaling, but limits the total throughput of any single VM by providing only a subset of the hardware resources to each one. From a benchmarking perspective, the multi-LUN approach is clearly better since it results in a higher overall score. In practice, the proper approach depends upon the needs and goals of each user. I am excited by the ability VMmark gives us to study these types of performance tuning tradeoffs in a representative multi-VM environment. I feel that building performance tuning expertise in these complex situations and getting that information to our customers, along with the ability to evaluate hardware and software platforms for virtualization, should make VMmark an extremely valuable tool. Please stay tuned as we work to make that a reality.

Performance Tuning Guide for ESX 3

Optimizing ESX's performance is one of the primary tasks of a system administrator. One wants to make the best use of what ESX can offer, not only in terms of its features but also their associated performance. Over time, a number of customers have asked us for a single, comprehensive ESX performance tuning guide covering its CPU, memory, storage, networking, and resource management (including DRS) optimizations. Finally, we have the Performance Tuning Best Practices for ESX Server 3 guide.

As indicated above, this paper provides a list of performance tips that cover the most performance-critical areas of Virtual Infrastructure 3 (VI3). The paper assumes that you have already deployed ESX, have a decent working knowledge of ESX and its virtualization concepts, and are now looking to optimize its performance.

Some customers will want to carefully benchmark their ESX installations as a way to validate their configurations and determine their sizing requirements. To help such customers with a systematic benchmarking methodology for their virtualized workloads, we've added a section to the paper called "Benchmarking Best Practices". It covers the precautions to take and the things to keep in mind during such benchmarking. We've already published a similar benchmarking guidelines whitepaper for our hosted products: Performance Benchmarking Guidelines for VMware Workstation 5.5.

The strength of the paper is that it succinctly (in 22 pages) captures the performance best practices and benchmarking tips associated with the key components. Note that the document does not delve into the architecture of ESX, nor does it provide specific performance data for the discussions. It also doesn't cover sizing guidelines or tuning tips for specific applications running on ESX.

All of us from the Performance team hope you find the document useful.

Shrinking the VMmark Tile

Our new benchmark, VMmark, had its first Beta release on December 21st. Now we are busy supporting the Beta users as well as trying to address some of the feedback we received during the earlier VMmark technology preview program with some of our hardware partners. We heard from almost everyone that the memory footprint of 7GB per tile should be reduced. (Details on VMmark and its tile definition can be found here: http://www.vmware.com/vmtn/resources/573). Looking at the trends in the mid-range space, the feedback makes sense. Many current two-socket, 4-core systems have only 8 DIMM slots. One would have to break the bank buying 4GB DIMMs to get 8GB/core, and 4-core chips are arriving to make that ratio even worse. Ultimately, I hope the hardware vendors add more memory slots to address this looming imbalance. But for now, if we are going to measure these types of systems, we'll need to reduce the memory usage of VMmark.

Three of the workloads in a VMmark tile, the web, file, and standby servers, together consume only 1GB of memory. They are already pretty lean, so squeezing memory from them would have limited benefits. The remaining three workloads, the database, mail, and java servers, use 2GB each. Databases tend to like a large, well-tuned buffer cache. I’d rather leave that one alone since it is a fairly typical database size. That leaves the java and mail server VMs as candidates. If we cut both of those VMs down to 1GB each, the total memory footprint drops to 5GB. In this configuration, 3 tiles will fit into 16GB, which should max out a current 2-socket, dual-core system using the cheaper 2GB DIMMs while leaving plenty of headroom for quad-core with 4GB DIMMs.
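The arithmetic is simple enough to sketch in a few lines of Python; the tile sizes come from the definition above, and the host capacity assumes 8 DIMM slots with the hypervisor's own overhead ignored for simplicity.

# Tile memory footprint before and after trimming the mail and java VMs,
# and how many tiles fit in a host built from a common DIMM configuration.
# Hypervisor overhead is ignored to keep the sketch simple.
original_tile = {"web+file+standby": 1, "database": 2, "mail": 2, "java": 2}  # GB
trimmed_tile  = {"web+file+standby": 1, "database": 2, "mail": 1, "java": 1}  # GB

def tiles_per_host(tile_gb, dimm_slots=8, dimm_gb=2):
    host_gb = dimm_slots * dimm_gb
    return host_gb // tile_gb, host_gb

for label, tile in (("7GB tile", original_tile), ("5GB tile", trimmed_tile)):
    size = sum(tile.values())
    fits, host_gb = tiles_per_host(size)
    print(f"{label}: {size}GB each, {fits} tiles fit in a {host_gb}GB (8 x 2GB DIMM) host")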

I first made the necessary changes for the mail and java server VMs to use 1GB. I then ran them each in isolation to get an idea of the performance impacts. To my surprise, the java server exhibited only a 2-3% reduction in throughput, while the mail server showed no discernible difference. Looking back, I suspect that this is due to the workload throttling we implemented in VMmark to ensure that the workloads run at less than full utilization, as they would in a datacenter consolidation scenario. Given that we initially sized our VMs based upon various industry and customer surveys, I am led to wonder if there aren't lots of servers over-configured with not only CPU but also memory. As a final series of tests, I reran the newly modified VMmark on several systems for which I already had data for the existing 7GB tile size. Overall, I saw very little effect on the benchmark scores. It looks like the 5GB VMmark tile is a go.
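For anyone repeating this experiment, the change itself is a one-line edit per VM; here is a minimal sketch that rewrites the memsize entry (specified in MB) in a .vmx file. The paths are placeholders for wherever your tile's VMs live, and the VMs should be powered off before their configuration files are edited.

# Minimal sketch: set a VM's memory allocation to 1GB by rewriting the
# memsize entry in its .vmx file. Paths below are placeholders; power the
# VM off before editing its configuration file.
import re

def set_memsize(vmx_path, mb=1024):
    with open(vmx_path) as f:
        lines = f.readlines()
    with open(vmx_path, "w") as f:
        for line in lines:
            if re.match(r"\s*memsize\s*=", line):
                f.write(f'memsize = "{mb}"\n')   # memsize is given in MB
            else:
                f.write(line)

for vmx in ("/vmfs/volumes/tile1/mail/mail.vmx",      # placeholder paths
            "/vmfs/volumes/tile1/java/java.vmx"):
    set_memsize(vmx, mb=1024)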