
Monthly Archives: November 2011

SAP Three-Tier Benchmark on vSphere 5 Achieves New Levels of Performance

HP recently published an impressive SAP three-tier Sales and Distribution (SD) benchmark result that ran entirely on vSphere 5. A total of 32,125 SAP SD benchmark users and 170,320 SAPS were achieved using an HP VirtualSystem Solution hosting 22 application server VMs and one database server VM. A "Monster VM" with 20 vCPUs was configured for the database server VM, which was a key component in enabling the new record-setting results.

The configuration used to run the SAP three-tier benchmark is impressive and demonstrates that vSphere is capable of running large enterprise workloads. HP's SAP three-tier SD benchmark result used 11 servers with a total of 132 cores and 264 logical threads with hyper-threading enabled. The high number of users and large number of systems in this benchmark are representative of a large SAP landscape, all of which can be run on vSphere 5.

The HP SAP three-tier SD benchmark (certification #2011044) was run on Windows Server 2008 R2 with SQL Server 2008 for the database and SAP ERP 6.0 with EHP 4. The servers were HP ProLiant BL460c G7 models with Intel X5675 processors (2P/12C/24T) and 96GB of RAM. More details are available in a performance brief that HP has published and on the SAP benchmark site (http://www.sap.com/benchmark).

VMware View 5 resource optimization

In last week’s post, we discussed 4 simple settings that we have observed to deliver significant resource savings while preserving the user experience for typical desktop users. While we discussed the benefits of each setting in isolation there, I wanted to illustrate the overall gains here. For runs using View Planner (which simulates a typical office user working with MS Office apps, browsers, Adobe Reader, video playback, photo albums, etc. – more details can be found here), we observe a significant reduction in bandwidth when these 4 resource control settings are applied in unison:


From the above plot it is apparent that the bandwidth reductions resulting from i) disabling build-to-lossless, ii) setting the maximum frame rate to 15, iii) setting the maximum audio bandwidth to 100, and iv) performing simple in-guest operations (such as selecting “optimize for visual performance” and disabling ClearType) are mainly additive, and the cumulative benefit is substantial – around a 1.8X reduction from the default! [Particularly compelling, given that for typical office users there is very little difference in user experience.]
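To see why per-stream savings combine roughly additively into an overall reduction, consider a toy model of the total session bandwidth. The stream shares and per-setting factors below are hypothetical round numbers chosen only for illustration, not measured data:

```python
# Toy model: estimate the cumulative bandwidth reduction when
# per-stream PCoIP optimizations are combined. All shares and
# factors below are hypothetical, for illustration only.

def combined_reduction(shares, factors):
    """shares: fraction of default bandwidth per stream (sums to 1).
    factors: per-stream reduction factor (e.g. 1.3 means 1.3X less).
    Returns the overall reduction factor versus the default."""
    remaining = sum(s / f for s, f in zip(shares, factors))
    return 1.0 / remaining

# Hypothetical split of default bandwidth: imaging, video, audio.
shares = [0.60, 0.25, 0.15]
# Hypothetical per-stream factors: build-to-lossless off plus
# in-guest tweaks (~1.4X on imaging), the 15 fps cap (~1.7X on
# video), and the audio limit (~5X on audio).
factors = [1.4, 1.7, 5.0]

print(round(combined_reduction(shares, factors), 2))
```

With these made-up inputs the model lands in the same ballpark as the measured ~1.8X figure; the point is simply that each setting shrinks a different slice of the traffic, so the savings stack.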

Storage vMotion of a Virtualized SQL Server Database

vSphere Storage vMotion (svMotion) enables the live migration of disk files belonging to virtual machines (VMs). svMotion helps eliminate downtime for the applications running in VMs when the virtual disk files containing the applications’ data have to be moved between storage devices for hardware maintenance, upgrades, storage load balancing, or proactive disaster recovery.

svMotion is the missing piece in liberating VMs and VMs’ associated files completely from the physical hardware on which they reside. Because of the importance of svMotion in the virtual landscape, we at VMware Performance Engineering Labs conducted a study involving the svMotion of the virtual disk files of a VM hosting a large SQL Server database. The focus of the study was to understand:

  • The impact on the performance of the SQL Server database when migrating the physical files of different database components such as data, index, and log.
  • The effect of the I/O characteristics of the database components on the migration time of the virtual disk containing those components’ files.

The results from the study show:

  • A consistent and predictable disk migration time that was largely influenced by the capabilities of the source and the destination storage hardware.
  • That the I/O characteristics of the database components do influence disk migration time.
  • A 5% to 22% increase, depending on the VM load conditions, in the CPU cost of a transaction of the database workload while migrating a virtual disk containing the physical files of the database.

For more details, refer to the white paper “Storage vMotion of a Virtualized SQL Server Database.”

Comparing ESXi 4.1 and ESXi 5.0 Scaling Performance

In previous articles on VROOM! we used VMmark 2 to investigate the effects of altering a single hardware component, such as a storage array or server model, in a vSphere cluster. In contrast to these earlier studies, we now examine the effects of upgrading the hosts’ software from ESXi 4.1 to ESXi 5.0 on the performance of a VMmark 2 cluster.

vSphere 5 includes many new features and virtual machine enhancements, the details of which can be found here. To the IT professional weighing the costs and benefits of upgrading their existing infrastructure to vSphere 5, an often important question is whether ESXi 5.0 can outperform ESXi 4.1 in the same environment. VMmark 2 is an ideal tool for answering this question with measurable results. We used VMmark 2.1.1 to see how ESXi 5.0 stacked up to ESXi 4.1 on an identically configured cluster.

VMmark 2 is a multi-host virtualization benchmark that models application performance as well as the effects of common infrastructure operations such as vMotion, Storage vMotion, and virtual machine deployments. Each VMmark tile contains a set of VMs running diverse application workloads as a unit of load. VMmark 2 scores are computed as a weighted average of application workload throughput and infrastructure operation throughput. For more details, see the overview and the release notes for VMmark 2.1 and 2.1.1.
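The weighted-average idea can be sketched as follows. Note that the weights and reference throughputs here are hypothetical placeholders, not the benchmark's actual constants; the release notes describe the real scoring methodology:

```python
# Hypothetical sketch of VMmark 2-style scoring: normalize each
# workload's throughput to a reference value, then combine the
# application and infrastructure averages with a weight.
# app_weight=0.8 is an assumed illustrative value.

def vmmark2_style_score(app_throughputs, infra_throughputs,
                        app_refs, infra_refs, app_weight=0.8):
    """Each *_throughputs list pairs positionally with its *_refs
    list of reference (single-tile) throughput values."""
    app = sum(t / r for t, r in zip(app_throughputs, app_refs)) / len(app_refs)
    infra = sum(t / r for t, r in zip(infra_throughputs, infra_refs)) / len(infra_refs)
    return app_weight * app + (1 - app_weight) * infra

# Example: three application workloads running at 2.9x their
# references and two infrastructure operations at 2.5x theirs.
score = vmmark2_style_score([290, 580, 87], [25, 10],
                            [100, 200, 30], [10, 4])
print(round(score, 2))  # 0.8 * 2.9 + 0.2 * 2.5 = 2.82
```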

Testing Methodology

All VMmark 2 tests were conducted on a cluster of four identically configured entry-level Dell PowerEdge R310 servers. To determine the impact of the vSphere 5 environment on performance, a series of tests was conducted with these hosts running ESXi 4.1, then with ESXi 5.0. In addition, for the vSphere 5 environment, the virtual machine hardware and VMware Tools were upgraded on all workload VMs, and LUNs were reformatted as VMFS5. All other components in the environment remained unchanged during testing.

Systems Under Test: Four Dell PowerEdge R310 Servers
CPUs: One quad-core Intel® Xeon® X3460 @ 2.8 GHz per server, hyper-threading enabled
Memory: 32GB DDR3 ECC @ 800 MHz per server
Storage Array: EMC VNX5500
Hypervisors under test:
VMware ESXi 4.1
VMware ESXi 5.0
Virtualization Management: VMware vCenter Server 5.0
VMmark version: 2.1.1


To characterize cluster performance at multiple load levels, we increased the number of tiles until the cluster reached saturation, defined as when the run failed to meet Quality of Service (QoS) requirements. Scaling out the number of tiles until saturation allows us to determine the maximum VMmark 2 load the cluster could support and to compare the ESXi 4.1 and ESXi 5.0 configurations at each level of load.
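The scale-out methodology above can be sketched as a simple loop. Here `run_tile_test` is a hypothetical stand-in for launching a VMmark 2 run at a given tile count and checking its QoS thresholds:

```python
# Sketch of scaling out tiles until saturation: keep adding tiles
# until a run fails QoS, then report the last passing tile count.
# run_tile_test is a hypothetical callback: True = run met QoS.

def find_saturation(run_tile_test, max_tiles=16):
    """Return the largest tile count that still meets QoS,
    or 0 if even a single tile fails."""
    last_passing = 0
    for tiles in range(1, max_tiles + 1):
        if not run_tile_test(tiles):
            break
        last_passing = tiles
    return last_passing

# Toy example: a cluster that passes QoS up to 3 tiles versus one
# that passes up to 4 (stand-ins, not real benchmark runs).
cluster_a = find_saturation(lambda t: t <= 3)  # saturates at 3
cluster_b = find_saturation(lambda t: t <= 4)  # saturates at 4
```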

The graph below shows the results of the VMmark 2 testing as described above with identically configured clusters running ESXi 4.1 and ESXi 5.0. All data points are the mean of three tests in each configuration.



The ESXi 4.1 cluster reached saturation at 3 tiles, but ESXi 5.0 was able to support 4 tiles while still meeting workload Quality of Service requirements. The ESXi 5.0 cluster also outperformed ESXi 4.1 by 3% and 4% on the two- and three-tile runs, respectively. Differences in CPU utilization were negligible. The results show that, in an equivalent environment, vSphere 5 handled a greater load than ESXi 4.1 before reaching saturation and showed increased performance at lower levels of load as well. At saturation, vSphere 5 showed a 22% increase in overall VMmark 2 score over ESXi 4.1. In this cluster, vSphere 5 supported 33% more VMs and twice the number of infrastructure operations while meeting Quality of Service requirements.

VMmark 2 scores are based on application and infrastructure workload throughput, while application latency reflects Quality of Service. For the Mail Server, Olio, and DVD Store 2 workloads, latency is defined as the application’s response time. The completion time for vMotion, Storage vMotion, and VM Deploy is used as the latency measurement for the infrastructure operations. Latency can be very informative about the functioning of the environment and how the cluster as a whole performs under increasing loads. Examining latency at a 3-tile load, as seen in the figure below, reveals significant differences between the hypervisor versions. Latencies were normalized to the ESXi 4.1 results.


We saw decreases in latency for all VMmark 2 workloads with vSphere 5. The latency decreases were most striking in Olio, Storage vMotion, and DVD Store 2, with decreases of 20%, 19%, and 15%, respectively. These improvements to vMotion and Storage vMotion are consistent with publicized improvements in vMotion and Storage vMotion latency for vSphere 5 (details here).

A VMmark 2 run passes when all of its application QoS metrics, or latencies, remain below specified thresholds. These latency decreases are directly responsible for ESXi 5.0’s ability to support an additional tile relative to ESXi 4.1.

Our comparison has shown that upgrading an ESXi 4.1 cluster to vSphere 5 had two high-level effects on performance. The vSphere 5 cluster supported 33% more VMs at saturation than the ESXi 4.1 cluster, and it also exhibited improved latency and throughput at lower levels of load, showing that ESXi 5.0 does outperform ESXi 4.1.

4 simple resource optimizations for VMware View 5

By default, the VMware View PCoIP protocol dynamically optimizes for the best user experience under the given resource constraints. In the majority of environments, this is the desired approach. However, there are times when individual users or group administrators want different resource utilization policies, and in past blogs and whitepapers we have discussed in detail how to configure PCoIP to optimize for constrained resource consumption. In this post, I wanted to provide a concise summary of those recommendations by highlighting 4 simple optimizations that our extensive internal testing has shown yield significant benefits:

  1. Disable build-to-lossless: setting enable_build_to_lossless to 0 delivers about a 1.3X reduction in bandwidth for typical office workloads. And, PCoIP still builds to a high quality lossy image that is virtually indistinguishable from fully lossless for office workloads.
  2. Optimize video frame-rate: setting maximum_frame_rate to 15 reduces video bandwidth by almost 1.7X in many situations, yet continues to deliver a smooth motion experience.
  3. Optimize audio bandwidth: setting audio_bandwidth_limit to 100 reduces audio bandwidth by around 5X, while continuing to deliver good quality sound.
  4. In-guest optimization: setting Windows visual settings to "optimize for performance" reduces bandwidth by over 1.1X for typical office workloads. Additionally, disabling ClearType reduces bandwidth by a further 1.05X. Disabling desktop wallpaper and setting the screen saver to none can also deliver bandwidth savings, although the new client image caching support in View 5 often significantly reduces the additional bandwidth traditionally associated with these options. Finally, disabling Windows Update, SuperFetch, and Windows Search indexing significantly reduces redo-log growth, minimizing storage requirements. Full details of in-guest optimizations can be found here.

[N.B. the PCoIP settings can be set via the Windows registry, or via GPO.]
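For the registry route, the fragment below sketches what settings 1–3 might look like as a .reg file. The key path and value names follow the PCoIP session-variable names used above, but treat them as assumptions and confirm them against the PCoIP GPO template (pcoip.adm) that ships with your View release before deploying; the in-guest optimizations (#4) are configured inside the guest OS rather than through these keys:

```
Windows Registry Editor Version 5.00

; Illustrative sketch of the three PCoIP protocol optimizations.
; Verify key path and value names against the pcoip.adm template.
[HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Teradici\PCoIP\pcoip_admin_defaults]
; 1. Disable build-to-lossless
"pcoip.enable_build_to_lossless"=dword:00000000
; 2. Cap the maximum frame rate at 15 fps (0xf)
"pcoip.maximum_frame_rate"=dword:0000000f
; 3. Limit audio bandwidth to 100 kbit/s (0x64)
"pcoip.audio_bandwidth_limit"=dword:00000064
```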

These simple changes significantly decrease bandwidth consumption and increase consolidation ratios while having minimal impact on the typical user experience, making them good defaults in many environments.