Monthly Archives: March 2010

SAP Three-Tier Shows Excellent Scaling on vSphere

Many published tests have shown strong performance for SAP software running on VMware vSphere in a two-tier configuration, in which the application server and database server run on the same VM. This reflects how many customers run SAP solutions in small and mid-size deployments. In larger configurations, it becomes necessary to split the application server and database server onto separate systems. To demonstrate how SAP software scales in a three-tier environment running on vSphere, a test was set up with three application server VMs and one database server VM.

The tests were run in cooperation with the Dell TechCenter lab on a PowerEdge M710 blade server running VMware vSphere 4.0.0. The blade server had two quad-core Intel Xeon X5570 (Nehalem) processors and 72GB of RAM. Storage was provided by three Dell EqualLogic PS5000XV iSCSI arrays.

Each of the four VMs was configured with 18GB of virtual memory and ran Microsoft Windows Server 2008 x64 as the guest OS. Microsoft SQL Server 2008 was used as the database, and enhancement package 4 for SAP ERP 6.0 ran on the application server VMs. The database VM also had the SAP application server components installed, but no transactions were directed through it.

Using a well-understood SAP transactional workload on this three-tier configuration, three series of tests were run to measure how performance scaled as resources were added to the VMs. A one-second response time criterion was used to determine the number of supported users. The graph below shows how the response time increased as users were added to each of the three configurations tested.


The first test configuration, with four 2vCPU VMs, supported 1386 users. This configuration had 8 virtual CPUs in total, one for each of the 8 physical cores in the server. The Intel Xeon 5500 series processors implement Hyper-Threading, a form of Simultaneous Multi-Threading (SMT) that allows two logical threads to run on each physical core, so the two-socket M710 used in this testing had 8 cores and 16 logical threads. To test performance scaling with SMT, the second configuration gave each VM 4 vCPUs, allowing all 16 logical threads of the server to be used, and achieved a 24% increase in supported users. CPU affinity is not recommended for SAP VMs in production, but for this high-stress testing scenario it was used to assign the vCPUs of each VM to specific logical threads.
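To make the core and thread counts concrete, here is a minimal Python sketch of the host topology and one possible affinity layout for the 4vCPU configuration. The VM names and the contiguous thread blocks are assumptions for illustration only; the actual thread assignments used in the test are not stated here.

```python
# Host topology as described in the post: 2 sockets x 4 cores x 2 threads.
SOCKETS = 2
CORES_PER_SOCKET = 4
THREADS_PER_CORE = 2  # Hyper-Threading enabled

physical_cores = SOCKETS * CORES_PER_SOCKET
logical_threads = physical_cores * THREADS_PER_CORE


def affinity_plan(num_vms, vcpus_per_vm, total_threads):
    """Give each VM a contiguous, non-overlapping block of logical threads.

    Hypothetical layout for illustration; the real pinning used in the
    test is not documented in the post.
    """
    needed = num_vms * vcpus_per_vm
    if needed > total_threads:
        raise ValueError(f"{needed} vCPUs do not fit on {total_threads} threads")
    return {
        f"vm{i}": list(range(i * vcpus_per_vm, (i + 1) * vcpus_per_vm))
        for i in range(num_vms)
    }


# Second test configuration: four VMs with 4 vCPUs each on one host.
plan_4vcpu = affinity_plan(num_vms=4, vcpus_per_vm=4, total_threads=logical_threads)

print(f"{physical_cores} physical cores, {logical_threads} logical threads")
for vm, threads in plan_4vcpu.items():
    print(vm, threads)
```

With four 4vCPU VMs, the plan consumes all 16 logical threads exactly once, which is the point of the second configuration: every hardware thread has a vCPU assigned to it.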

For a final test, a second server with the same configuration was added. All four VMs kept the 4 vCPU configuration from the previous test, but two of the VMs were moved to the new server. This third configuration is similar to the first in that the number of virtual CPUs equals the number of physical cores: 16 vCPUs are assigned across the four VMs, and the two physical servers have a total of 16 cores. Spread across two hosts, approximately 2630 users could be run at under one second response time, a roughly 90% increase over the first configuration.
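The scaling figures above can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in this post; the user count for the second configuration is derived from the reported 24% gain and is not a separately published figure, and the per-core ratios are a convenience calculation rather than a measured result.

```python
# Figures quoted in the post.
USERS_CONFIG_1 = 1386   # four 2vCPU VMs on one host
SMT_GAIN = 0.24         # reported gain for four 4vCPU VMs on one host
USERS_CONFIG_3 = 2630   # approximate; four 4vCPU VMs across two hosts

CORES_CONFIG_1 = 8      # one host
CORES_CONFIG_3 = 16     # two hosts


def pct_increase(baseline, value):
    """Percentage increase of value over baseline."""
    return (value - baseline) / baseline * 100.0


# Derived from the 24% gain; not a published number.
users_config_2 = USERS_CONFIG_1 * (1 + SMT_GAIN)
gain_config_3 = pct_increase(USERS_CONFIG_1, USERS_CONFIG_3)

print(f"config 2: ~{users_config_2:.0f} users (derived)")
print(f"config 3: {gain_config_3:.1f}% more users than config 1")
print(f"users per core, config 1: {USERS_CONFIG_1 / CORES_CONFIG_1:.0f}")
print(f"users per core, config 3: {USERS_CONFIG_3 / CORES_CONFIG_3:.0f}")
```

Doubling the physical cores from 8 to 16 yields a gain just under 90%, so the number of users supported per physical core drops only slightly between the first and third configurations.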

The servers were heavily utilized in all of the tests at the one-second response time criterion. In the first test, core utilization as reported by esxtop averaged 85%; in the second test, with all 16 logical threads in use, it increased to 93%. In the final test with two hosts, core utilization was 77% on one host and 80% on the other.

The results show that running SAP in a three-tier configuration on vSphere allows for excellent scaling. Additional performance was achieved simply by giving the existing VMs more resources (additional vCPUs backed by more logical threads or physical cores), without making any changes to the operating system, database, or application servers running inside the VMs.

Achieving High Web Throughput with VMware vSphere 4 on Intel Xeon 5500 series (Nehalem) servers

We just published a SPECweb2005 benchmark score of 62,296, the highest result published to date on a virtual configuration. This result was obtained on an HP ProLiant DL380 G6 server running VMware vSphere 4 and featuring Intel Xeon 5500 series processors and Intel 82598EB 10 Gigabit AF network interface cards. While driving the network throughput from a single host to just under 30 Gbps, this benchmark score still stands at 85% of the level achieved in native (non-virtualized) execution on equivalent hardware configurations.

Our latest benchmark results show that VMware, with our partners Intel and HP, is able to provide virtualization solutions that meet the performance and scaling needs of modern data centers. In addition, the consolidation achieved in a virtual environment, as demonstrated by the configuration used in our benchmark publication, helps reduce complexity in the software environment.

Let me briefly discuss some of the distinctive characteristics of our latest benchmark results:

Use of VMDirectPath for virtualizing network I/O: VMDirectPath is a feature in vSphere 4 that builds upon the Intel VT-d (Virtualization Technology for Directed I/O) capability engineered into recent Intel processors to virtualize network I/O. It allows guest operating systems to directly access an I/O device, bypassing the virtualization layer. The result we just published differs notably from our previous results in that this time we used the VMDirectPath feature to take advantage of the higher performance that it makes possible.

High performance and linear scaling with the addition of virtual machines: VMDirectPath bypasses the virtualization layer to a large extent for network interactions, but a measurable number of guest OS and hypervisor interactions still remain. The possibility therefore exists that the hypervisor could become a scaling limiter in a multi-VM environment. The excellent performance achieved by our benchmark configuration using four virtual machines shows that this should not be a concern.

A highly simplified setup: Results published on the SPECweb2005 website reveal the complexity of “interrupt pinning” that is common in native configurations, where it is generally employed to make full use of all the cores in today’s multi-core processors. By comparison, our benchmark configuration does not use device interrupt pinning. This is because the virtualization approach divides the load among multiple VMs, each of which is smaller and can therefore use its cores efficiently with less tuning.

Virtualization Performance: Our results show that a single vSphere host can handle close to 30 Gbps of real-world Web traffic and still reach a performance level of 85% of the native results published on an equivalent physical configuration. This demonstrates capabilities several orders of magnitude greater than those needed by typical Web applications, proof positive that the vast majority of Web applications can be consolidated, with excellent performance, in a virtualized environment.
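For a rough sense of what these numbers imply, the Python sketch below works backward from the figures quoted in this post. The implied native score is only an estimate derived from the 85% ratio, not a published SPECweb2005 result, and the even split of traffic across the four VMs is an assumption made purely for illustration.

```python
# Figures quoted in the post.
VIRTUAL_SCORE = 62_296        # published SPECweb2005 score on vSphere 4
RATIO_TO_NATIVE = 0.85        # virtual score as a fraction of native
HOST_THROUGHPUT_GBPS = 30.0   # upper bound; the post says "just under 30 Gbps"
NUM_VMS = 4

# Estimate only: derived from the 85% ratio, not a published native score.
implied_native_score = VIRTUAL_SCORE / RATIO_TO_NATIVE
virtualization_gap_pct = (1 - RATIO_TO_NATIVE) * 100

# Assumes traffic is split evenly across the VMs (not stated in the post).
per_vm_gbps = HOST_THROUGHPUT_GBPS / NUM_VMS

print(f"implied native score: ~{implied_native_score:,.0f}")
print(f"gap to native: {virtualization_gap_pct:.0f}%")
print(f"average per-VM throughput: up to {per_vm_gbps:.1f} Gbps")
```

Under the even-split assumption, each VM would carry at most about 7.5 Gbps on average, which stays below the line rate of a single 10 Gigabit interface.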

For more details, check out the full-length article published on the VMware community website, in which we elaborate on each of the characteristics briefly discussed here.