
Monthly Archives: February 2009

Comparing Hardware Virtualization Performance Utilizing VMmark v1.1

Virtualization has just begun to remake the datacenter. One only needs to look at the rapid pace of innovation to know that we are in the midst of a revolution. This is true not only for virtualization software, but also for the underlying hardware. A perfect example of this is new hardware support for virtualized page tables provided by both Intel’s Extended Page Tables (EPT) and AMD’s Rapid Virtualization Indexing (RVI). In general, these features reduce virtualization overhead and improve performance. A previous paper showed how RVI performs with data for a range of individual workloads. As a follow-on, we decided to measure the effects of RVI in a heterogeneous environment using VMmark, the tile-based mixed-workload consolidation benchmark from VMware®.

VMware ESX has the following three modes of operation: software virtualization (Binary Translation, abbreviated as BT), hardware support for CPU virtualization (abbreviated in AMD systems as AMD-V), and hardware support for both CPU and MMU virtualization utilizing AMD-V and RVI (abbreviated as AMD-V + RVI). For most workloads, VMware recommends that users let ESX automatically determine if a virtual machine should use hardware support, but it can also be valuable to determine the optimal settings as a sanity check.

Environment Configuration:

System under Test: Dell PowerEdge 2970
CPU: 2 x Quad-Core AMD Opteron 8384 (2.5GHz)
Memory: 64GB DDR2 Reg ECC
Hypervisor: VMware ESX (build 127430)
Application: VMmark v1.1
Virtual Hardware (per tile): 10 vCPUs, 5GB memory, 62GB disk

· AMD RVI works in conjunction with AMD-V technology, which is a set of hardware extensions to the x86 system architecture designed to improve efficiency and reduce the performance overhead of software-based virtualization solutions. For more information on AMD virtualization technologies see here.

· VMmark is a benchmark intended to measure the performance of virtualization environments in an effort to allow customers to compare platforms. It is also useful in studying the effect of architectural features. VMmark consists of six workloads (Web, File, Database, Java, Mail and Standby servers). Multiple sets of workloads (tiles) can be added to scale the benchmark load to match the underlying hardware resources. For more information on VMmark see here.
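To make the tile-scaling idea concrete, here is a rough sketch of how the aggregate virtual hardware grows with the number of tiles on this host. The per-tile and host figures come from the configuration listed above; the tile counts in the loop are purely illustrative, since the tile count used in the tests is not stated here, and the function and class names are invented for this example.

```python
from dataclasses import dataclass

# Per-tile virtual hardware and host resources, from the configuration above.
VCPUS_PER_TILE = 10
MEMORY_GB_PER_TILE = 5
DISK_GB_PER_TILE = 62
HOST_CORES = 2 * 4          # two quad-core Opterons
HOST_MEMORY_GB = 64


@dataclass
class TileFootprint:
    tiles: int
    vcpus: int
    memory_gb: int
    disk_gb: int
    vcpus_per_core: float    # CPU overcommit ratio
    memory_fraction: float   # share of host RAM assigned to VMs


def footprint(tiles: int) -> TileFootprint:
    """Aggregate virtual hardware for a given (hypothetical) tile count."""
    vcpus = tiles * VCPUS_PER_TILE
    memory_gb = tiles * MEMORY_GB_PER_TILE
    return TileFootprint(
        tiles=tiles,
        vcpus=vcpus,
        memory_gb=memory_gb,
        disk_gb=tiles * DISK_GB_PER_TILE,
        vcpus_per_core=vcpus / HOST_CORES,
        memory_fraction=memory_gb / HOST_MEMORY_GB,
    )


if __name__ == "__main__":
    # Illustrative tile counts only; not the load level used in the tests.
    for n in (1, 2, 4):
        print(footprint(n))
```

Even a single tile assigns more vCPUs than the host has cores, which is why CPU, rather than memory, is the resource that saturates first in this kind of test.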

Test Methodology

By default, ESX automatically runs 32-bit VMs (Mail, File, and Standby) with BT, and runs 64-bit VMs (Database, Web, and Java) with AMD-V + RVI. For these tests, we first ran the benchmark using the default configuration and determined the number of tiles it would take to saturate the CPU resources. All subsequent benchmark tests used this same load level. We next measured the baseline benchmark score with all VMs under test except Standby configured to use BT (i.e., no hardware virtualization features). A series of benchmark tests was then executed while varying the hardware virtualization settings for different workloads to assess their effects in a heavily utilized mixed-workload environment. All of the results presented are relative to the baseline score and illustrate the percentage performance gains achieved over the BT-only configuration.

We began by setting the Standby servers to use AMD-V + RVI. We then stepped through each of the available workloads and altered the CPU/MMU hardware virtualization settings for that specific workload type. After determining which setting was best (BT, AMD-V, or AMD-V + RVI), we kept that setting for that workload in the subsequent tests.
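The stepwise procedure can be sketched roughly as follows. This is only an illustration of the search order, not the harness we used: `run_benchmark` is a hypothetical callable standing in for a full VMmark run, `toy_benchmark` returns made-up numbers rather than measured scores, and the workload order beyond Web and File is an assumption.

```python
from typing import Callable, Dict

MODES = ["BT", "AMD-V", "AMD-V + RVI"]
# Assumed order: only "Web before File" can be inferred from the post.
WORKLOADS = ["Web", "File", "Database", "Java", "Mail"]

Config = Dict[str, str]


def percent_gain(score: float, baseline: float) -> float:
    """Percentage improvement of a score over the BT-only baseline."""
    return (score / baseline - 1.0) * 100.0


def tune_stepwise(run_benchmark: Callable[[Config], float]) -> Config:
    """Pick the best mode for one workload at a time, keeping earlier picks."""
    # Starting point: Standby on AMD-V + RVI, everything else on BT.
    config: Config = {w: "BT" for w in WORKLOADS}
    config["Standby"] = "AMD-V + RVI"
    for workload in WORKLOADS:
        scores = {}
        for mode in MODES:
            trial = {**config, workload: mode}
            scores[mode] = run_benchmark(trial)   # one full benchmark run
        # Keep the winning mode for this workload in all later runs.
        config[workload] = max(scores, key=scores.get)
    return config


def toy_benchmark(config: Config) -> float:
    """Stand-in scorer with invented bonuses; NOT measured data."""
    bonus = {"BT": 0.0, "AMD-V": 1.0, "AMD-V + RVI": 2.0}
    return 100.0 + sum(bonus[mode] for mode in config.values())


if __name__ == "__main__":
    best = tune_stepwise(toy_benchmark)
    print(best)
    baseline = toy_benchmark({**{w: "BT" for w in WORKLOADS},
                              "Standby": "AMD-V + RVI"})
    print(f"{percent_gain(toy_benchmark(best), baseline):.1f}% over baseline")
```

Tuning one workload at a time keeps the number of benchmark runs linear in the number of workloads, at the cost of not exploring every combination of settings.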

Results


The test results summarized in Table 1 are both interesting and insightful. ESX’s efficient utilization of AMD-V + RVI for each workload highlights a leap forward in virtualization platform performance. Remember that once we determined AMD-V + RVI to be the best for a workload, we continued to use that setting for that workload during all subsequent tests unless otherwise noted. For example, in the AMD-V File run below, the Web server VMs were set to AMD-V + RVI, the File server VMs were set to use just AMD-V, and all other non-Standby servers were set to BT.

[Graph: Vroom-RVI-2]

By taking advantage of hardware-assist features in the processor, ESX is able to achieve significant performance gains over using software-only virtualization. The default or “out of the box” settings produced good results, and further tuning for this particular set of workloads yielded additional performance gains of nearly 6% for our SUT. 

It should be noted that these performance gains may not hold for dissimilar workloads, but for this configuration the improvement from running an all AMD-V + RVI environment was very impressive. In addition, older processor versions with different cache sizes, clock rates, etc. may produce different results.

Hardware technologies appear to be trending toward continued improvements for virtualized environments. ESX’s ability to take advantage of the latest hardware innovations, combined with its flexibility in allowing users to run different workloads with different levels of hardware assist, is what truly sets it apart.

All information in this post regarding future directions and intent are subject to change or withdrawal without notice and should not be relied on in making a purchasing decision of VMware's products. The information in this post is not a legal obligation for VMware to deliver any material, code, or functionality. The release and timing of VMware's products remains at VMware's sole discretion.

VMware Sets Performance Record with SPECweb2005 Result

Introduction

We just published the largest SPECweb2005 score to date on a 16-core server. The benchmark was run on an HP ProLiant DL585 G5 with four Quad-Core AMD 8382 Opteron™ processors. This record score of 44,000 includes an Ecommerce component demonstrating 69,525 concurrent connections. In the Support component, this single-host workload drove network throughput on the server to just under 16 Gb/s.

This once again proves the capabilities of the VI3 platform and its ability to service workloads with stringent Quality of Service (QoS) requirements along with a large storage and networking footprint. With continuous advancements in virtualization technology (such as hardware assist for MMU virtualization and NetQueue support for 10 Gigabit Ethernet) performance with the VI3 platform can meet the needs of the most demanding, high traffic web sites.

While record-setting performance of web servers proves the capabilities of ESX, the real story of web server virtualization is the gains due to web farm consolidation and improved flexibility. Infrastructure serving as the web front end today is designed around hundreds or even thousands of often underutilized two- and four-core servers. Consolidation of these servers onto modern systems with multi-core CPUs reduces costs, simplifies management, and eases power and cooling demands. Consolidating web servers makes business sense. This SPECweb2005 result from VMware has shown that ESX Server can handle loads much more extreme than anticipated in such a consolidated environment.

The Benchmark

The SPECweb2005 benchmark consists of three workloads: Banking, Ecommerce, and Support, each with different workload characteristics representing common use cases for web servers. Each workload measures the number of simultaneous user sessions a web server can support while still meeting stringent quality-of-service and error-rate requirements. The aggregate metric reported by the SPECweb2005 benchmark is a normalized metric based on the performance scores obtained on all three workloads.

Component | Score | Explanation
Banking | 80,000 | Models online banking. Represents number of customers accessing accounts at a given time that can be supported with acceptable QoS.
E-commerce | 69,525 | Models online retail store. 69,525 is only 75 shy of the highest number reported, which required 50% more processing cores.
Support | 33,000 | Represents users acquiring patches and downloads from a support web site. In this test network throughput was 16 Gb/s.
SPECweb2005 Score | 44,000 | Normalized metric from the three components.

Table 1. Results in SPECweb2005 submission.

Benchmark Configuration  

Hardware: HP ProLiant DL585 G5 with four Quad-Core AMD 8382 Opteron™ processors, 128 GB RAM
Disk subsystem: Two EMC CLARiiON CX3-40 Fibre Channel SAN arrays, total of 110 x 133 GB (15K RPM) spindles
Network: Four Intel 10 Gigabit XF SR Server Adapters
Hypervisor: ESX Server 3.5 U3
Guest operating system: Red Hat Enterprise Linux 5 Update 1
Virtual hardware: 1 vCPU, 8 GB memory, vmxnet virtual network adapter
Web server software: Rock Web Server v1.4.7, Rock JSP/Servlet Container v1.3.2
Client systems: 30 x Dell PowerEdge 1950, dual-socket dual-core Intel Xeon, 8 GB
Workload: SPECweb2005

Table 2. Benchmark Configuration.

Performance Details

Here’s a quick look at what was accomplished with a single ESX host using the SPECweb2005 workload.

Aggregate performance: The aggregate SPECweb2005 performance of 44,000 obtained on our 16-core virtual configuration is higher than any result ever recorded on a 16-core native system. 

Support performance: The support workload is the most I/O intensive of all the workloads. The file-set data used for the support workload was laid out on a little over 100 spindles and consisted of files ranging in size from 100 KB to 36 MB. In our test configuration, we used fifteen virtual machines that shared the underlying physical 10Gbps NICs. Together they supported over 33,000 concurrent support user sessions, and handled close to sixteen Gigabits per second of web traffic on a single ESX host.
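For a sense of scale, the Support numbers can be broken down with some back-of-the-envelope arithmetic. The sketch below treats 16 Gb/s as an upper bound (the measured figure was just under that), and it assumes that load was spread evenly across the fifteen VMs and that each of the four adapters contributes a single 10 Gb/s port; neither assumption is confirmed by the full disclosure report.

```python
SUPPORT_SESSIONS = 33_000      # concurrent Support sessions
THROUGHPUT_GBPS = 16.0         # "close to sixteen" Gb/s, used as an upper bound
VMS = 15                       # virtual machines serving the workload
NICS = 4                       # Intel 10 Gigabit adapters in the host
NIC_LINE_RATE_GBPS = 10.0      # assumed: one 10 Gb/s port per adapter


def support_breakdown() -> dict:
    """Rough per-session, per-VM and per-NIC figures for the Support run."""
    return {
        # 1 Gb/s = 1e6 kb/s
        "per_session_kbps": THROUGHPUT_GBPS * 1e6 / SUPPORT_SESSIONS,
        # assumes an even split of sessions and traffic across VMs
        "sessions_per_vm": SUPPORT_SESSIONS / VMS,
        "per_vm_gbps": THROUGHPUT_GBPS / VMS,
        # fraction of the combined NIC line rate in use
        "nic_utilization": THROUGHPUT_GBPS / (NICS * NIC_LINE_RATE_GBPS),
    }


if __name__ == "__main__":
    for name, value in support_breakdown().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions each single-vCPU VM carries roughly 2,200 sessions and about 1 Gb/s of traffic.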

Banking performance: This workload emulates online banking that transfers encrypted information using HTTPS. The file-set data used for the banking test was about 1.3 terabytes consisting of some eight million individual files of varied sizes. We laid out all this data in a single VMFS volume that spanned multiple LUNs. We used fifteen virtual machines that shared the same base image. Together, they supported 80,000 concurrent banking user sessions and handled 143,000 HTTP operations/second. 

Ecommerce performance: Of the three workloads, the Ecommerce workload probably fits the profile of most customers. Unlike the Banking and Support workloads, it is a mixture of HTTP and HTTPS requests, and its I/O characteristics fall between those of Banking and Support. In our test, ESX supported 69,525 concurrent Ecommerce user sessions on a 16-core server. Our result is the second-highest Ecommerce result ever published, bested by only 75 sessions on a system with 50% more cores.
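The per-core comparison implied by these figures is easy to work out. The short sketch below derives the top result's session count and core count from the statements above (75 more sessions, 50% more cores); the variable names are invented for this example, and sessions per core is only a rough normalization, not an official SPEC metric.

```python
ESX_SESSIONS = 69_525
ESX_CORES = 16

# Derived from the text: the top result has 75 more sessions
# and was run on a system with 50% more cores.
TOP_SESSIONS = ESX_SESSIONS + 75
TOP_CORES = ESX_CORES * 3 // 2


def sessions_per_core(sessions: int, cores: int) -> float:
    """Concurrent sessions supported per processor core."""
    return sessions / cores


if __name__ == "__main__":
    esx = sessions_per_core(ESX_SESSIONS, ESX_CORES)
    top = sessions_per_core(TOP_SESSIONS, TOP_CORES)
    print(f"ESX result: {esx:.1f} sessions/core on {ESX_CORES} cores")
    print(f"Top result: {top:.1f} sessions/core on {TOP_CORES} cores")
    print(f"Per-core ratio: {esx / top:.2f}x")
```

On this rough measure, the virtualized 16-core result supports about one and a half times as many sessions per core as the top result.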

To learn more about the test configuration and tuning descriptions, please see the full disclosure report on the official SPEC website: http://www.spec.org/osg/web2005.