
Monthly Archives: May 2009

Java Performance on vSphere 4

VMware ESX is an excellent platform for deploying Java applications. Many customers use it to support Java applications ranging from the desktop to business-critical enterprise servers. However, we haven't published any recent results highlighting the excellent performance of Java applications on VMware ESX. As a first step toward remedying this, we compared native and virtualized performance using SPECjvm2008, a benchmark suite containing several real-life applications and benchmarks focusing on core Java functionality. The results demonstrate that Java applications run on VMware vSphere at greater than 94% of native performance over a range of VM sizes. This is up to a 9% improvement over VMware ESX 3.5, which already runs this workload at close to or better than 90% of native performance.

We ran SPECjvm2008 on Red Hat Enterprise Linux 5 Update 3 using the latest JVM from Sun Microsystems, JRE 1.6 Update 13. Tests were conducted with both 32-bit and 64-bit versions of the OS and JVM on an HP DL380 G5 equipped with two quad-core Intel Xeon X5460 (Harpertown) processors running at 3.16GHz and 32GB of memory. For native runs using fewer than the full number of available CPU cores, we used the kernel boot parameter maxcpus= to limit the OS to a given number of cores. We also used the kernel boot parameter mem= to limit the memory to 16GB in all 64-bit runs. The runs on VMware vSphere 4.0 and VMware ESX 3.5 Update 4 were done in virtual machines (VMs) using the stated number of virtual CPUs (vCPUs) and 16GB of memory.
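For example, a 4-core, 16GB native run could be selected with a grub.conf kernel line along the lines of the following (illustrative only; the kernel version and root device shown are assumptions, not our exact configuration):

    kernel /vmlinuz-2.6.18-128.el5 ro root=/dev/VolGroup00/LogVol00 maxcpus=4 mem=16G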

The runs of SPECjvm2008 were all base runs, meaning that no Java tuning parameters were used. All SPECjvm2008 results are required to include a base run. Unfortunately, the default heap size of the Sun JVM in the 1-CPU case is not large enough to run the SPECjvm2008 workload. As a result, we were not able to generate 1-CPU results that would be compliant with the SPECjvm2008 run rules. We did generate native and vSphere 4.0 results for 2, 4, and 8 CPUs, and ESX 3.5 results for 2 and 4 CPUs.
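The default maximum heap chosen by the JVM can be checked with a small program such as the following (a quick illustrative sketch, not part of the SPECjvm2008 harness):

    public class DefaultHeap {
        public static void main(String[] args) {
            // Print the JVM's default maximum heap size in MB.  In the 1-CPU
            // configuration this default is too small for the SPECjvm2008
            // workload, and base runs may not add heap-tuning flags such as -Xmx.
            long maxBytes = Runtime.getRuntime().maxMemory();
            System.out.println("Default max heap: " + (maxBytes >> 20) + " MB");
        }
    }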

Figure 1 shows the SPECjvm2008 results for the native, VMware vSphere 4.0, and VMware ESX 3.5 cases. Figure 2 presents the same results normalized to the native result for that server and CPU count. These results show that VMs running on VMware vSphere 4.0 perform at greater than 95% of native on this benchmark at all VM sizes. Even with 8 vCPUs running on a server with only 8 physical cores, the vSphere 4.0 VM achieves 99% of native performance. The VMware ESX 3.5 VMs ran at close to or greater than 90% of native, which is still excellent for a virtualized environment. However, for 64-bit VMs, vSphere 4.0 gives a performance improvement over ESX 3.5 U4 of 9% in the 4-vCPU case and about 3% in the 2-vCPU case.

Figure 1. SPECjvm2008 on 8-Core Intel Harpertown Server


Figure 2. SPECjvm2008 performance relative to native


To sanity-check the native results, we compared the 8-core Harpertown result using the 64-bit OS and JVM to the closest published result. There is no directly comparable result, but there is one generated by Sun on a 16-core Intel Tigerton server. The Tigerton is architecturally similar to the Harpertown, but the Harpertown has a larger L2 cache. The Sun 16-core Tigerton result, using Solaris 10, a special performance build of the Sun JVM (1.6.0_06p), and 64GB of memory, achieved 260 SPECjvm2008 ops/m. Our native result on the 8-core Harpertown with 16GB of memory was 145 SPECjvm2008 ops/m. A native run on the Harpertown with 32GB and the Sun 1.6.0_06p JVM achieved 174 SPECjvm2008 ops/m. This is well over half of the Tigerton result, and indicates that our native configuration is producing reasonable results.

Figure 3 shows the scaling of the results as we move from 2 to 4 and 8 CPUs for the 64-bit case. The scaling is essentially the same for 32-bit. The results are normalized to the 2-CPU results on the same platform. These results show that VMware vSphere 4.0 scales as well as or better than native for this workload. VMware ESX 3.5 scaling is just slightly below native.

Figure 3. SPECjvm2008 Scaling from 2 CPUs

 

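Both the relative-to-native numbers in Figure 2 and the scaling numbers in Figure 3 are simple ratios. The short sketch below shows how they are derived; the values are hypothetical except for the 145 ops/m native 8-core result mentioned above:

    public class Scaling {
        public static void main(String[] args) {
            // Hypothetical SPECjvm2008 ops/m scores (the published figures use
            // the actual measured values).
            double native2cpu = 50.0, native8cpu = 145.0;
            double vm2cpu = 48.0, vm8cpu = 143.0;

            // Relative-to-native (Figure 2) and scaling-from-2-CPU (Figure 3).
            System.out.printf("8-vCPU VM vs. native: %.1f%%%n", 100.0 * vm8cpu / native8cpu);
            System.out.printf("Native scaling 2 -> 8 CPUs: %.2fx%n", native8cpu / native2cpu);
            System.out.printf("VM scaling 2 -> 8 vCPUs: %.2fx%n", vm8cpu / vm2cpu);
        }
    }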

The SPECjvm2008 results presented here show that core Java functionality runs extremely well on VMware vSphere 4.0 and VMware ESX 3.5. No special tuning was required to get results that are remarkably close to native performance. We hope to soon produce additional results demonstrating that this excellent performance extends to multi-tier Java Enterprise Edition applications as well. For comments or questions, please join us in the VMware Performance Community at this thread: http://communities.vmware.com/message/1262696

VMware vCenter Update Manager Sizing Estimator Posted

VMware vCenter Update Manager is a component of VMware Infrastructure that automates patching and upgrading of ESX hosts, VMware Tools and virtual hardware, Windows and Linux virtual machines, and virtual appliances. A new sizing tool, the VMware vCenter Update Manager Sizing Estimator, is now available.

 

The following input parameters are used to estimate database size, patch store disk space, and temporary disk space:

- Whether virtual machine remediation is feasible in the deployment
- Number of ESX and ESXi flavors in the deployment
- Number of hosts, virtual machines, and Windows distributions; average number of locales per Windows distribution; and average number of different Service Pack levels per Windows distribution
- Patch scan frequency for virtual machines
- VMware Tools upgrade scan frequency for virtual machines
- Virtual machine hardware upgrade scan frequency
- Patch scan frequency for hosts
- Upgrade scan frequency for hosts

 

The following are the outputs from the tool:

- VMware vCenter Update Manager 4.0 database deployment model recommendations
- VMware vCenter Update Manager 4.0 server deployment model recommendations
- Initial disk space utilization in MB for the database, patch store, and temporary space
- Monthly disk space utilization growth in MB for the database and patch store
- The upper and lower bounds on the estimate, assuming a 20% variance (see the sketch below)
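As a rough illustration of how the last output might be derived (the estimator's own formulas are built into the tool and are not reproduced here), a 20% variance around a point estimate simply gives bounds of estimate ± 20%. The monthly growth figure below is hypothetical:

    public class SizingBounds {
        public static void main(String[] args) {
            // Hypothetical point estimate of monthly database growth, in MB.
            double estimatedMonthlyGrowthMb = 500.0;
            double variance = 0.20;  // the tool assumes a 20% variance

            // Lower and upper bounds: estimate +/- 20%.
            double lower = estimatedMonthlyGrowthMb * (1 - variance);
            double upper = estimatedMonthlyGrowthMb * (1 + variance);
            System.out.printf("Monthly DB growth: %.0f MB (bounds %.0f-%.0f MB)%n",
                    estimatedMonthlyGrowthMb, lower, upper);
        }
    }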


VMware vCenter Update Manager Performance and Best Practices White Paper Posted

VMware vCenter Update Manager is a component of VMware Infrastructure that automates patching and upgrading of ESX hosts, VMware Tools and virtual hardware, Windows and Linux virtual machines, and virtual appliances. A new white paper, VMware vCenter Update Manager Performance and Best Practices, is now available.

In this paper we discuss VMware vCenter Update Manager 4.0 host deployment, latency, resource consumption, guest OS tuning, high-latency networks, and the impact of on-access virus scanning. We also provide performance tips to help customers tune the system for better performance.

Exchange 2007 Performance on vSphere 4

VMware recently released a white paper showing the performance scalability of Exchange 2007 on VMware vSphere. The paper shows that vSphere 4.0 achieves excellent performance and scalability with regard to both scale-up (adding more vCPUs) and scale-out (adding more VMs). The results indicate that vSphere can easily support 4,000 heavy Exchange users with a single 8-vCPU VM, or 8,000 heavy Exchange users with multiple 2-vCPU or 4-vCPU VMs. While supporting these high user counts, the latencies of most of our virtualized Exchange configurations are about half the recommended threshold (500 ms), with little overhead compared to physical.

 

Even the largest configuration, which supports 8,000 heavy users with 16 vCPUs on an 8-way server, provides an outstanding user experience. For our 8,000-heavy-user mailbox configuration, the 95th percentile Send Mail latency is 273 ms with eight 2-vCPU VMs and 304 ms with four 4-vCPU VMs.
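As a quick illustration of the metric itself (the Exchange load generator reports it directly), a nearest-rank 95th percentile over a set of Send Mail latency samples can be computed as follows; the sample values here are hypothetical:

    import java.util.Arrays;

    public class Percentile {
        // Nearest-rank 95th percentile of a set of latency samples in ms.
        static double p95(double[] latenciesMs) {
            double[] sorted = latenciesMs.clone();
            Arrays.sort(sorted);
            int rank = (int) Math.ceil(0.95 * sorted.length) - 1;
            return sorted[Math.max(rank, 0)];
        }

        public static void main(String[] args) {
            double[] sample = {120, 180, 210, 250, 273, 304, 150, 190};  // hypothetical values
            System.out.println("95th percentile Send Mail latency: " + p95(sample) + " ms");
        }
    }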

 

95th Percentile Send Mail Latency (2 vCPU VM vs. 4 vCPU VM)

 

  


 

 

In addition to these low latencies, the paper shows that the 8,000-mailbox configuration consumes less than 60% of host CPU resources, which leaves headroom for further user growth and consolidation. The paper also shows that ESX provides consistent performance across all consolidated virtual machines: for example, the response times of the Exchange transactions in the eight 2-vCPU configuration were within 2% of each other. For more information on this research, read the full paper: Microsoft Exchange Server 2007 Performance on VMware vSphere.

350,000 I/O Operations per Second, One vSphere Host

Summary

VMware vSphere includes a number of enhancements that enable it to deliver very high I/O performance. In this study, we demonstrate that vSphere can easily support the extreme I/O throughput demands made possible by new products such as the Enterprise Flash Drives (EFDs) offered by EMC. In experiments conducted at EMC labs, we were able to achieve just over 350,000 I/O operations per second with:

  • A single vSphere host with just three virtual machines running on it
  • Latencies under 2 ms
  • An I/O block size of 8KB

What does such high throughput mean to customers? Consider this: the entire database of Wikipedia is supported by 20 MySQL servers, each 200GB to 300GB in size. On average, Wikipedia receives 50,000 HTTP requests, or 80,000 SQL queries, per second [1], which translates to about 4.3 billion hits per day. With the storage infrastructure used in our experiments we could easily accommodate the entire Wikipedia database and still have space left over. A single vSphere host driving more than 350,000 I/O requests per second could easily support the throughput requirements of Wikipedia.

Background

In late May 2008, we published a blog article on achieving 100,000 I/O operations per second with ESX 3.5. To achieve that, we used 495 15K RPM Fibre Channel disks spread across three CX3-80 arrays. To push the envelope further with vSphere, we needed more storage bandwidth. It would have taken approximately 1,750 15K RPM Fibre Channel drives in 120 disk array enclosures to provide 350,000 I/O operations per second of throughput. Adding storage redundancy would increase those numbers further, to as many as 3,500 drives for a RAID 1/0 configuration, doubling the entire SAN infrastructure.

Instead, only 30 EFDs housed in three CX4-960 arrays provided enough storage bandwidth for vSphere to drive just over 350,000 I/O requests per second.


I/O Workload

We could have achieved a higher number of I/O operations per second with a smaller block size, but we focused our studies on the 8KB block size because it is the most representative of real applications. We chose an I/O pattern that was 100% random.
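As a back-of-the-envelope check, 350,000 I/O operations per second at an 8KB block size corresponds to roughly 2.7 GB/s of aggregate bandwidth; the short sketch below just does that arithmetic:

    public class IopsBandwidth {
        public static void main(String[] args) {
            // Aggregate bandwidth implied by the reported throughput:
            // 350,000 IOPS at an 8 KB block size.
            long iops = 350000;
            long blockBytes = 8 * 1024;
            double gbPerSec = iops * (double) blockBytes / (1024 * 1024 * 1024);
            System.out.printf("Aggregate bandwidth: %.2f GB/s%n", gbPerSec);
        }
    }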

Key Findings

  • Three VMs on one vSphere host supported 350,000 I/O operations per second with an 8KB block size (Figure 1)
  • A single VM with 2 vCPUs and 4GB of memory provided just under 120,000 I/O operations per second with an 8KB block size
  • I/O latency as measured in ESX was just under 2 ms
  • VMware’s new paravirtualized SCSI adapter (pvSCSI) offered a 12% improvement in throughput at 18% less CPU cost compared to the LSI virtual adapter

Figure 1. Scaling I/O performance through vSphere


We are documenting all the experiments in detail in a white paper that will be posted on the VMware website. We encourage readers to refer to that white paper for more details.

This testing was the result of a joint effort between VMware and EMC. We would like to thank the Midrange Partner Solutions Engineering team at EMC, Santa Clara for providing access to the hardware, for the use of their lab, and for their collaboration throughout this project.

For comments or questions, please join us on the VMware Performance Community website.

About the Authors:
Chethan Kumar is a member of the Performance Engineering team at VMware. Radhakrishnan Manga is a member of the Midrange Partner Solutions Engineering team at EMC.