
Category Archives: performance

New community for vCenter Operations launched

Last night vCenter Operations went GA, and at the same time we launched a new community for it.

Come check out VMware vCenter Operations community to learn about the product, watch a video tutorial given by Kit Colbert, or ask a question in the community forum.

VMware vCenter Operations helps you to proactively ensure service levels, optimum resource usage and configuration compliance in dynamic virtual and cloud environments. Through automated operations management and patented analytics, you benefit from an integrated approach to performance, capacity and configuration management.

VMware sets web performance record with building blocks for multicores

Today's guest post is from VMware performance guru Scott Drummonds. Scott blogs over at communities (Virtual Performance blog) and in the VMware Communities Performance section. Here he discusses our latest demonstration of how virtual can beat physical: VMware Infrastructure Sets World Record for Web Server Performance

Building Block Architecture for Superior Performance

If any of you have heard me speak in the numerous events I've done in
the past two years, you may have heard me detail the areas where
virtualization performance can exceed native. There are scalability
limitations in traditional software that make nearly every enterprise
application fall short of utilizing the cores that are available to
them today. As the core explosion continues, this under-utilization of
processors will worsen. Here is a graph that we've been showing to
illustrate that point:

http://communities.vmware.com/servlet/JiveServlet/downloadImage/5369/core_explosion.png

In 2008 I visited VMworld Europe and showed how using multiple virtual
machines on a single physical host could circumvent the limitations in
today's software. In that experiment we showed that 16,000 Exchange mailboxes could fit on a single physical server
when no one had ever put more than 8,000 in a single native
instance. We called this approach designing by "building blocks" and
were confident that as the core count continued to increase, we'd
continue to expose more applications whose performance could be
improved through virtualization.
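The building-block argument can be sketched with a toy Amdahl's-law model. All of the numbers below are illustrative assumptions, not measured data:

```python
# Illustrative Amdahl's-law model (hypothetical numbers): why several
# small VMs can out-scale one large SMP instance of the same software.

def scaled_throughput(cores, serial_fraction):
    """Relative throughput of one software instance on `cores` cores,
    limited by its serial (non-parallelizable) fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

CORES = 16
SERIAL = 0.10         # assume 10% of the work serializes
VIRT_OVERHEAD = 0.90  # assume each VM runs at 90% of native speed

# One native instance spanning all 16 cores:
native_smp = scaled_throughput(CORES, SERIAL)

# Sixteen 1-vCPU VMs, each an independent building block:
building_blocks = CORES * VIRT_OVERHEAD * scaled_throughput(1, SERIAL)

print(f"one 16-way native instance: {native_smp:.1f}x a single core")
print(f"16 x 1-vCPU VMs:            {building_blocks:.1f}x a single core")
```

Even after charging each VM a flat 10% virtualization tax, the building blocks win handily once the single instance's serial fraction caps its scaling.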

On Thursday last week SPEC accepted VMware's submission of a SPECweb2005 result. And last night we posted an article on VROOM!
detailing the experiment and providing information on the submission.
This submission is an incredible first for us: not only have we shown
that we can circumvent limitations in web servers, but we posted a
world record performance number in the process. Of course, if any of
you have seen Sreekanth Setty's presentation at VMworld on his ongoing
work on SPECweb2005, this result wouldn't surprise you:

http://communities.vmware.com/servlet/JiveServlet/downloadImage/5370/specweb_scaling.png

Getting a benchmark standardization body like SPEC to approve these
results isn't always easy. Most of the industry remains stuck in a mode
of thinking of performance as a single instance's maximum throughput.
But given the scale-out capabilities of a large number of enterprise
applications I'd argue that benchmarking should account for scale-out
capabilities on a single box. VMware's customers follow this practice
faithfully in sizing their deployments to match their needs and
everyone wants to know the platform's ability to handle this use-case.
SPEC's willingness to accept results showing building blocks on a
single host is commendable and progressive. As more benchmarks approve
submissions like these VMware will continue to be able to show record
numbers.

Virtualizing XenApp on XenServer 5.0 and ESX 3.5

New performance results are out showing that ESX supports more XenApp users with lower CPU usage than XenServer. Click through for the whole thing, including the details of the user-centric workload. For performance testing of desktop or terminal services virtualization, simulating the workload realistically is crucial. It can be tricky: if you don't simulate how real humans use their desktops, your conclusions may not apply to real-world deployments.

VMware: VROOM!: Virtualizing XenApp on XenServer 5.0 and ESX 3.5.

There has always been interest in running Citrix XenApp (formerly Citrix
Presentation Server) workloads on the VMware Virtual Infrastructure
platform. With the advent of multi-core systems, purchasing decisions
are driven towards systems with 4-16 cores. However, using this
hardware effectively is difficult due to limited scaling of the XenApp
application environment. In addition to the usual benefits of
virtualization, these scaling issues make running XenApp environments
on ESX even more compelling.

We recently ran some performance tests to understand what can be expected
in terms of performance for a virtualized XenApp workload. The results
show that ESX runs common desktop applications on XenApp with
reasonable overhead compared to a native installation, and with
significantly better performance than XenServer. 


XenApp and other products that virtualize applications are prime candidates to
be run in a VM. These results show that ESX can do so efficiently
compared to using a physical machine. This was shown with a benchmark
that: represents a real desktop workload, uses a metric that includes
latencies of all operations, and requires that all operations complete
successfully. Furthermore, ESX supports about 13% more users than
XenServer at a given latency while using less CPU.

IDF@Intel · Virtualization performance testing tips

More from this week at the Intel Developer Forum. Link: IDF@Intel · Virtualization performance testing tips.

Their list of "common sources of errors and anomalies" is worth a
paper of its own, as you can tell it comes from long experience, but
for this blog post let me just hit the headers of their slides on
"common pitfalls." After reading this, I hope you will think twice
before just firing up a quick timer on a process in a virtual machine.
It’s probably not telling you what you think it’s telling you! (Most
real-world virtualized workloads are not performance-bound, anyway, but
that’s a whole other conversation.)

  • Time drift
  • Unmonitored failures (loadsim, webbench)
  • Disk space
  • Spurious interrupts, network isolation
  • IOPS contention
  • Client instabilities (memory leaks, MTU, login failures)
  • VMM "knobs"
  • Guest OS knobs — Ticklessness (avoiding context switches)
  • Application tuneables — Java heap and large page tuneables
  • Affinitization (shared caches on multicore machines can help or hurt)
  • Service packs and application versions
  • Client memory exhaustion
  • VMM memory fragmentation, oversubscription
  • BIOS or hardware "knobs"
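Two of these pitfalls, time drift and unmonitored failures, can be guarded against even in a quick-and-dirty harness. A minimal sketch (the harness and tolerance are hypothetical, not from the slides):

```python
# A minimal sketch (hypothetical harness) of two of the pitfalls above:
# never trust a single guest clock, and never time an operation without
# checking that it actually succeeded.

import time

def timed_run(operation, drift_tolerance=0.05):
    wall_start, mono_start = time.time(), time.monotonic()
    ok = operation()
    if not ok:
        # unmonitored-failures pitfall: a fast failure looks "fast"
        raise RuntimeError("operation failed; timing is meaningless")
    wall = time.time() - wall_start
    mono = time.monotonic() - mono_start  # monotonic clock can't be stepped
    if abs(wall - mono) > drift_tolerance * max(mono, 1e-9):
        # time-drift pitfall: guest wall clock was adjusted mid-run
        print(f"warning: clocks disagree ({wall:.3f}s vs {mono:.3f}s)")
    return mono

elapsed = timed_run(lambda: sum(range(10**6)) > 0)
print(f"elapsed: {elapsed:.3f}s")
```

Comparing a steppable wall clock against a monotonic one is a cheap way to notice when the guest's clock was adjusted mid-run, which silently corrupts naive measurements.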

Virtual web servers scale better than native

Link: Scaling real-life Web server workloads

Most real-life web server workloads suffer from scalability issues such as those observed in our tests. To circumvent these issues, many businesses choose to deploy web server workloads on a multitude of one-CPU or dual-CPU machines. However, such an approach leads to a proliferation of servers in the data center, resulting in higher costs in both power and space usage. Virtualization offers an easier alternative that avoids software scaling issues and provides efficiency in power and space usage. This is because virtualization enables several complex operating environments that are not easily scalable to run concurrently on a single physical machine and exploit the vast compute resources offered by today’s power- and space-efficient multi-core systems. To quantify the effectiveness of this approach we measured SPECweb2005 performance by deploying multiple Apache/PHP configurations in a virtual environment. …

Thus, while the performance of each single-VCPU virtual machine is slightly lower than that of a one-CPU native machine (because of virtualization overhead), the cumulative performance of the multiple virtual machines well exceeds the performance of a large SMP native machine (because serialization penalties are reduced).

Five new performance papers from VMware

Performance of VMware VMI

VROOM! blog entry: VMI performance benefits; White paper: The Performance of VMware VMI. (Note that Krishna will be speaking at VMworld Europe 2008.)

Since VMI-enabled kernels can run on native systems, the popular
Linux distributions Ubuntu Feisty Fawn (7.04) and Ubuntu Gutsy Gibbon
(7.10) were shipped with VMI enabled by default in the kernel,
providing transparent performance benefits when they are run in ESX
Server 3.5. VMware is also working with Novell to include VMI in the SUSE Linux Enterprise Server distribution. …

The paper has details on the workloads that we ran, the benchmark
methodologies used, and the reasoning behind them. It will be clear
from the paper that VMware’s VMI-style paravirtualization offers
performance benefits for a wide variety of workloads in a totally
transparent way.

SPECweb2005 Performance on VMware ESX Server 3.5

VROOM! blog entry: SPECweb2005 Performance on VMware ESX Server 3.5; Performance study: SPECweb Performance.

Truth be told, with a number of superior features and performance
optimizations in VMware ESX Server 3.5, performance is no longer a
barrier to virtualization, even for the most I/O-intensive workloads.
In order to dispel the misconceptions these customers had, we decided
to showcase the performance of ESX Server by benchmarking with
industry-standard I/O-intensive benchmarks. We looked at the whole
spectrum of I/O-intensive workloads. My colleague has already addressed
database performance. Here, I’d like to focus on web server
performance; in particular, the performance of a single virtual machine
running the highly-network intensive SPECweb2005 benchmark.

SPECweb2005 is a SPEC benchmark for measuring a system’s ability to
act as a web server. It is designed with three workloads to
characterize different web usage patterns: Banking (emulates online
banking), E-commerce (emulates an E-commerce site), and Support
(emulates a vendor support site providing downloads). The three
benchmark components have vastly different workload characteristics and
we thus look at results from all three.

Performance Characterization of VMFS and RDM Using a SAN

White paper: Performance Characterization of VMFS and RDM Using a SAN

The test results described in this study show that VMFS and RDM provide similar I/O throughput for most of the workloads we tested. The small differences in I/O performance we observed were with the virtual machine running CPU‐saturated. The differences seen in these studies would therefore be minimized in real life workloads because most applications do not usually drive virtual machines to their full capacity. Most enterprise applications can, therefore, use either VMFS or RDM for configuring virtual disks when run in a virtual machine.
However, there are a few cases that require use of raw disks. Backup applications that use such inherent SAN features as snapshots or clustering applications (for both data and quorum disks) require raw disks. RDM is recommended for these cases. We recommend use of RDM for these cases not for performance reasons but because these applications require lower‐level disk control.

Large Page Performance

Performance study: Large Page Performance

The enhanced large page support in VMware ESX Server 3.5 and ESX Server 3i v3.5 enables 32‐bit virtual machines in PAE mode and 64‐bit virtual machines to make better use of large pages than they could when running on earlier versions of ESX Server. Our study, using SPECjbb2005, shows that using large pages can significantly improve the performance of this workload, compared to running the workload using small pages. The results demonstrate that if an application can benefit from large pages on a native machine, it can potentially achieve similar performance improvement in a virtual machine running on ESX Server 3.5 and ESX Server 3i v3.5.
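As a rough illustration of what "using large pages" involves in practice, the commands and sizes below are assumptions for a Linux guest running a Java workload like SPECjbb2005, not steps taken from the study:

```shell
# Hypothetical sketch of guest-side large-page tuning.
# Reserve 2 MB huge pages in the Linux guest (count is illustrative):
echo 2048 > /proc/sys/vm/nr_hugepages    # ~4 GB of 2 MB huge pages

# Ask the JVM to back the benchmark heap with large pages:
java -XX:+UseLargePages -Xms3g -Xmx3g ... SPECjbb
```

The payoff comes from fewer TLB misses: a 2 MB page covers 512 times the address range of a 4 KB page, so TLB-hungry workloads like large Java heaps benefit most.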

What’s New in VMware Infrastructure 3: Performance Enhancements

White paper: What’s New in VMware Infrastructure 3: Performance Enhancements

Table of Contents

  • Scalability Enhancements
  • New Guest Operating System Support
  • Networking Enhancements
  • VMXNET Enhancements
  • TCP Segmentation Offload (TSO)
  • Jumbo Frames
  • 10 Gigabit Ethernet
  • NetQueue
  • Intel I/O Acceleration Technology Support (Experimental)
  • CPU Enhancements
  • Paravirtualized Linux Guests
  • Memory Enhancements
  • NUMA Improvements
  • Storage Enhancements
  • Infiniband Support
  • Summary

More on virtualization benchmarks and their meaning

Steve Wilson @ Sun talks about virtualization benchmarks:  Virtual Steve : VM Benchmarking Wars.

Despite the reports about 3x faster, what Oracle said was actually,
"Oracle consistently saw much better resource utilization with an
average of three times less overhead using Oracle VM."  In
benchmark-speak, that’s a lot different than 3x faster.  In fact, I’m
guessing that someone constructed a benchmark that shows Oracle VM
running at 99% of native speed, while VMware ran at 97% of native speed
– 1% overhead instead of 3%.  Yes, "three times less overhead", but
ultimately trivial.

Of course, in the long term, there will be
better official benchmarks of VM performance, but the real test will be
customer metrics like

  • How fast can I roll out a new application?
  • How many system admins do I need to run my data center?
  • How much energy can I save by operating my data center in a new way?

Nobody’s seen Oracle’s benchmark results yet, but while we’re waiting, VMware’s Richard McDougall also adds the following: Just what are the important performance factors for Virtualization?

  • Throughput: can the application deliver the required levels of throughput, in terms of real-world transactions?
  • Latency: is the latency of each transaction within tolerances, or is it affected by virtualization?
  • Scalability: does throughput/latency change as load is increased
    (often asked in the context of “do I have enough future headroom?”)?
  • Efficiency: how much physical resource is used to deliver a prescribed throughput?
  • Price-performance: cost of the total system
  • Power-performance: throughput relative to power, i.e. how much performance can I deliver for the power consumed?
  • Space-performance: how much performance can I get per rack unit?
  • Agility: what is the time taken to deploy a new application?
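Several of these factors are simple ratios over measured throughput. A sketch with made-up numbers, just to show how the derived metrics relate:

```python
# Hypothetical figures: turning raw measurements into the derived
# metrics listed above. All numbers are illustrative, not benchmark data.

throughput_tps = 5000      # transactions per second delivered
cpu_used_pct   = 40.0      # physical CPU consumed to deliver it
system_cost    = 20000.0   # USD for the whole system
power_watts    = 450.0
rack_units     = 2

efficiency = throughput_tps / cpu_used_pct  # tps per % physical CPU
price_perf = throughput_tps / system_cost   # tps per dollar
power_perf = throughput_tps / power_watts   # tps per watt
space_perf = throughput_tps / rack_units    # tps per rack unit

print(f"efficiency:        {efficiency:.1f} tps per % CPU")
print(f"price-performance: {price_perf:.3f} tps/$")
print(f"power-performance: {power_perf:.1f} tps/W")
print(f"space-performance: {space_perf:.1f} tps/U")
```

The point of the list is that two platforms with identical raw throughput can differ sharply on these ratios, which is where consolidation pays off.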

Richard was the author of Ten Reasons Oracle Databases Run Best on VMware (although I don’t want to slight the many folks who worked on producing those results).

Virtualization performance is a topic that is complicated, and like all benchmarks and performance metrics, has to fit into the larger context of what problems you’re trying to solve. But the take-home message here is that people run Oracle and SQL Server and other databases every day in production on VMware Infrastructure, and performance is not an issue for them.

White paper: SQL Server Performance in VMware Infrastructure

Link: SQL Server Performance in a VMware Infrastructure 3 Environment

This paper describes transaction processing workload performance in virtual machines using Microsoft SQL Server 2005 and VMware Infrastructure 3. This performance study was conducted at the HP Strategic Alliances Engineering (SAE) lab in Cupertino. The primary goal is to prove that Microsoft SQL Server 2005 can successfully handle enterprise-level transaction-processing workloads when running inside VMware virtual machines. To facilitate planning for server consolidation, this study presents sizing data and data on system resource utilization at various load levels for uniprocessor (UP) virtual machines, two-way SMP virtual machines, and four-way SMP virtual machines. This study also compares the performance of UP, two-way, and four-way SMP virtual machines across 32-bit and 64-bit virtual environments.

This performance study clearly demonstrates that VMware Infrastructure 3 provides an excellent production-ready virtualization platform for customers looking to deploy Microsoft SQL Server inside virtual machines. Furthermore, together with virtualization-based distributed infrastructure services such as VMotion, VMware High Availability, and VMware Distributed Resource Scheduler, VMware Infrastructure 3 can provide increased serviceability, efficiency, and reliability for your SQL Server deployments. This should offer transformative cost savings to your dynamic data center. …

ESX Server also has impressive isolation characteristics when one virtual CPU gets stressed:

The transaction throughput shown in these results represents the highest steady-state throughput achievable for the virtual machine until its CPU resources are saturated. However, note that while the CPU resources inside the virtual machine were exhausted, only the physical CPU resources on the ESX Server host corresponding to the number of virtual CPUs were consumed, because of the excellent isolation capability offered by VMware ESX Server. In other words, for UP virtual machines, only 13 percent of total physical CPU resources were utilized. Likewise, for the 2-vCPU virtual machine, total physical CPU utilization was about 29 percent, and for the 4-vCPU virtual machine, the maximum total physical CPU utilization was about 52 percent. This implies that configurations running multiple SQL Server virtual machines can achieve higher transaction throughput rates than those running only a single instance. It is possible that the transaction throughput in a configuration with multiple instances may be limited by the IOPS the storage can handle. In such a situation, you may want to configure the storage adapter queue depths and the ESX Server outstanding disk requests parameters to minimize queued disk I/O requests.
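The quoted percentages are consistent with an 8-CPU host, though the excerpt doesn't state the core count explicitly. A quick arithmetic check under that assumption:

```python
# Sanity-checking the isolation numbers above, assuming an 8-CPU host
# (an assumption, not stated in the excerpt): a saturated n-vCPU VM
# should consume roughly n/8 of total physical CPU, plus overhead.

PHYSICAL_CPUS = 8
reported = {1: 13, 2: 29, 4: 52}  # vCPUs -> reported % of total physical CPU

for vcpus, pct in reported.items():
    ideal = 100 * vcpus / PHYSICAL_CPUS
    overhead = pct - ideal
    print(f"{vcpus}-vCPU VM: {pct}% reported vs {ideal:.1f}% ideal "
          f"(+{overhead:.1f} points of overhead)")
```

Each figure sits only a few points above the ideal n/8 share, which is the isolation claim in concrete terms: a saturated VM cannot spill much beyond the physical CPUs backing its vCPUs.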

If you’re interested in the topic of virtualizing databases (and who reading this blog isn’t, really? We are so much fun at dinner parties) be sure to check out Ten Reasons Why Oracle Databases Run Best on VMware.

Ten Reasons Why Oracle Databases Run Best on VMware

If you still think that databases are too resource-intensive or otherwise not suitable for virtualization, or if you think that running enterprise apps like Oracle on VI3 imposes huge overheads, this new article from Richard McDougall on VMware’s performance team is a must-read.

Link: Ten Reasons Why Oracle Databases Run Best on VMware – VMware VROOM!.

We’re really excited about the buzz around Oracle in virtualized
environments. One of the best kept secrets is just how well Oracle
performs on VMware ESX. This didn’t happen by accident – there are a
number of features and performance optimizations in the VMware ESX
server architecture, specifically for databases.

In this blog, I’ll walk through the top ten most important features
for getting the best database performance. Here are a few of the
performance highlights:

  • Near Native Performance: Oracle databases run at performance similar to that of a physical system
  • Extreme Database I/O Scalability: VMware ESX Server’s thin
    hypervisor layer can drive over 63,000 database I/Os per second (fifty
    times the requirement of a typical database)
  • Multi-core Scaling: Scale up using SMP virtual machines and multiple database instances
  • Large Memory : Scalable memory – 64GB per database, 256GB per host

We’ve continued to invest a great deal of work towards optimizing
Oracle performance on VMware, because it’s already one of the most
commonly virtualized applications. The imminent ESX 3.5 release is our
best database platform to date, with several new advanced optimizations.

In this blog article we’d like to explain the unique and demanding
nature of the workloads that database applications such as Oracle
produce, and show the performance capabilities of ESX Server on this
type of workload.

New VMmark results page

Link: VROOM! VMmark 1.0 Results Page Debuts.

Please take a moment to check out the benchmark disclosures posted on
our newly unveiled VMmark results page at www.vmware.com/products/vmmark/results.html. At the moment, the page
contains results submitted by both Dell and HP using VMmark 1.0. This
data should help customers effectively evaluate virtualization
platforms in a consolidation scenario. The results range from
four-core to sixteen-core systems, including Intel’s just-launched
quad-core, quad-socket systems (Tigerton/Caneland). We are delighted
by the strong initial response
to the benchmark from our partners and expect the list of submitted
results to grow.