
Monthly Archives: July 2010

VMmark 2.0 Beta Overview

As I mentioned in my last blog, we have been developing VMmark 2.0, a next-generation multi-host virtualization benchmark that models not only application performance in a virtualized environment but also the effects of common virtual infrastructure operations. This is a natural progression from single-host virtualization benchmarks like VMmark 1.x and SPECvirt_sc2010. Benchmarks measuring single-host performance, while valuable, do not adequately capture the complexity inherent in modern virtualized datacenters. With that in
mind, we set out to construct a meaningfully stressful virtualization benchmark with the following properties:

  • Multi-host to model realistic datacenter deployments
  • Virtualization infrastructure workloads to more accurately capture overall platform performance
  • Heavier workloads than VMmark 1.x to reflect heavier customer usage patterns enabled by the increased capabilities of the virtualization and hardware layers
  • Multi-tier workloads driving both VM-to-VM and external network traffic
  • Workload burstiness to ensure robust performance under variable high loads

The addition of virtual infrastructure operations to measure their impact on overall system performance in a typical multi-host environment is a key departure from
traditional single-server benchmarks. VMmark 2.0 includes the execution of the
following foundational and commonly-used infrastructure operations:

  • User-initiated vMotion 
  • Storage vMotion
  • VM cloning and deployment
  • DRS-initiated vMotion to accommodate host-level load variations

The VMmark 2.0 tile features a significantly heavier load profile than VMmark 1.x and consists of the following workloads:

  • DVD Store 2 – multi-tier OLTP workload consisting of a 4-vCPU database VM and three 2-vCPU web server VMs driving a bursty load profile
  • Olio – multi-tier social networking workload consisting of a 4-vCPU web server and a 2-vCPU database server
  • Exchange 2007 – 4-vCPU mail server workload
  • Standby server – 1-vCPU lightly loaded server

We kicked off an initial partner-only beta program in late June and are actively polishing the benchmark for general release. We will be sharing a number of interesting experiments using VMmark 2.0 in our blog leading up to the general release of the benchmark, so stay tuned.

Microsoft Office SharePoint Server 2007 Performance on VMware vSphere 4.1

VMware recently released a whitepaper
showing the performance scalability of SharePoint Server 2007 on VMware
vSphere 4.1. This paper demonstrates that vSphere 4.1 exhibits high
performance and includes advanced features that can improve the overall
user experience in multi-tier applications such as SharePoint.

Results of the experiments, in which up to 171,600 heavy SharePoint users were supported on a single physical server, highlight the benefits gained by the ability to easily deploy additional SharePoint virtual machines as needed to satisfy changing demands.


This
paper also discusses some of the advanced features of VMware vSphere
4.1—such as memory compression, NUMA-aware resource management, and
inter-VM communication—that allow vSphere to efficiently virtualize
resource-intensive and latency-sensitive applications. The paper
concludes with a set of recommended best practices for achieving optimal
SharePoint performance on VMware vSphere 4.1. For more information on
this research, read the full paper: Microsoft Office SharePoint Server 2007 Performance on VMware vSphere 4.1.

Understanding Memory Resource Management in VMware ESX Server 4.1

We have published a whitepaper describing how ESX Server 4.1 manages host memory. The paper not only presents the basic memory resource management concepts but also shows experimental results explaining the performance impact of the four memory reclamation techniques used in ESX Server 4.1: page sharing, ballooning, memory compression, and host swapping. The experimental results show that:

1) Page sharing introduces negligible performance overhead.

2) Compared to host swapping, ballooning causes much less performance degradation when reclaiming memory; in some cases, ballooning incurs no performance overhead at all.

3) Memory compression significantly reduces the number of swapped-out pages and hence greatly improves overall performance in high memory overcommitment scenarios.
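
To make the compression result more concrete, here is a minimal Python sketch of the trade-off: a page selected for reclamation is kept in an in-memory compression cache if it compresses well, and is swapped to disk otherwise, so a later guest access costs a decompression instead of a disk read. The 50% threshold, the function names, and the use of zlib are illustrative assumptions, not ESX internals.

```python
import os
import zlib

PAGE_SIZE = 4096             # size of a guest page in bytes
COMPRESSION_THRESHOLD = 0.5  # assumed: keep a page only if it shrinks to <= 50%

def reclaim_page(page: bytes, compression_cache: dict, swap_file: list) -> str:
    """Decide whether a page selected for reclamation is compressed or swapped.

    Returning the action lets a caller tally how many disk swaps the
    compression cache avoided, which is the effect measured in the paper.
    """
    compressed = zlib.compress(page)
    if len(compressed) <= PAGE_SIZE * COMPRESSION_THRESHOLD:
        # The page stays in host memory in compressed form; a later guest
        # access costs only a decompression, not a disk read.
        compression_cache[id(page)] = compressed
        return "compressed"
    # A poorly compressible page falls back to being swapped out to disk.
    swap_file.append(page)
    return "swapped"

if __name__ == "__main__":
    cache, swap = {}, []
    zero_page = bytes(PAGE_SIZE)         # compresses very well
    random_page = os.urandom(PAGE_SIZE)  # barely compresses at all
    print(reclaim_page(zero_page, cache, swap))    # -> compressed
    print(reclaim_page(random_page, cache, swap))  # -> swapped
```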

The following is a brief summary of the paper.

In this paper, we start by introducing the basic memory virtualization concepts. Then, we discuss why supporting memory overcommitment is necessary in ESX. Four memory reclamation techniques are currently used in ESX: transparent page sharing (TPS), ballooning, memory compression, and host swapping. We illustrate the mechanism behind each technique and analyze its pros and cons from a performance perspective. In addition, we present how the ESX memory scheduler uses a share-based allocation algorithm to allocate memory among multiple virtual machines when host memory is overcommitted.
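
As a rough illustration of the share-based allocation idea, the following Python sketch divides host memory among VMs in proportion to their shares. It is a toy model: the real ESX memory scheduler also accounts for reservations, limits, and the idle-memory tax, none of which are modeled here, and all names and numbers are made up for the example.

```python
def allocate_memory(host_memory_mb: int, vms: dict) -> dict:
    """Split an overcommitted host's memory among VMs in proportion to shares.

    `vms` maps a VM name to {'shares': int, 'configured_mb': int}. The real
    ESX scheduler also honors reservations, limits, and an idle-memory tax
    when computing each VM's target; this sketch ignores all of those.
    """
    total_shares = sum(vm["shares"] for vm in vms.values())
    allocation = {}
    for name, vm in vms.items():
        proportional = host_memory_mb * vm["shares"] / total_shares
        # A VM never receives more than its configured memory size.
        allocation[name] = min(vm["configured_mb"], round(proportional))
    return allocation

if __name__ == "__main__":
    vms = {
        "web":  {"shares": 2000, "configured_mb": 4096},
        "db":   {"shares": 4000, "configured_mb": 8192},
        "util": {"shares": 1000, "configured_mb": 2048},
    }
    # 8 GB of host memory against 14 GB of configured guest memory:
    # the host is overcommitted, so each VM gets a share-weighted slice.
    print(allocate_memory(8192, vms))
```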

Beyond the technique discussion, we conduct experiments to help users understand how the individual memory reclamation techniques impact the performance of various applications. In these experiments, we use the SPECjbb, kernel compile, Swingbench, SharePoint, and Exchange benchmarks to evaluate the different techniques.

Finally, based on the memory management concepts and performance evaluation results, we present some best practices for host and guest memory usage.

The paper can be found at http://www.vmware.com/files/pdf/techpaper/vsp_41_perf_memory_mgmt.pdf.

Note that this paper is based on the earlier ESX 4.0 memory management paper. Besides the new content introduced in ESX 4.1, such as memory compression, quite a few sections have been updated to reflect the current state of the art in ESX memory management.

Enhanced VMware ESX 4.1 CPU Scheduler

Check out our technical paper on the ESX CPU scheduler in vSphere 4.1. It has been revised from the previous version to reflect a new feature, wide-VM NUMA support.

This paper attempts to answer the following questions:

  • How is CPU time allocated between virtual machines? How well does it work?
  • What is the difference between “strict” and “relaxed” co-scheduling? What is the performance impact of recent co-scheduling improvements?
  • What is the “CPU scheduler cell”? What happened to the scheduler cell in ESX 4?
  • How does ESX scheduler exploit the underlying CPU architecture features like multi-core, hyper-threading, and NUMA?

The following provides a brief summary of the paper:

ESX 4.1 introduces wide-VM NUMA support, which improves memory locality for memory-intensive workloads. Based on testing with microbenchmarks, the performance benefit can be up to 11–17 percent.
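
For intuition, here is a small Python sketch of what "wide-VM" support means: a VM with more vCPUs than one NUMA node has cores is split into smaller NUMA clients, each given a home node so the memory it uses can be placed locally. The splitting rule and data layout are simplifying assumptions for illustration, not the actual ESX placement algorithm.

```python
import math

def split_wide_vm(num_vcpus: int, cores_per_node: int, num_nodes: int) -> list:
    """Split a 'wide' VM (more vCPUs than one NUMA node has cores) into clients.

    Each NUMA client is small enough to fit on a single node, so the memory it
    touches can be allocated locally on that node instead of remotely.
    """
    if num_vcpus <= cores_per_node:
        # Not a wide VM: it fits on one node and needs no splitting.
        return [{"vcpus": list(range(num_vcpus)), "home_node": 0}]
    num_clients = math.ceil(num_vcpus / cores_per_node)
    clients = []
    for c in range(num_clients):
        first = c * cores_per_node
        last = min(first + cores_per_node, num_vcpus)
        clients.append({"vcpus": list(range(first, last)),
                        "home_node": c % num_nodes})
    return clients

if __name__ == "__main__":
    # An 8-vCPU VM on a host whose NUMA nodes have 4 cores each becomes
    # two 4-vCPU clients, each with its own home node.
    for client in split_wide_vm(num_vcpus=8, cores_per_node=4, num_nodes=2):
        print(client)
```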

In ESX 4, many improvements have been introduced in the CPU scheduler. This includes further relaxed co-scheduling, lower lock contention, and multicore-aware load balancing. Co-scheduling overhead has been further reduced by the accurate measurement of the co-scheduling skew and by allowing more scheduling choices. Lower lock contention is achieved by replacing the scheduler cell lock with finer-grained locks. By eliminating the scheduler cell, a virtual machine can get higher aggregated cache capacity and memory bandwidth. Lastly, multicore-aware load balancing achieves high CPU utilization while minimizing the cost of migrations.
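
The following Python sketch illustrates the relaxed co-scheduling idea described above: skew is tracked per vCPU as how far it has run ahead of its slowest sibling, and only the vCPUs whose skew exceeds a threshold are co-stopped, rather than stopping the whole virtual machine as strict co-scheduling would. The threshold value and progress numbers are made-up assumptions for the example.

```python
SKEW_THRESHOLD_MS = 3.0  # assumed value; the real threshold is an ESX internal

def coschedule(progress_ms: dict) -> dict:
    """Decide, per vCPU of one VM, whether it may keep running.

    `progress_ms` maps each vCPU to the guest time it has accumulated. Skew
    for a vCPU is how far it has run ahead of the slowest sibling. Relaxed
    co-scheduling co-stops only the vCPUs whose skew exceeds the threshold;
    strict co-scheduling would stop the entire VM instead.
    """
    slowest = min(progress_ms.values())
    decisions = {}
    for vcpu, progress in progress_ms.items():
        skew = progress - slowest
        decisions[vcpu] = "co-stop" if skew > SKEW_THRESHOLD_MS else "run"
    return decisions

if __name__ == "__main__":
    # vcpu0 has raced ahead while vcpu2 was descheduled, so only vcpu0
    # is co-stopped until its siblings catch up.
    print(coschedule({"vcpu0": 12.0, "vcpu1": 9.5, "vcpu2": 8.0}))
```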

Experimental results show that the ESX 4 CPU scheduler faithfully allocates CPU resources as specified by users. While maintaining the benefit of a proportional-share algorithm, the improvements in co-scheduling and load-balancing algorithms are shown to benefit performance. Compared to ESX 3.5, ESX 4 significantly improves performance in both lightly-loaded and heavily-loaded systems.

The paper can be downloaded from http://www.vmware.com/resources/techresources/10131.


Performance Implications of Storage I/O Control in vSphere Environments with Shared Storage

vSphere-based virtualized datacenters often employ a shared storage infrastructure to support clusters of vSphere hosts. Applications running in virtual machines (VMs) on vSphere hosts share the storage resources for their I/O needs. Application performance can suffer when VMs contend for these shared storage resources; without proper access control, the performance of all applications tends to be affected in a non-trivial way. Storage I/O Control (SIOC), a new feature offered in VMware vSphere 4.1, provides a dynamic control mechanism for proportional allocation of shared storage resources to VMs running on multiple hosts. The experiments conducted in the VMware performance labs show that:

  • SIOC prioritizes VMs’ access to shared I/O resources based on the disk shares assigned to them.
  • If the VMs do not fully utilize their portion of the allocated I/O resources on a shared datastore, SIOC redistributes the unutilized resources to the VMs that need them, in proportion to the VMs’ disk shares (a simplified sketch of this redistribution follows the list).
  • SIOC minimizes the fluctuations in performance of a critical workload during periods of I/O congestion. For the test case executed in the VMware labs, limiting the fluctuations to a small range resulted in as much as a 26% performance benefit compared to the default configuration (Figure 1).
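
The redistribution behavior in the second bullet can be sketched as follows: when a datastore is congested, a throttled number of I/O slots is divided among the VMs in proportion to their disk shares, and any slots a lightly loaded VM does not need flow to the VMs that are still asking for more. This is a simplified model written for illustration; the VM names, share values, and slot counts are assumptions, not the actual SIOC algorithm.

```python
def allocate_io_slots(total_slots: int, vms: dict) -> dict:
    """Divide a throttled datastore queue among VMs in proportion to disk shares.

    `vms` maps a VM name to {'shares': int, 'demand': int}, where 'demand' is
    the number of outstanding I/Os the VM currently wants. Slots a VM does not
    need are handed back and redistributed, by share ratio, to the VMs that
    are still asking for more.
    """
    allocation = {name: 0 for name in vms}
    active = set(vms)
    remaining = total_slots
    while remaining > 0 and active:
        total_shares = sum(vms[n]["shares"] for n in active)
        granted_this_round = 0
        for name in list(active):
            fair = remaining * vms[name]["shares"] // total_shares
            grant = min(fair, vms[name]["demand"] - allocation[name])
            allocation[name] += grant
            granted_this_round += grant
            if allocation[name] >= vms[name]["demand"]:
                active.discard(name)  # satisfied: its leftover is redistributed
        if granted_this_round == 0:
            break  # leftover is smaller than any share quantum
        remaining -= granted_this_round
    return allocation

if __name__ == "__main__":
    vms = {
        "critical": {"shares": 2000, "demand": 40},  # busy, high-share VM
        "batch":    {"shares": 1000, "demand": 10},  # lightly loaded VM
    }
    # With 48 slots on the congested datastore, 'batch' takes only the 10 it
    # needs and the rest flows to 'critical' instead of sitting idle.
    print(allocate_io_slots(48, vms))
```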

Figure 1. Application throughput with and without SIOC enabled

For further details, read the white paper titled “Managing Performance Variance of Applications Using Storage I/O Control” at http://www.vmware.com/resources/techresources/10120.