Tag Archives: VMmark

First VMmark 3.1 Publications, Featuring New Cascade Lake Processors

VMmark is a free tool used by hardware vendors and others to measure the performance, scalability, and power consumption of virtualization platforms.  If you’re unfamiliar with VMmark 3.x, each tile is a grouping of 19 virtual machines (VMs) simultaneously running diverse workloads commonly found in today’s data centers, including a scalable Web simulation, an E-commerce simulation (with backend database VMs), and standby/idle VMs.

As Joshua mentioned in a recent blog post, we released VMmark 3.1 in February, adding support for persistent memory, improving workload scalability, and better reflecting secure customer environments by increasing side-channel vulnerability mitigation requirements.

I’m happy to announce that today we published the first VMmark 3.1 results.  These results were obtained on systems meeting our industry-leading side-channel-aware mitigation requirements, thus continuing the benchmark’s ability to provide an indication of real-world performance.

Some mitigations for recently-discovered side-channel vulnerabilities (i.e., Spectre, Meltdown, and L1TF) incur significant performance impacts, while others have little or no impact.  Today’s VMmark results demonstrate that even when additional mitigations are in place, ESXi hosts using the new 2nd-Generation Intel® Xeon® Scalable processors obtain higher VMmark scores than comparable 1st-Generation Intel Xeon Scalable processors.  This is due to processor design improvements that reduce (or even negate) the performance impact of security mitigations, by mitigating some of the security vulnerabilities in hardware rather than in software.

These results, from Fujitsu, span all three VMmark publication categories:

  1. Performance Only (9.02 @ 9 tiles)
  2. Performance with Server Power (6.3290 @ 9 tiles)
  3. Performance with Server and Storage Power (3.5013 @ 9 tiles)

So, how does this new performance result with Cascade Lake processors compare to the previous generation with Skylake processors?  Hopefully a graph is worth a thousand words 😊…

Fujitsu Skylake to Cascade Lake Graph

As you can see, Fujitsu was able to achieve a higher score, while being able to run an additional tile (19 more VMs) and still meeting strict Quality-of-Service (QoS) compliance requirements imposed by the VMmark benchmark harness.

Industry-Leading Side-Channel Mitigation Requirements
Given the numerous security vulnerabilities recently identified, we set a high bar in VMmark 3.1 that requires all applicable security mitigations in benchmarked environments to best represent secure, real-world customer environments.

These are the current security mitigation requirements for VMmark 3.1:

VMmark 3.1 Security Mitigations Table


Note: If “N/A” is listed, that vulnerability does not apply to that portion of the stack.

For more information about VMmark, please visit the VMmark product page.

If you have any questions or feedback, please leave us a comment below.  Thanks!

VMmark 3.1 Released

It is my great pleasure to announce that VMmark 3.1 is generally available as of February 7, 2019!

What’s New?

This release adds support for persistent memory, improves workload scalability, and better reflects secure customer environments by increasing side-channel vulnerability mitigation requirements.

Visit our main VMmark HTML page for more information.

Please note that VMmark 3.0 will reach end of life on March 15th, 2019.

To learn more about VMmark3, see the introductory blog article here.

Addressing Meltdown/Spectre in VMmark

The recently described Meltdown/Spectre vulnerabilities have implications throughout the tech industry, and the VMmark virtualization benchmark is no exception. In deciding how to approach the issue, the VMmark team’s goal was to address the impact of these vulnerabilities while maintaining the value and integrity of the benchmark.

Applying the full set of currently available Meltdown/Spectre mitigations is likely to have a significant impact on VMmark scores. Because the mitigations are expected to continue evolving for some time, that impact might even change. If the VMmark team were to require the full set of mitigations in order for a submission to be compliant, that might make new submissions non-competitive with older ones, and also introduce more “noise” into VMmark scores as the mitigations evolve. While our intention for the future is that eventually all new VMmark results will be obtained on virtualization platforms that have the full set of Meltdown/Spectre mitigations, we have chosen to take a gradual approach.

Beginning May 8, 2018, all newly-published VMmark results must comply with a number of new requirements related to the Meltdown and Spectre vulnerabilities. These requirements are detailed in Appendix C of the latest edition of the VMmark User’s Guide.

Before performing any VMmark benchmark runs intended for publication, check the VMmark download page to make sure you’re using the latest edition of the VMmark User’s Guide.  If you have questions, you can reach the VMmark team at vmmark-info@vmware.com.

Introducing VMmark3: A highly flexible and easily deployed benchmark for vSphere environments

VMmark 3.0, VMware’s multi-host virtualization benchmark, is generally available here.  VMmark3 is a free cluster-level benchmark that measures the performance, scalability, and power of virtualization platforms.

VMmark3 leverages much of previous VMmark generations’ technologies and design.  It continues to utilize a unique tile-based heterogeneous workload application design. It also deploys the platform-level workloads found in VMmark2 such as vMotion, Storage vMotion, and Clone & Deploy.  In addition to incorporating new and updated application workloads and infrastructure operations, VMmark3 also introduces a new fully automated provisioning service that greatly reduces deployment complexity and time.


Measuring Cloud Scalability Using the Weathervane Benchmark

Cloud-based deployments continue to be a hot topic in many of today’s corporations.  Often the discussion revolves around workload portability, ease of migration, and service pricing differences.  In an effort to bring performance into the discussion we decided to leverage VMware’s new benchmark, Weathervane.  As a follow-on to Harold Rosenberg’s introductory Weathervane post we decided to showcase some of the flexibility and scalability of our new large-scale benchmark.  Previously, Harold presented some initial scalability data running on three local vSphere 6 hosts.  For this article, we decided to extend this further by demonstrating Weathervane’s ability to run within a non-VMware cloud environment and scaling up the number of app servers.

Weathervane is a new web-application benchmark architected to simulate modern-day web applications.  It consists of a benchmark application and a workload driver.  Combined, they simulate the behavior of everyday users attending a real-time auction.  For more details on Weathervane I encourage you to review the introductory post.

Environment Configuration:
Cloud Environment: Amazon AWS, US West.
Instance Types: M3.XLarge, M3.Large, C3.Large.
Instance Notes: Database instances utilized an additional 300GB io1 tier data disk.
Instance Operating System: CentOS 6.5 x64.
Application: Weathervane Internal Build 084.

Testing Methodology:
All instances were run within the same cloud environment to reduce network-induced latencies.  We started with a base configuration consisting of eight instances.  We then scaled out the number of workload drivers and application servers in an effort to identify how a cloud environment scaled as application workload needs increased.  We used Weathervane’s FindMax functionality, which runs a series of tests to determine the maximum number of users the configuration can sustain while still meeting QoS requirements.  Early experimentation allowed us to identify the maximum needs of the services other than the workload drivers and application servers, which reduced the likelihood of bottlenecks in those services.  Below is a block diagram of the configurations used for the scaled-out Weathervane deployment.

Fig1
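
Weathervane’s FindMax internals are not described in this post, but the general idea of searching for the highest load that still passes QoS can be sketched with a binary search. Everything in this block is an assumption for illustration: `passes_qos` is a hypothetical stand-in for a full benchmark run, and the real FindMax may use a different search strategy.

```python
def find_max_users(passes_qos, low, high):
    """Return the largest user count in [low, high] that passes QoS.

    Assumes QoS is monotonic: if a load fails, every higher load fails too.
    Returns None if even the lowest load fails.
    """
    if not passes_qos(low):
        return None
    if passes_qos(high):
        return high
    # Invariant from here on: `low` passes and `high` fails.
    while high - low > 1:
        mid = (low + high) // 2
        if passes_qos(mid):
            low = mid
        else:
            high = mid
    return low


# Hypothetical QoS predicate: pretend the configuration saturates at 7,300 users.
print(find_max_users(lambda users: users <= 7300, 1000, 20000))
```

In practice each probe would be a complete benchmark run, so a real search would likely stop at a coarser granularity than a single user.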

Results:
For our analysis of Weathervane cloud scaling we ran multiple iterations at each load level and report the average.  We automated the process to ensure consistency.  Our results show both the number of users sustained and the HTTP requests per second as reported by the benchmark harness.

Fig2

As you can see in the above graph, for our cloud environment running Weathervane, scaling the number of application servers yielded nearly linear scaling up to five application servers. The delta in scaling between the number of users and the HTTP requests per second sustained was less than 1%.  Due to time constraints we were unable to test beyond five application servers, but we expect that the scaling would have continued upwards well beyond the load levels presented.

Although just a small sample of what Weathervane and cloud environments can scale to, this brief article highlights both the benchmark and cloud environment scaling.  Though Weathervane hasn’t been released publicly yet, it’s easy to see how this type of controlled, scalable benchmark will assist in performance evaluations of a diverse set of environments.  Look for more Weathervane-based cloud performance analysis in the future.

 

Virtualized Storage Performance: RAID Groups versus Storage pools

RAID, a redundant array of independent disks, has traditionally been the foundation of enterprise storage. Grouping multiple disks into one logical unit can vastly increase the availability and performance of storage by protecting against disk failure, allowing greater I/O parallelism, and pooling capacity. Storage pools similarly increase the capacity and performance of storage, but are easier to configure and manage than RAID groups.

RAID groups have traditionally been regarded as offering better and more predictable performance than storage pools. Although both technologies were developed for magnetic hard disk drives (HDDs), solid-state drives (SSDs), which use flash memory, have become prevalent. Virtualized environments are also common and tend to create highly randomized I/O given the fact that multiple workloads are run simultaneously.

We set out to see how the performance of RAID group and storage pool provisioning methods compare in today’s virtualized environments.

First, let’s take a closer look at each storage provisioning type.

RAID Groups

A RAID group unifies a number of disks into one logical unit and distributes data across multiple drives. RAID groups can be configured with a particular protection level depending on the performance, capacity, and redundancy needs of the environment. LUNs are then allocated from the RAID group. RAID groups typically contain only identical drives, and the maximum number of disks in a RAID group varies by system model but is generally below fifty. Because drives typically have well defined performance characteristics, the overall RAID group performance can be calculated as the performance of all drives in the group minus the RAID overhead. To provide consistent performance, workloads with different I/O profiles (e.g., sequential vs. random I/O) or different performance needs should be physically isolated in different RAID groups so they do not share disks.
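
The “performance of all drives minus the RAID overhead” estimate can be sketched as follows. This is a rough back-of-the-envelope model, not a vendor formula: the per-drive IOPS figure is hypothetical, and the write penalty of 4 for RAID 5 is the commonly cited rule of thumb (each small random write costs two reads and two writes on the back end).

```python
def raid_group_iops(drive_count, iops_per_drive, read_fraction, write_penalty=4):
    """Estimate host-visible IOPS for a RAID group.

    Each host read costs one back-end I/O; each host write costs
    `write_penalty` back-end I/Os (4 is the usual rule of thumb for RAID 5).
    """
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    raw_iops = drive_count * iops_per_drive
    write_fraction = 1.0 - read_fraction
    backend_ios_per_host_io = read_fraction + write_fraction * write_penalty
    return raw_iops / backend_ios_per_host_io


# Hypothetical example: 15 drives at an assumed 5,000 IOPS each, 70% reads.
estimate = raid_group_iops(15, 5000, 0.70)
print(round(estimate))
```

The model also shows why mixing I/O profiles in one group makes performance harder to predict: the effective overhead depends on the read/write mix.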

Storage Pools

Storage pools, or simply ‘pools’, are very similar to RAID groups in some ways. Implementation varies by vendor, but generally pools are made up of one or more private RAID groups, which are not visible to the user, or they are composed of user-configured RAID groups which are added manually to the pool. LUNs are then allocated from the pool. Storage pools can contain up to hundreds of drives, often all the drives in an array. As business needs grow, storage pools can be easily scaled up by adding drives or RAID groups and expanding LUN capacity. Storage pools can contain multiple types and sizes of drives and can spread workloads over more drives for a greater degree of parallelism.

Storage pools are usually required for array features like automated storage tiering, where faster SSDs can serve as a data cache among a larger group of HDDs, as well as other array-level data services like compression, deduplication, and thin provisioning. Because of their larger maximum size, storage pools, unlike RAID groups, can take advantage of vSphere 6 maximum LUN sizes of 64TB.

We used two benchmarks to compare the performance of RAID groups and storage pools: VMmark, which is a virtualization platform benchmark, and I/O Analyzer with Iometer, which is a storage microbenchmark.  VMmark is a multi-host virtualization benchmark that uses diverse application workloads as well as common platform level workloads to model the demands of the datacenter. VMs running a complete set of the application workloads are grouped into units of load called tiles. For more details, see the VMmark 2.5 overview. Iometer places high levels of load on the disk, but does not stress any other system resources. Together, these benchmarks give us both a ‘real-world’ and a more focused perspective on storage performance.

VMmark Testing

Array Configuration

Testing was conducted on an EMC VNX5800 block storage SAN with Fibre Channel. This was one of the many storage solutions which offered both RAID group and storage pool technologies. Disks were 200GB single-level cell (SLC) SSDs. Storage configuration followed array best practices, including balancing LUNs across Storage Processors and ensuring that RAID groups and LUNs did not span the array bus. One way to optimize SSD performance is to leave up to 50% of the SSD capacity unutilized, also known as overprovisioning. To follow this best practice, 50% of the RAID group or storage pool was not allocated to any LUN. Since overprovisioning SSDs can be an expensive proposition, we also tested the same configuration with 100% of the storage pool or RAID group allocated.

RAID Group Configuration

Four RAID 5 groups were used, each composed of 15 SSDs. RAID 5 was selected for its suitability for general purpose workloads. RAID 5 provides tolerance against a single disk failure. For best performance and capacity, RAID 5 groups should be sized to multiples of five or nine drives, so this group maintains a multiple of the preferred five-drive count. One LUN was created in each of the four RAID groups. The LUN was sized to either 50% of the RAID group (Best Practices) or 100% (Fully Allocated). For testing, the capacity of each LUN was fully utilized by VMmark virtual machines and randomized data.

RAID Group Configuration VMmark Storage Comparison

VMmark Storage Pool Configuration Storage Comparison

Storage Pool Configuration

A single RAID 5 Storage Pool containing all 60 SSDs was used. Four thick LUNs were allocated from the pool, meaning that all of the storage space was reserved on the volume. LUNs were equivalent in size and consumed a total of either 50% (Best Practices) or 100% (Fully Allocated) of the pool capacity.

Storage Layout

Most of the VMmark storage load was created by two types of virtual machines: database (DVD Store) and mail server (Microsoft Exchange). These virtual machines were isolated on two different LUNs. The remaining virtual machines were spread across the remaining two LUNs. That is, in the RAID group case, storage-heavy workloads were physically isolated in different RAID groups, but in the storage pool case, all workloads shared the same pool.

Systems Under Test: Two Dell PowerEdge R720 servers
Configuration Per Server:  
     Virtualization Platform: VMware vSphere 6.0. VMs used virtual hardware version 11 and current VMware Tools.
     CPUs: Two 12-core Intel® Xeon® E5-2697 v2 @ 2.7 GHz, Turbo Boost Enabled, up to 3.5 GHz, Hyper-Threading enabled.
     Memory: 256GB ECC DDR3 @ 1866MHz
     Host Bus Adapter: QLogic ISP2532 DualPort 8Gb Fibre Channel to PCI Express
     Network Controller: One Intel 82599EB dual-port 10 Gigabit PCIe Adapter, one Intel I350 Dual-Port Gigabit PCIe Adapter

Each configuration was tested at three different load points: 1 tile (the lowest load level), 7 tiles (an approximate mid-point), and 13 tiles, which was the maximum number of tiles that still met Quality of Service (QoS) requirements. All datapoints represent the mean of two tests of each configuration.

VMmark Results

RAID Group vs. Storage Pool Performance comparison using VMmark benchmark

Across all load levels tested, the VMmark performance score, which is a function of application throughput, was similar regardless of storage provisioning type. Neither the storage type used nor the capacity allocated affected throughput.

VMmark 2.5 performance scores are based on application and infrastructure workload throughput, while application latency reflects Quality of Service. For the Mail Server, Olio, and DVD Store 2 workloads, latency is defined as the application’s response time. We wanted to see how storage configuration affected application latency as opposed to the VMmark score. All latencies are normalized to the lowest 1-tile results.

Storage configuration did not affect VMmark application latencies.

Application Latency in VMmark Storage Comparison RAID Group vs Storage Pool

Lastly, we measured read and write I/O latencies: esxtop Average Guest MilliSec/Write and Average Guest MilliSec/Read. This is the round trip I/O latency as seen by the Guest operating system.

VMmark Storage Latency Storage Comparison RAID Group vs Storage Pool

No differences emerged in I/O latencies.

I/O Analyzer with Iometer Testing

In the second set of experiments, we wanted to see if we would find similar results while testing storage using a synthetic microbenchmark. I/O Analyzer is a tool which uses Iometer to drive load on a Linux-based virtual machine then collates the performance results. The benefit of using a microbenchmark like Iometer is that it places heavy load on just the storage subsystem, ensuring that no other subsystem is the bottleneck.

Configuration

Testing used a VNX5800 array and RAID 5 level as in the prior configuration, but all storage configurations spanned 9 SSDs, also a preferred drive count. In contrast to the prior test, the storage pool or RAID group spanned an identical number of disks, so that the number of disks per LUN was the same in both configurations. Testing used nine disks per LUN to achieve greater load on each disk.

The LUN was sized to either 50% or 100% of the storage group. The LUN capacity was fully occupied with the I/O Analyzer worker VM and randomized data.  The I/O Analyzer Controller VM, which initiates the benchmark, was located on a separate array and host.

Storage Configuration Iometer with Storage Pool and RAID Group

Testing used one I/O Analyzer worker VM. One Iometer worker thread drove storage load. The size of the VM’s virtual disk determines the size of the active dataset, so a 100GB thick-provisioned virtual disk on VMFS-5 was chosen to maximize I/O to the disk and minimize caching. We tested at a medium load level using a plausible datacenter I/O profile, understanding, however, that any static I/O profile will be a broad generalization of real-life workloads.

Iometer Configuration

  • 1 vCPU, 2GB memory
  • 70% read, 30% write
  • 100% random I/O to model the “I/O blender effect” in a virtualized environment
  • 4KB block size
  • I/O aligned to sector boundaries
  • 64 outstanding I/O
  • 60 minute warm up period, 60 minute measurement period
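
With a fixed number of outstanding I/Os, throughput and latency are tied together by Little’s Law (outstanding I/Os = IOPS × latency). The sketch below applies it to the 64-outstanding-I/O, 4KB profile listed above; the latency values are hypothetical and are not measurements from this study.

```python
def iops_from_latency(outstanding_ios, latency_ms):
    """Little's Law: throughput = concurrency / latency."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return outstanding_ios * 1000.0 / latency_ms


def throughput_mb_per_s(iops, block_size_kb=4):
    """Convert IOPS at a fixed block size into MB/s (1 MB = 1024 KB)."""
    return iops * block_size_kb / 1024.0


for latency_ms in (0.5, 1.0, 2.0):  # hypothetical average latencies
    iops = iops_from_latency(64, latency_ms)
    print(f"{latency_ms} ms -> {iops:,.0f} IOPS, {throughput_mb_per_s(iops):.1f} MB/s")
```
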
Systems Under Test: One Dell PowerEdge R720 server
Configuration Per Server:  
     Virtualization Platform: VMware vSphere 6.0. Worker VM used the I/O Analyzer default virtual hardware version 7.
     CPUs: Two 12-core Intel® Xeon® E5-2697 v2 @ 2.7 GHz, Turbo Boost Enabled, up to 3.5 GHz, Hyper-Threading enabled.
     Memory: 256GB ECC DDR3 @ 1866MHz
     Host Bus Adapter: QLogic ISP2532 DualPort 8Gb Fibre Channel to PCI Express

Iometer results

Iometer Latency Results Storage Comparison RAID Group vs Storage Pool

Iometer Throughput Results Storage Comparison RAID Group vs Storage Pool

In Iometer testing, the storage pool showed slightly improved performance compared to the RAID group, and the amount of capacity allocated did not affect performance.

In both our multi-workload and synthetic microbenchmark scenarios, we did not observe any performance penalty of choosing storage pools over RAID groups on an all-SSD array, even when disparate workloads shared the same storage pool. We also did not find any performance benefit at the application or I/O level from leaving unallocated capacity, or overprovisioning, SSD RAID groups or storage pools. Given the ease of management and feature-based benefits of storage pools, including automated storage tiering, compression, deduplication, and thin provisioning, storage pools are an excellent choice in today’s datacenters.

Virtual SAN 6.0 Performance with VMware VMmark

Virtual SAN is a storage solution that is fully integrated with VMware vSphere. Virtual SAN leverages flash technology to cache data and improve its access time to and from the disks. We used VMware’s VMmark 2.5 benchmark to evaluate the performance of running a variety of tier-1 application workloads together on Virtual SAN 6.0.

VMmark is a multi-host virtualization benchmark that uses varied application workloads and common datacenter operations to model the demands of the datacenter. Each VMmark tile contains a set of virtual machines running diverse application workloads as a unit of load. For more details, see the VMmark 2.5 overview.

 

Testing Methodology

VMmark 2.5 requires two datastores for its Storage vMotion workload, but Virtual SAN creates only a single datastore. A Red Hat Enterprise Linux 7 virtual machine was created on a separate host to act as an iSCSI target to serve as the secondary datastore. Linux-IO Target (LIO) was used for this.

 

Configuration

Systems Under Test: 8x Supermicro SuperStorage SSG-2027R-AR24 servers
CPUs (per server): 2x Intel Xeon E5-2670 v2 @ 2.50 GHz
Memory (per server): 256 GiB
Hypervisor: VMware vSphere 5.5 U2 and vSphere 6.0
Local Storage (per server): 3x 400GB Intel SSDSC2BA40, 12x 900GB 10,000 RPM WD Xe SAS drives
Benchmarking Software: VMware VMmark 2.5.2

 

Workload Characteristics

Storage performance is often measured in IOPS, or I/Os per second. Virtual SAN is a storage technology, so it is worthwhile to look at how many IOPS VMmark is generating.  The most disk-intensive workloads within VMmark are DVD Store 2 (also known as DS2), an E-Commerce workload, and the Microsoft Exchange 2007 mail server workload. The graphs below show the I/O profiles for these workloads, which would be identical regardless of storage type.

 Figure1

The DS2 database virtual machine shows a fairly balanced I/O profile of approximately 55% reads and 45% writes.

Microsoft Exchange, on the other hand, has a very write-intensive load, as shown below.

Figure2

Exchange sees nearly 95% writes, so the main benefit the SSDs provide is to serve as a write buffer.

The remaining application workloads have minimal disk I/Os, but do exert CPU and networking loads on the system.

 

Results

VMmark measures both the total throughput of each workload as well as the response time.  The application workloads consist of Exchange, Olio (a Java workload that simulates Web 2.0 applications and measures their performance), and DVD Store 2. All workloads are driven at a fixed throughput level.  A set of workloads is considered a tile.  The load is increased by running multiple tiles.  With Virtual SAN 6.0, we could run up to 40 tiles with acceptable quality of service (QoS). Let’s look at how each workload performed as the number of tiles increased.

DVD Store

There are 3 webserver frontends per DVD Store tile in VMmark.  Each webserver is loaded with a different profile.  One is a steady-state workload, which runs at a set request rate throughout the test, while the other two are bursty in nature and run a 3-minute and 4-minute load profile every 5 minutes.  DVD Store throughput, measured in orders per minute, varies depending on the load of the server. The throughput will decrease once the server becomes saturated.

Figure3

For this configuration, maximum throughput was achieved at 34 tiles, as shown by the graph above.  As the hosts become saturated, the throughput of each DVD Store tile falls, resulting in a total throughput decrease of 4% at 36 tiles. However, the benchmark still passes QoS at 40 tiles.

Olio and Exchange

Unlike DVD Store, the Olio and Exchange workloads operate at a constant throughput regardless of server load, shown in the table below:

Workload    Simulated Users    Load per Tile
Exchange    1000               320-330 Sendmail actions per minute
Olio        400                4500-4600 operations per minute

 

At 40 tiles, the VMmark clients are sending over 12,000 mail messages per minute and the Olio webservers are serving ~180,000 requests per minute.
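
As a quick check of these totals, the per-tile rates in the table above can be multiplied by the tile count. This sketch only restates the table’s figures; the dictionary and function names are illustrative.

```python
# Per-tile load ranges taken from the table above (actions per minute).
PER_TILE_LOAD = {
    "Exchange": (320, 330),
    "Olio": (4500, 4600),
}


def cluster_load(tiles):
    """Scale each workload's per-tile (low, high) range by the tile count."""
    return {name: (low * tiles, high * tiles)
            for name, (low, high) in PER_TILE_LOAD.items()}


for name, (low, high) in cluster_load(40).items():
    print(f"{name}: {low:,} to {high:,} per minute")
```
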

As the load increases, the response time of Exchange and Olio increases, which makes them a good demonstration of the end-user experience at various load levels. A response time of over 500 milliseconds is considered to be an unacceptable user experience.

Figure4

As we saw with DVD Store, performance begins to dramatically change after 34 tiles as the cluster becomes saturated.  This is mostly seen in the Exchange response time.  At 40 tiles, the response time is over 300 milliseconds for the mailserver workload, which is still within the 500 millisecond threshold for a good user experience. Olio has a smaller increase in response time, since it is more processor intensive.  Exchange has a dependence on both CPU and disk performance.

Looking at Virtual SAN performance, we can get a picture of how much I/O is served by the storage at these load levels.  We can see that reads average around 2,000 I/Os per second:

Figure5

The Read Cache hit rate is 98-99% on all the hosts, so most of these reads are being serviced by the SSDs. Write performance is a bit more varied.

Figure6

We see a range of 5,000-10,000 write IOPS per node due to the write-intensive Exchange workload. Storage is nowhere close to saturation at these load levels. The magnetic disks are not seeing much more than 100 I/Os per second, while the SSDs are seeing about 3,000 – 6,000 I/Os per second. These disks should be able to handle at least 10x this load level. The real bottleneck is in CPU usage.

Looking at the CPU usage of the cluster, we can see that the usage levels out at 36 tiles at about 84% used.  There is still some headroom, which explains why the Olio response times are still very acceptable.

Figure7

As mentioned above, Exchange performance is dependent on both CPU and storage. The additional CPU requirements that Virtual SAN imposes on disk I/O cause Exchange to be more sensitive to server load.

 

Performance Improvements in Virtual SAN 6.0 (vs. Virtual SAN 5.5)

The Virtual SAN 6.0 release incorporates many improvements to CPU efficiency, as well as other improvements. This translates to increased performance for VMmark.

VMmark performance increased substantially when we ran the tests with Virtual SAN 6.0 as opposed to Virtual SAN 5.5. The Virtual SAN 5.5 tests failed to pass QoS beyond 30 tiles, meaning that at least one workload failed to meet the application latency requirement.  During the Virtual SAN 5.5 32-tile tests, one or more Exchange clients would report a Sendmail latency of over 500ms, which counts as a QoS failure.  Version 6.0 was able to achieve passing QoS at up to 40 tiles.

Figure8

Not only were more virtual machines able to be supported on Virtual SAN 6.0, but the throughput of the workloads increased as well.  By comparing the VMmark score (normalized to 20-tile Virtual SAN 5.5 results) we can see the performance improvement of Virtual SAN 6.0.

Figure9

Virtual SAN 6.0 achieved a performance improvement of 24% while supporting 33% more virtual machines.
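
The 33% figure follows from the tile counts reported above (40 passing tiles on Virtual SAN 6.0 versus 30 on Virtual SAN 5.5); because every tile contains the same set of virtual machines, the increase in tiles equals the increase in virtual machines. A minimal sketch of that arithmetic, with an illustrative helper name:

```python
def percent_increase(old, new):
    """Relative increase of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


# Maximum tile counts that passed QoS, as reported in this post.
tiles_vsan_55 = 30
tiles_vsan_60 = 40
print(f"{percent_increase(tiles_vsan_55, tiles_vsan_60):.0f}% more tiles")
```
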

 

Conclusion

Using VMmark, we are able to run a variety of workloads to simulate applications in a production environment.  We were able to demonstrate that Virtual SAN is capable of achieving good performance running heterogeneous real world applications.  The cluster of 8 hosts presented here shows good performance in VMmark through 40 tiles.  This is ~12,000 mail messages per minute sent through Exchange, ~180,000 requests per minute served by the Olio webservers, and over 200,000 orders per minute processed on the DVD Store database.  Additionally, we were able to measure substantial performance improvements over Virtual SAN 5.5 using Virtual SAN 6.0.

 

Custom Power Management Settings for Power Savings in vSphere 5.5

VMware vSphere serves as a common virtualization platform for a diverse ecosystem of applications. Every application has different performance demands which must be met, but the power and cooling costs of running these applications are also a concern. vSphere’s default power management policy, “Balanced”, meets both of these goals by effectively preserving system performance while still saving some power.

For those who would like to prioritize energy efficiency even further, vSphere provides additional ways to tweak its power management under the covers. Custom power management settings in ESXi let you create your own power management policy, and your server’s BIOS also typically lets you customize hardware settings which can maximize power savings at a potential cost to performance.

When choosing a low power setting, we need to know whether it is effective at increasing energy efficiency, that is, the amount of work achieved for the power consumed. We also need to know how large an impact the setting has on application throughput and latencies. A power saving setting that is too aggressive can result in low system performance. The best combination of power saving techniques will be highly individualized to your workload; here, we present one case study.

We used the VMmark virtualization benchmark to measure the effect of ESXi custom power settings and BIOS custom settings on energy efficiency and performance. VMmark 2.5 is a multi-host virtualization benchmark that uses diverse application workloads as well as common platform level workloads to model the demands of the datacenter. VMs running a complete set of the application workloads are grouped into units of load called tiles. For more details, see the VMmark 2.5 overview.

In this study, the best custom power setting produced an increase in energy efficiency of 17% with no significant drop in performance at moderate levels of load.

Test Methodology

All tests were conducted on a two-node cluster running VMware vSphere 5.5 U1. Each custom power management setting was tested independently to gauge its effects on energy efficiency and performance while all other settings were left at their defaults. The settings tested fall into two categories: ESXi custom power settings and BIOS custom settings. We discuss how to modify these settings at the end of the article.

Systems Under Test: Two Dell PowerEdge R720 servers
Configuration Per Server  
            CPUs: Two 12-core Intel® Xeon® E5-2697 v2 @ 2.7 GHz, Turbo Boost Enabled, up to 3.5 GHz, Hyper-Threading enabled
            Memory: 256GB ECC DDR3 @ 1866MHz
            Host Bus Adapter: QLogic ISP2532 Dual Port 8Gb Fibre Channel to PCI Express
            Network Controller: Integrated Intel I350 Quad-Port Gigabit Adapter, one Intel I350 Dual-Port Gigabit PCIe Adapter
            Hypervisor: VMware ESXi 5.5 U1
Shared Resources  
            Virtualization Management: VMware vCenter Server 5.5
            Storage Array: EMC VNX5800 with 30 Enterprise Flash Drives (SSDs) and 32 HDDs, grouped as two 10-SSD RAID0 LUNs and four 8-HDD RAID0 LUNs. FAST Cache was configured from 10 SSDs.
            Power Meters: One Yokogawa WT210 per server

Each configuration was tested at five different load points: 1 tile (the lowest load level), 4, 7, 10, and 12 tiles, which was the maximum number of tiles that met Quality of Service (QoS) requirements. All datapoints are the mean of three tests in each configuration.

ESXi Custom Power Settings

ESXi custom power settings influence the power state of the processor. We tested the two custom power management settings that had the greatest impact on our workload: Power.MaxFreqPct and Power.CstateResidencyCoef.

The advanced ESXi setting Power.MaxFreqPct (default value 100) caps the highest operating frequency the processor can reach. In practice, the processor can operate only at certain set frequencies (P-states), so if the frequency cap requested by ESXi (e.g., 2160MHz) does not match a set frequency state, the processor runs at the nearest lower frequency state (e.g., 2100MHz). Setting Power.MaxFreqPct = 99 put the cap at 99% of the processor’s nominal frequency, which limited Turbo Boost. Power.MaxFreqPct = 80 further limited the maximum frequency to 80% of the nominal 2.7GHz, a requested cap of 2.16GHz that resulted in an effective maximum of 2.1GHz.

Setting Power.CstateResidencyCoef = 0 (default value 5) puts the processor into its deepest available C-state, or lowest power state, when it is idle. As a prerequisite, deep C-states must be enabled in the BIOS. For a more in-depth discussion of power management techniques and other custom options, please see the vSphere documentation and the whitepaper Host Power Management in VMware vSphere 5.5.
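The way a percentage cap snaps down to an available P-state can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above, not ESXi's actual implementation: the function name and the P-state table (100MHz steps from 1200MHz to 2700MHz) are assumptions made for the example.

```python
def effective_cap_mhz(nominal_mhz, max_freq_pct, pstates_mhz):
    """Return the highest available P-state at or below the requested cap."""
    requested = nominal_mhz * max_freq_pct / 100.0
    candidates = [f for f in pstates_mhz if f <= requested]
    # If the cap falls below every P-state, fall back to the lowest one.
    return max(candidates) if candidates else min(pstates_mhz)


# Hypothetical P-state table in 100 MHz steps (not taken from the article).
PSTATES_MHZ = list(range(1200, 2701, 100))

# 80% of 2700 MHz requests 2160 MHz, which snaps down to 2100 MHz.
print(effective_cap_mhz(2700, 80, PSTATES_MHZ))  # prints 2100
```

The real set of frequency states depends on the processor model, so treat the table as a placeholder.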

VMmark models energy efficiency as performance score per kilowatt of power consumed. VMmark scores in the graph below have been normalized to the default “Balanced” 1-tile result, which does not use any custom power settings.
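As a rough sketch of this metric, the snippet below computes score per kilowatt and normalizes a series of results to the 1-tile baseline. The function names and all scores and wattages are invented for illustration; they are not measurements from this study.

```python
def ppkw(score, avg_watts):
    """Performance score per kilowatt of average power consumed."""
    return score / (avg_watts / 1000.0)


def normalize(values, baseline):
    """Express each value relative to a chosen baseline value."""
    return [v / baseline for v in values]


# Made-up (score, average watts) pairs for 1, 4, and 7 tiles.
runs = [(1.0, 250.0), (4.0, 400.0), (7.0, 500.0)]

efficiency = [ppkw(score, watts) for score, watts in runs]
relative = normalize(efficiency, efficiency[0])  # 1-tile result becomes 1.0

for (score, watts), rel in zip(runs, relative):
    print(f"score={score} watts={watts} normalized_efficiency={rel:.2f}")
```

With these invented numbers, efficiency rises with load because the score grows faster than the power draw, which is the pattern the normalized graph is meant to expose.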

VMware ESXi Custom Power Management Settings improve efficiency

A major trend can be seen here: an increase in load is correlated with greater energy efficiency. As the CPUs become busier, throughput increases at a faster rate than the required power. This can be understood by noting that an idle server will still consume power, but with no work to show for it. A highly utilized server is typically the most energy efficient per request completed, and the results bear this out.

To more closely examine the relative impact of each custom setting compared to the default setting, we normalized all results within each load level to the default “Balanced” result for that number of tiles. The figure below shows the percent change at each load level.
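The per-load-level normalization amounts to a percent change against the “Balanced” result at the same tile count. A minimal sketch, with hypothetical efficiency values keyed by number of tiles (none of these numbers come from the study):

```python
def percent_change(value, baseline):
    """Percent change of value relative to baseline."""
    return (value - baseline) / baseline * 100.0


# Hypothetical efficiency results keyed by tile count.
balanced = {1: 4.0, 4: 10.0, 7: 14.0}
custom = {1: 4.2, 4: 11.0, 7: 14.7}

# Compare each custom result only with the Balanced result at the same load.
delta = {tiles: percent_change(custom[tiles], balanced[tiles]) for tiles in balanced}

for tiles, pct in sorted(delta.items()):
    print(f"{tiles} tiles: {pct:+.1f}%")
```

Normalizing within each load level removes the load-driven trend, so what remains is the effect of the setting itself.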

VMware ESXi Custom Power Management Settings Change in Efficiency and Performance Results

All custom settings showed improvements in efficiency compared to the default “Balanced” setting, though the improvements varied with load. Setting MaxFreqPct to 99 had the greatest benefit to energy efficiency, between 5% and 15% at varying load levels. The greatest improvement was seen at 4 tiles, which increased efficiency by 17%, while resulting in a performance decrease of only 3%. The performance cost increased with load to 9% at 12 tiles. However, limiting processor frequency even further, to a maximum of 80% of its nominal frequency, did not produce an additive effect. Not only did efficiency decrease relative to MaxFreqPct=99, but performance also dropped sharply, from 96% of baseline at light load to 84% of baseline for a heavily loaded machine. CstateResidencyCoef=0 produced some modest increases in efficiency for a lightly loaded server, but the effect disappeared at higher load levels.

VMmark 2.5 performance scores are based on application and infrastructure workload throughput, while application latency reflects Quality of Service. For the Mail Server, Olio, and DVD Store 2 workloads, latency is defined as the application’s response time. We wanted to see how custom power management settings affected application latency as opposed to the VMmark score. All latencies are normalized to the lowest 1-tile results.

VMware ESXi Custom Power Management Settings Effect on Application Latencies

Naturally, latencies increase as load increases from 1 to 12 tiles. Fortunately, the custom power management policies caused only minimal increases in application latencies, if any. The exception was the MaxFreqPct=80 setting, which elevated latencies across the board.

BIOS Custom Power Settings

The Dell PowerEdge R720 BIOS provides another toolbox of power-saving knobs to tweak. Using the BIOS settings, we manually disabled Turbo Boost and reduced memory frequency from its default maximum speed of 1866MT/s (megatransfers per second) to either 1333MT/s or 800MT/s.

BIOS Custom Power Management Settings: Efficiency

The Turbo Boost Disabled configuration produced the largest increase in efficiency, while 800MT/s memory frequency actually decreased efficiency at the higher load levels.

Again, we normalized all results within each load level to its default “Balanced” result. The figure below shows the percent change at each load level.

BIOS Custom Power Management Settings: Change in Efficiency and Performance

Disabling Turbo Boost was the most effective setting to increase energy efficiency, with a performance cost ranging from 2% at low load levels to 8% at high load levels. Reducing memory frequency to 1333MT/s produced a small but consistent boost to efficiency and had no effect on performance, leading us to conclude that a memory speed of 1866MT/s is simply faster than needed for this workload.

BIOS Custom Power Management Settings: Effect on Application Latencies

Disabling Turbo Boost and reducing memory frequency to 800MT/s increased DVD Store 2 latencies at 10 tiles by 10% and 12 tiles by 30%, but all latencies were still well within Quality of Service requirements. Reducing memory frequency to 1333MT/s had no effect on application latencies.

Reducing the use of Turbo Boost, using either ESXi custom setting MaxFreqPct or BIOS custom settings, proved to be the most effective way to increase energy efficiency in our VMmark tests. The impact on performance was small, but increased with load. MaxFreqPct is the preferred setting because, like all ESXi custom power management settings, it takes effect immediately and can easily be reversed without reboots or downtime. Other custom power management settings produced modest gains in efficiency, but, if taken to the extreme, not only harm performance but fail to increase efficiency. In addition, energy efficiency is strongly related to load; the most efficient server is also one that is heavily utilized. Taking steps to increase server utilization, such as server consolidation, is an important part of a power saving strategy. Custom power management settings can produce gains in energy efficiency at a cost to performance, so consider the tradeoff when choosing custom power management settings for your own environment.


How to Configure Custom Power Management Settings

Disclaimer: The results presented above are a case study of the impact of custom power management settings and a starting point only. Results may not apply to your environment and do not represent best practices.

Exercise caution when choosing a custom power management setting. Change settings one at a time to evaluate their impact on your environment. Monitor your server’s power consumption either through its UPS, or consult your vendor to find the rated accuracy of your server’s internal power monitoring sensor. If it is highly accurate, you can view the server’s power consumption in esxtop (press ‘p’ to view Power Usage).

To customize power management settings, enter your server’s BIOS. Power Management settings vary by vendor but most include “OS Controlled” and “Custom” policies.

In the Dell PowerEdge R720, choosing the “Performance Per Watt (OS)” System Profile allows ESXi to control power management, while leaving hardware settings at their default values.

Screenshot of R720 BIOS selecting OS-controlled power management

Choosing the “Custom” System Profile and setting CPU Power Management to “OS DBPM” allows ESXi to control power management while enabling custom hardware settings.

Screenshot of R720 BIOS

Using ESXi Custom Power Settings

To enable the vSphere custom power management policy:

  1. Browse to the host in the vSphere Web Client navigator.
  2. Click the Manage tab and click Settings.
  3. Under Hardware, select Power Management and click the Edit button.
  4. Select the Custom power management policy and click OK.

The power management policy changes immediately and does not require a server reboot.

Screenshot of VMware ESXi Host Power Management Setting

Screenshot of VMware ESXi Custom Power Management Setting

To modify ESXi custom power management settings:

  1. Browse to the host in the vSphere Web Client navigator.
  2. Click the Manage tab and click Settings.
  3. Under System, select Advanced System Settings.
  4. Locate the power management parameter you want to change. Parameters that affect the Custom policy have descriptions that begin with “In Custom policy”. All other power parameters affect all power management policies.
  5. Select the parameter and click the Edit button.

Note: The default values of power management parameters match the Balanced policy.
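The same advanced settings can typically also be changed from the ESXi command line with esxcli. The Python helper below only builds the command line and does not run anything. It assumes that the dotted name shown in the Web Client (for example Power.MaxFreqPct) maps to the slash-separated esxcli option path (/Power/MaxFreqPct); verify that on your own host before relying on it. The helper name is hypothetical.

```python
def esxcli_set_advanced(option, int_value):
    """Build the argv for setting an integer advanced option with esxcli.

    Assumption: a dotted Web Client name such as Power.MaxFreqPct
    corresponds to the esxcli option path /Power/MaxFreqPct.
    """
    path = "/" + option.replace(".", "/")
    return ["esxcli", "system", "settings", "advanced", "set",
            "-o", path, "-i", str(int_value)]


# Print the commands rather than executing them.
print(" ".join(esxcli_set_advanced("Power.MaxFreqPct", 99)))
print(" ".join(esxcli_set_advanced("Power.CstateResidencyCoef", 0)))
```

As with the Web Client steps, change one setting at a time and measure its effect before moving on.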

Screenshot of VMware ESXi Advanced System Settings

 

Reducing Power Consumption in the vSphere 5.5 Datacenter

Today’s virtualized datacenters consist of several servers connected to shared storage, and this configuration has been necessary to enable the flexibility that virtualization provides and still allow for high performance. However, the power consumption of this setup is a major concern because shared storage can consume as much as 2-3x the power of a single, mid-range server. In this blog, we look at the performance impact of replacing shared storage with local disks and PCIe flash storage in a vSphere 5.5 datacenter to save power.

We leverage two innovative vSphere features in this performance test:

  • Unified live migration, first introduced with vSphere 5.1, removes the shared storage requirement for vMotion and allows combining traditional vMotion and Storage vMotion into one operation. This combined live migration copies both the virtual machine’s memory and storage over the network to the destination vSphere host. This feature offers administrators significantly more simplicity and flexibility in managing and moving virtual machines across their virtual infrastructures compared to the traditional vMotion and Storage vMotion migration solutions. More information about vMotion can be found in the VMware vSphere 5.1 vMotion Architecture, Performance, and Best Practices white paper.
  • vSphere 5.5 improves server power management by enabling processor C-states, in addition to the previously-used P-states, to improve power savings in the Balanced policy setting. More information about these improvements can be found in the Host Power Management in vSphere 5.5 white paper.

We measure the performance and power savings of these features when replacing shared storage with local disks and PCIe flash storage using a modified version of VMware VMmark 2.5. VMmark is a multi-host virtualization benchmark that uses varied application workloads, as well as common datacenter operations to model the demands of the datacenter. Each VMmark tile contains a set of VMs running diverse application workloads as a unit of load. For more details, see the VMmark 2.5 overview. The benchmark was modified to replace the traditional vMotion workload component with the new shared-nothing, unified live migration.

Testing Methodology

VMmark 2.5 was modified to convert the vMotion workload into a migration without shared storage. All other workloads were unchanged. This allowed a comparison of local, direct attached storage to a traditional Fibre Channel SAN. We measured the power consumption of each configuration using a pair of Yokogawa WT210 power meters, one attached to the servers and the other attached to the external storage.

Configuration

  • Systems Under Test: 2x Dell PowerEdge R710 servers
  • CPUs (per server): 2x Intel Xeon X5670 @ 2.93 GHz
  • Memory (per server): 96 GiB
  • Hypervisor: VMware vSphere 5.5
  • Local Storage (per server): 1x 785GB Fusion-io ioDrive2, 2x 300GB 10K RPM SAS drives in RAID 0
  • SAN: 8Gb Fibre Channel, 30x 200GB SATA Flash drives, 30x 600GB 15K RPM SAS drives
  • Benchmarking software: VMware VMmark 2.5

All I/O-intensive virtual disks were stored on the Fusion-io devices for local storage tests or the SATA flash drives for the SAN tests.  This included the DVD Store database files, the mail server database, and the Olio database.  All remaining virtual machine data was stored on the local SAS drives for the local storage tests and the SAN SAS drives for the SAN tests.

Results
 
VMmark performance using shared-nothing, unified live migration backed by fast local storage showed only minor differences compared to the results with shared storage.  The largest variance was seen in the infrastructure operations, which was expected as the vMotion workload was modified to include a storage migration.  The chart below shows the scores normalized to the 3-tile SAN test results.

Chart: VMmark scores normalized to the 3-tile SAN result

When we add the power data to these results and compare the Performance Per Kilowatt (PPKW), we see a very different picture. The local storage-based PPKW score is much higher than the shared storage score because of its higher power efficiency.

Chart: Performance Per Kilowatt (PPKW) for local storage and SAN

The reason for this difference is the power consumption of each configuration. The SAN is consuming over 1000 watts, which is typical of this storage solution. Replacing that power-hungry component with local storage greatly reduces vSphere datacenter power consumption while maintaining good performance.

Chart: Power consumption of each configuration

This SAN should be able to support approximately 25 VMmark tiles (based on the storage capacity of the SSDs), roughly five times the load being supported by the two servers we had available for testing in our lab. However, it should be noted that these servers are two generations old. Current-generation two-socket servers with a comparable power usage can support 2-3x the number of tiles based on published VMmark results. This would imply that the SAN could support at most four current-generation servers. While an additional two servers will further amortize the power cost of the SAN, significant power savings would still be achieved with an all-local storage architecture.
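The amortization argument can be made concrete with a small model. The sketch below is a simplification under stated assumptions: every server contributes the same score and draws the same power, the SAN draws a fixed 1000 watts regardless of load, and local storage adds no extra power. The per-server score and wattage are invented; only the roughly 1000-watt SAN figure echoes the text above.

```python
def cluster_ppkw(n_servers, score_per_server, server_watts, storage_watts):
    """Score per kilowatt for a cluster that shares one storage power cost."""
    total_kw = (n_servers * server_watts + storage_watts) / 1000.0
    return n_servers * score_per_server / total_kw


SCORE = 1.0        # hypothetical per-server score
SERVER_W = 400.0   # hypothetical per-server power draw in watts
SAN_W = 1000.0     # fixed SAN power draw in watts

for n in (2, 4):
    print(f"{n} servers + SAN: {cluster_ppkw(n, SCORE, SERVER_W, SAN_W):.2f}")
print(f"2 servers, local storage: {cluster_ppkw(2, SCORE, SERVER_W, 0.0):.2f}")
```

In this toy model, adding servers raises the score per kilowatt of the SAN configuration, but it stays below the all-local figure for any number of servers because the fixed storage cost never disappears.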

This is not without a cost.  Removing shared storage reduces the functionality of the datacenter because there are a number of vSphere features which will no longer function, such as DRS and traditional vMotion. The reduction in the infrastructure performance due to no shared storage will limit the workloads that can be run in this manner to virtual machines with smaller disks which can be moved between hosts without shared storage fairly quickly. Virtual machines with large disks would take much longer to move and would be better suited to a shared storage environment.

We have shown that it is possible to significantly reduce datacenter power consumption without significantly reducing performance by replacing shared storage with local storage solutions.  Unified live migration enables the use of local storage without a significant infrastructure performance penalty while maintaining application performance comparable to traditional environments using shared storage for the server workloads represented in VMmark.  The resulting elimination of shared storage creates significant power savings and lower operations costs.

Power Management and Performance in VMware vSphere 5.1 and 5.5

Power consumption is an important part of the datacenter cost strategy. Physical servers frequently offer a power management scheme that puts processors into low power states when not fully utilized, and VMware vSphere also offers power management techniques. A recent technical white paper describes the testing and results of two performance studies: The first shows how power management in VMware vSphere 5.5 in balanced mode (the default) performs 18% better than the physical host’s balanced mode power management setting. The second study compares vSphere 5.1 performance and power savings in two server models that have different generations of processors. Results show the newer servers have 120% greater performance and 24% improved energy efficiency over the previous generation.

For more information, please read the paper: Power Management and Performance in VMware vSphere 5.1 and 5.5.