Home > Blogs > VMware VROOM! Blog > Tag Archives: VMmark

Tag Archives: VMmark

AMD 2nd Gen EPYC (Rome) Application Performance on vSphere Series: Part 2 – VMmark

In recently published benchmarks with VMware VMmark, we’ve seen lots of great results with the AMD EPYC 7002 Series (known as 2nd Gen EPYC, or “Rome”) by several of our partners. These results show how well a mixed workload environment with many virtual machines and infrastructure operations like vMotion can perform with new server platforms.

This is the second part of our series covering application performance on VMware vSphere running with AMD 2nd Gen EPYC processors (also see Part 1 of this series). This post focuses on the VMmark benchmark used as an application workload.

We used the following hardware and software in our testbed:

  • AMD 2nd Gen EPYC processors (“Rome”)
  • Dell EMC XtremIO all-flash array
  • VMware vSphere 6.7 U3
  • VMware VMmark 3.1

VMmark

VMmark is a benchmark widely used to study virtualization performance. Many VMware partners have used this benchmark—from the initial release of VMmark 1.0 in 2006, up to the current release of VMmark 3.1 in 2019—to publish official results that you can find on the VMmark 3.x results site. This long history and large set of results can give you an understanding of performance on platforms you’re considering using in your datacenters. AMD EPYC 7002–based systems have done well and, in some cases, have established leadership results. At the publishing of this blog, VMmark results on AMD EPYC 7002 have been published by HPE, Dell | EMC, and Lenovo.

If you look though the details of these published results, you can find some interesting information like disclosed configuration options or settings that might provide performance improvements to your environment. For example, in the details of a Dell |EMC result, you can find that the server BIOS setting Numa Nodes Per Socket (NPS) was set to 4. And in the HPE submission, you can find that the server BIOS setting Last Level Cache (LLC) as NUMA Node was set to enabled. This is also referred to as CCX as NUMA because each CCX has its own LLC. (CCX is described in the following section.)

We put together a VMmark test setup using AMD EPYC Rome 2-socket servers to evaluate performance of these systems internally. This gave us the opportunity to see how the NPS and CCX as NUMA BIOS settings affected performance for this workload.

In the initial post of this series, we looked at the effect of the NPS BIOS settings on a database-specific workload and described the specifics of those settings. We also provided a brief overview and links to additional related resources. This post builds on that one, so we highly recommend reading the earlier post if you’re not already familiar with the BIOS NPS settings and the details for the AMD EPYC 7002 series.

AMD EPYC CCX as NUMA

Each EPYC processor is made up of up to 8 Core Complex Dies (CCDs) that are connected by AMD’s Infinity fabric (figure 1). Inside each CCD, there are two Core Complexes (CCXs) that each have their own LLC cache shown as 16M L3 cache in figure 2. These diagrams (the same ones from the previous blog post) are helpful in illustrating these aspects of the chips.

Figure 1. Logical diagram of AMD EPYC Rome processor

Figure 2. Logical diagram of CCX

The CCX as NUMA or LLC as NUMA BIOS settings can be configured on most of the AMD EPYC 7002 Series processor–based servers. The specific name of the setting will be slightly different for different server vendors. When enabled, the server will present the four cores that share each L3 cache as a NUMA node. In the case of the EPYC 7742 (Rome) processors used in our testing, there are 8 CCDs that each have 2 CCXs for a total 16 CCXs per processor. With CCX as NUMA enabled, each processor is presented as 16 NUMA nodes, with 32 NUMA nodes total for our 2-socket server. This is quite different from the default setting of 1 NUMA node per socket for a total of 2 NUMA nodes for the host.

This setting has some potential to improve performance by exposing the architecture of the processors as individual NUMA nodes, and one of the VMmark result details indicated that it might have been a factor in improving the performance for the benchmark.

VMmark 3 Testing

For this performance study, we used as the basis for all tests:

  • Two 2-socket systems with AMD EPYC 7742 processors and 1 TB of memory
  • 1 Dell EMC XtremIO all-flash array
  • VMware vSphere 6.7 U3 installed on a local NVMe disk
  • VMmark 3.1—we ran all tests with 14 VMmark tiles: that’s the maximum number of tiles that the default configuration could handle without failing quality of service (QoS). For more information about tiles, see “Unique Tile-Based Implementation” on the VMmark product page.

We tested the default cases first: CCX as NUMA disabled and Numa Per Socket (NPS) set to 1. We then tested the configurations of NPS 2 and 4, both with and without CCX as NUMA enabled.  Figure 3 below shows the results with the VMmark score and the average host CPU utilization.  VMmark scores are reported relative to the score achieved by the default settings of NPS 1 and CCX as NUMA disabled.

From a high level, the VMmark score and the overall performance of the benchmark does not change a large amount across all of the test cases. These fall within the 1-2% run-to-run variation we see with this benchmark and can be considered equivalent scores. We saw the lowest results with NPS 2 and NPS 4 settings. These results showed 3% and 5% reductions respectively, indicating that those settings don’t give good performance in our environment.

Figure 3. VMmark 3 performance on AMD 2nd Gen EPYC with NPS and CCX as NUMA Settings

We observed one clear trend in the results: a lower CPU utilization in the 6% to 8% range with CCX as NUMA enabled.  This shows that there are some small efficiency gains with CCX as NUMA in our test environment. We didn’t see a significant improvement in overall performance due to these efficiency gains; however, this might allow an additional tile to run on the cluster (we didn’t test this).  While some small gains in efficiency are possible with these settings, we don’t recommend moving away from the default settings for general-purpose, mixed-workload environments.  Instead, you should evaluate these advanced settings for specific applications before using them.

Using VMmark 3 as a Performance Analysis Tool

VMmark was originally developed to fill the need for a server consolidation benchmark for a rapidly changing datacenter that was becoming increasingly dominated by virtualization.  The design of VMmark, which is a collection of workloads, gives us the ability to quickly change workload parameters to modify the behavior of the entire benchmark. This allows us to use VMmark to exercise technologies that were not available at the time the benchmark was designed. The VMmark 3 run rules provide for academic or research results publication using a modified version of the benchmark.

VMmark 3 was designed in 2015 when the memory size of a typical high-end 2 socket server was 768 GB.  Each VMmark 3 tile was configured to use 156 GB of memory, allowing multiple tiles to be run on each server.  A new technology, Intel Optane DC Persistent Memory, now allows up to 3 TB of memory in a 2 socket server, with plans to increase that even further.  Testing the performance of this technology with an unmodified version of VMmark 3 wouldn’t be easy as we’d saturate CPU resources long before we could fully exercise this large amount of memory.  Thankfully the flexible nature of VMmark allows us to modify it to consume significantly more memory with minimal changes in CPU usage.

The two primary VMmark workloads are Weathervane and DVD Store.  Each can be modified to consume more memory.  Weathervane, as configured for VMmark 3, uses 14 VMs.  Thus while it would be possible to modify this application, doing so would be a time-consuming process.  We therefore decided to look at DVD Store, which uses only four VMs.  Most of the work is done in the DVD Store database VM which was our target for modification.

Determining the best configuration for DVD Store to utilize a larger amount of memory required multiple iterations of testing.   We modified one test parameter of the DVD Store workload, and then examined the results to determine the effect on the VMmark tile. We were looking for larger memory usage with a minimal increase in CPU usage so that we could exercise the larger memory configuration without requiring additional CPUs. The following table lists the default configuration and the variables we changed:

Parameter Default Configurations Tried
VM Memory Size 32 GB 128, 250 and 385 GB
Think Time 1 second 0.5, 0.9, 1.25, and 1.5 seconds
Number of Threads 24 36 and 48
Number of Searches 3 5, 7, and 9
Batch Search Size 3 5, 7, and 9
Database Size 100 GB 300 and 500 GB

The final configuration that we determined to have the most increased memory usage while keeping the CPU usage moderate was 250 GB DS3DB VM memory size, 1.5 seconds think time, and 300 GB database size.  All other parameters were kept at the default.

The following table lists the CPU and memory utilization of the default configuration and the “increased memory” configuration.

Configuration CPU Utilization Memory Utilization
Default 26.3 126 GB
Increased Memory 24.1 350 GB

We were able to almost triple the memory consumption of a single VMmark tile without increasing the CPU usage. Using this “increased memory” configuration for VMmark we can now see the effect of the additional memory provided by Intel Optane DC Persistent Memory in Memory Mode.

More detailed information about this configuration and the methodology used to refine it can be found in the Intel Optane DC Persistent Memory whitepaper.  Detailed instructions to configure VMmark 3 to increase the memory footprint can be obtained by emailing the VMmark team at vmmark-info@vmware.com.  We encourage you to experiment with VMmark under academic rules for your own studies and to let us know if you have any questions.

 

vSphere 6.7 Update 3 Supports AMD EPYC™ Generation 2 Processors, VMmark Showcases Its Leadership Performance

Two leadership VMmark benchmark results have been published with AMD EPYC™ Generation 2 processors running VMware vSphere 6.7 Update 3 on a two-node two-socket cluster and a four-node cluster. VMware worked closely with AMD to enable support for AMD EPYC™ Generation 2 in the VMware vSphere 6.7 U3 release.

The VMmark benchmark is a free tool used by hardware vendors and others to measure the performance, scalability, and power consumption of virtualization platforms and has become the standard by which the performance of virtualization platforms is evaluated.

The new AMD EPYC™ Generation 2 performance results can be found here and here.

View all VMmark results
Learn more about VMmark
These benchmark result claims are valid as of the date of writing.

Introducing VMmark ML

VMmark has been the go-to virtualization benchmark for over 12 years. It’s been used by partners, customers, and internally in a wide variety of technical applications. VMmark1, released in 2007, was the de-facto virtualization consolidation benchmark in a time when the overhead and feasibility of virtualization was still largely in question. In 2010, as server consolidation became less of an “if” and more of a “when,” VMmark2 introduced more of the rich vSphere feature set by incorporating infrastructure workloads (VMotion, Storage VMotion, and Clone & Deploy) alongside complex application workloads like DVD Store. Fast forward to 2017, and we released VMmark3, which builds on the previous versions by integrating an easy automation deployment service alongside complex multi-tier modern application workloads like Weathervane. To date, across all generations, we’ve had nearly 300 VMmark result publications (297 at the time of this writing) and countless internal performance studies.

Unsurprisingly, tech industry environments have continued to evolve, and so must the benchmarks we use to measure them. It’s in this vein that the VMware VMmark performance team has begun experimenting with other use cases that don’t quite fit the “traditional” VMmark benchmark. One example of a non-traditional use is Machine Learning and its execution within Kubernetes clusters. At the time of this writing, nearly 9% of the VMworld 2019 US sessions are about ML and Kubernetes. As such, we thought this might be a good time to provide an early teaser to VMmark ML and even point you at a couple of other performance-centric Machine Learning opportunities at VMworld 2019 US.

Although it’s very early in the VMmark ML development cycle, we understand that there’s a need for push-button-easy, vSphere-based Machine Learning performance analysis. As an added bonus, our prototype runs within Kubernetes, which we believe to be well-suited for this type of performance analysis.

Our internal-only VMmark ML prototype is currently streamlined to efficiently perform a limited number of operations very well as we work with partners, customers, and internal teams on how VMmark ML should be exercised. It is able to:

  1. Rapidly deploy Kubernetes within a vSphere environment.
  2. Deploy a variety of containerized ML workloads within our newly created VMmark ML Kubernetes cluster.
  3. Execute these ML workloads either in isolation or concurrently to determine the performance impact of architectural, hardware, and software design decisions.

VMmark ML development is still very fluid right now, but we decided to test some of these concepts/assumptions in a “real-world” situation. I’m fortunate to work alongside long-time DVD Store author and Big Data guru Dave Jaffe on VMmark ML.  As he and Sr. Technical Marketing Architect Justin Murray were preparing for their VMworld US talk, “High-Performance Virtualized Spark Clusters on Kubernetes for Deep Learning [BCA1563BU]“, we thought this would be a good opportunity to experiment with VMmark ML. Dave was able to use the VMmark ML prototype to deploy a 4-node Kubernetes cluster onto a single vSphere host with a 2nd-Generation Intel® Xeon® Scalable processor (“Cascade Lake”) CPU. VMmark ML then pulled a previously stored Docker container with several MLperf workloads contained within it. Finally, as a concurrent execution exercise, these workloads were run simultaneously, pushing the CPU utilization of the server above 80%. Additionally, Dave is speaking about vSphere Deep Learning performance in his talk “Optimize Virtualized Deep Learning Performance with New Intel Architectures [MLA1594BU],“ where he and Intel Principal Engineer Padma Apparao explore the benefits of Vector Neural Network Instructions (VNNI). I definitely recommend either of these talks if you want a deep dive into the details of VNNI or Spark analysis.

Another great opportunity to learn about VMware Performance team efforts within the Machine Learning space is to attend the Hands-on-Lab Expert Lead Workshop, “Launch Your Machine Learning Workloads in Minutes on VMware vSphere [ELW-2048-01-EMT_U],” or take the accompanying lab. This is being led by another VMmark ML team member Uday Kurkure along with Staff Global Solutions Consultant Kenyon Hensler. (Sign up for the Expert Lead using the VMworld 2019 mobile application or on my.vmworld.com.)

Our goal after VMworld 2019 US is to continue discussions with partners, customers, and internal teams about how a benchmark like VMmark ML would be most useful. We also hope to complete our integration of Spark within Kubernetes on vSphere and reproduce some of the performance analysis done to date. Stay tuned to the performance blog for additional posts and details as they become available.

First VMmark 3.1 Publications, Featuring New Cascade Lake Processors

VMmark is a free tool used by hardware vendors and others to measure the performance, scalability, and power consumption of virtualization platforms.  If you’re unfamiliar with VMmark 3.x, each tile is a grouping of 19 virtual machines (VMs) simultaneously running diverse workloads commonly found in today’s data centers, including a scalable Web simulation, an E-commerce simulation (with backend database VMs), and standby/idle VMs.

As Joshua mentioned in a recent blog post, we released VMmark 3.1 in February, adding support for persistent memory, improving workload scalability, and better reflecting secure customer environments by increasing side-channel vulnerability mitigation requirements.

I’m happy to announce that today we published the first VMmark 3.1 results.  These results were obtained on systems meeting our industry-leading side-channel-aware mitigation requirements, thus continuing the benchmark’s ability to provide an indication of real-world performance.

Continue reading

VMmark 3.1 Released

It is my great pleasure to announce that VMmark 3.1 is generally available as of February 7, 2019!

What’s New?

This release adds support for persistent memory, improves workload scalability, and better reflects secure customer environments by increasing side-channel vulnerability mitigation requirements.

Visit our main VMmark HTML page for more information.

Please note that VMmark 3.0 will end of life on March 15th, 2019.

To learn more about VMmark3 see the introductory blog article here

Addressing Meltdown/Spectre in VMmark

The recently described Meltdown/Spectre vulnerabilities have implications throughout the tech industry, and the VMmark virtualization benchmark is no exception. In deciding how to approach the issue, the VMmark team’s goal was to address the impact of the these vulnerabilities while maintaining the value and integrity of the benchmark.

Applying the full set of currently available Meltdown/Spectre mitigations is likely to have a significant impact on VMmark scores. Because the mitigations are expected to continue evolving for some time, that impact might even change. If the VMmark team were to require the full set of mitigations in order for a submission to be compliant, that might make new submissions non-competitive with older ones, and also introduce more “noise” into VMmark scores as the mitigations evolve. While our intention for the future is that eventually all new VMmark results will be obtained on virtualization platforms that have the full set of Meltdown/Spectre mitigations, we have chosen to take a gradual approach.

Beginning May 8, 2018, all newly-published VMmark results must comply with a number of new requirements related to the Meltdown and Spectre vulnerabilities. These requirements are detailed in Appendix C of the latest edition of the VMmark User’s Guide.

Before performing any VMmark benchmark runs intended for publication, check the VMmark download page to make sure you’re using the latest edition of the VMmark User’s Guide.  If you have questions, you can reach the VMmark team at vmmark-info@vmware.com.

Introducing VMmark3: A highly flexible and easily deployed benchmark for vSphere environments

VMmark 3.0, VMware’s multi-host virtualization benchmark is generally available here.  VMmark3 is a free cluster-level benchmark that measures the performance, scalability, and power of virtualization platforms.

VMmark3 leverages much of previous VMmark generations’ technologies and design.  It continues to utilize a unique tile-based heterogeneous workload application design. It also deploys the platform-level workloads found in VMmark2 such as vMotion, Storage vMotion, and Clone & Deploy.  In addition to incorporating new and updated application workloads and infrastructure operations, VMmark3 also introduces a new fully automated provisioning service that greatly reduces deployment complexity and time.

Continue reading

Measuring Cloud Scalability Using the Weathervane Benchmark

Cloud-based deployments continue to be a hot topic in many of today’s corporations.  Often the discussion revolves around workload portability, ease of migration, and service pricing differences.  In an effort to bring performance into the discussion we decided to leverage VMware’s new benchmark, Weathervane.  As a follow-on to Harold Rosenberg’s introductory Weathervane post we decided to showcase some of the flexibility and scalability of our new large-scale benchmark.  Previously, Harold presented some initial scalability data running on three local vSphere 6 hosts.  For this article, we decided to extend this further by demonstrating Weathervane’s ability to run within a non-VMware cloud environment and scaling up the number of app servers.

Weathervane is a new web-application benchmark architected to simulate modern-day web applications.  It consists of a benchmark application and a workload driver.  Combined, they simulate the behavior of everyday users attending a real-time auction.  For more details on Weathervane I encourage you to review the introductory post.

Environment Configuration:
Cloud Environment: Amazon AWS, US West.
Instance Types: M3.XLarge, M3.Large, C3.Large.
Instance Notes: Database instances utilized an additional 300GB io1 tier data disk.
Instance Operating System: Centos 6.5 x64.
Application: Weathervane Internal Build 084.

Testing Methodology:
All instances were run within the same cloud environment to reduce network-induced latencies.  We started with a base configuration consisting of eight instances.  We then  scaled out the number of workload drivers and application servers in an effort to identify how a cloud environment scaled as application workload needs increased.  We used Weathervane’s FindMax functionality which runs a series of tests to determine the maximum number of users the configuration can sustain while still meeting QoS requirements.  It should be noted that the early experimentation allowed us to identify the maximum needs for the other services beyond the workload drivers and application servers to reduce the likelihood of bottlenecks in these services.  Below is a block diagram of the configurations used for the scaled-out Weathervane deployment.

Fig1

Results:
For our analysis of Weathervane cloud scaling we ran multiple iterations for each scale load level and selected the average.  We automated the process to ensure consistency.  Our results show both the number of users sustained as well as the http requests per second as reported by the benchmark harness.

Fig2

As you can see in the above graph, for our cloud environment running Weathervane, scaling the number of applications servers yielded nearly linear scaling up to five application servers. The delta in scaling between the number of users and the http requests per second sustained was less than 1%.  Due to time constraints we were unable to test beyond five application servers but we expect that the scaling would have continued upwards well beyond the load levels presented.

Although just a small sample of what Weathervane and cloud environments can scale to, this brief article highlights both the benchmark and cloud environment scaling.  Though Weathervane hasn’t been released publicly yet, it’s easy to see how this type of controlled, scalable benchmark will assist in performance evaluations of a diverse set of environments.  Look for more Weathervane based cloud performance analysis in the future.

 

Virtualized Storage Performance: RAID Groups versus Storage pools

RAID, a redundant array of independent disks, has traditionally been the foundation of enterprise storage. Grouping multiple disks into one logical unit can vastly increase the availability and performance of storage by protecting against disk failure, allowing greater I/O parallelism, and pooling capacity. Storage pools similarly increase the capacity and performance of storage, but are easier to configure and manage than RAID groups.

RAID groups have traditionally been regarded as offering better and more predictable performance than storage pools. Although both technologies were developed for magnetic hard disk drives (HDDs), solid-state drives (SSDs), which use flash memory, have become prevalent. Virtualized environments are also common and tend to create highly randomized I/O given the fact that multiple workloads are run simultaneously.

We set out to see how the performance of RAID group and storage pool provisioning methods compare in today’s virtualized environments.

First, let’s take a closer look at each storage provisioning type.

RAID Groups

A RAID group unifies a number of disks into one logical unit and distributes data across multiple drives. RAID groups can be configured with a particular protection level depending on the performance, capacity, and redundancy needs of the environment. LUNs are then allocated from the RAID group. RAID groups typically contain only identical drives, and the maximum number of disks in a RAID group varies by system model but is generally below fifty. Because drives typically have well defined performance characteristics, the overall RAID group performance can be calculated as the performance of all drives in the group minus the RAID overhead. To provide consistent performance, workloads with different I/O profiles (e.g., sequential vs. random I/O) or different performance needs should be physically isolated in different RAID groups so they do not share disks.

Storage Pools

Storage pools, or simply ‘pools’, are very similar to RAID groups in some ways. Implementation varies by vendor, but generally pools are made up of one or more private RAID groups, which are not visible to the user, or they are composed of user-configured RAID groups which are added manually to the pool. LUNs are then allocated from the pool. Storage pools can contain up to hundreds of drives, often all the drives in an array. As business needs grow, storage pools can be easily scaled up by adding drives or RAID groups and expanding LUN capacity. Storage pools can contain multiple types and sizes of drives and can spread workloads over more drives for a greater degree of parallelism.

Storage pools are usually required for array features like automated storage tiering, where faster SSDs can serve as a data cache among a larger group of HDDs, as well as other array-level data services like compression, deduplication, and thin provisioning. Because of their larger maximum size, storage pools, unlike RAID groups, can take advantage of vSphere 6 maximum LUN sizes of 64TB.

We used two benchmarks to compare the performance of RAID groups and storage pools: VMmark, which is a virtualization platform benchmark, and I/O Analyzer with Iometer, which is a storage microbenchmark.  VMmark is a multi-host virtualization benchmark that uses diverse application workloads as well as common platform level workloads to model the demands of the datacenter. VMs running a complete set of the application workloads are grouped into units of load called tiles. For more details, see the VMmark 2.5 overview. Iometer places high levels of load on the disk, but does not stress any other system resources. Together, these benchmarks give us both a ‘real-world’ and a more focused perspective on storage performance.

VMmark Testing

Array Configuration

Testing was conducted on an EMC VNX5800 block storage SAN with Fibre Channel. This was one of the many storage solutions which offered both RAID group and storage pool technologies. Disks were 200GB single-level cell (SLC) SSDs. Storage configuration followed array best practices, including balancing LUNs across Storage Processors and ensuring that RAID groups and LUNs did not span the array bus. One way to optimize SSD performance is to leave up to 50% of the SSD capacity unutilized, also known as overprovisioning. To follow this best practice, 50% of the RAID group or storage pool was not allocated to any LUN. Since overprovisioning SSDs can be an expensive proposition, we also tested the same configuration with 100% of the storage pool or RAID group allocated.

RAID Group Configuration

Four RAID 5 groups were used, each composed of 15 SSDs. RAID 5 was selected for its suitability for general purpose workloads. RAID 5 provides tolerance against a single disk failure. For best performance and capacity, RAID 5 groups should be sized to multiples of five or nine drives, so this group maintains a multiple of the preferred five-drive count. One LUN was created in each of the four RAID groups. The LUN was sized to either 50% of the RAID group (Best Practices) or 100% (Fully Allocated). For testing, the capacity of each LUN was fully utilized by VMmark virtual machines and randomized data.

RAID Group Configuration VMmark Storage Comparison        VMmark Storage Pool Configuration Storage Comparison

Storage Pool Configuration

A single RAID 5 Storage Pool containing all 60 SSDs was used. Four thick LUNs were allocated from the pool, meaning that all of the storage space was reserved on the volume. LUNs were equivalent in size and consumed a total of either 50% (Best Practices) or 100% (Fully Allocated) of the pool capacity.

Storage Layout

Most of the VMmark storage load was created by two types of virtual machines: database (DVD Store) and mail server (Microsoft Exchange). These virtual machines were isolated on two different LUNs. The remaining virtual machines were spread across the remaining two LUNs. That is, in the RAID group case, storage-heavy workloads were physically isolated in different RAID groups, but in the storage pool case, all workloads shared the same pool.

Systems Under Test: Two Dell PowerEdge R720 servers
Configuration Per Server:  
     Virtualization Platform: VMware vSphere 6.0. VMs used virtual hardware version 11 and current VMware Tools.
     CPUs: Two 12-core Intel® Xeon® E5-2697 v2 @ 2.7 GHz, Turbo Boost Enabled, up to 3.5 GHz, Hyper-Threading enabled.
     Memory: 256GB ECC DDR3 @ 1866MHz
     Host Bus Adapter: QLogic ISP2532 DualPort 8Gb Fibre Channel to PCI Express
     Network Controller: One Intel 82599EB dual-port 10 Gigabit PCIe Adapter, one Intel I350 Dual-Port Gigabit PCIe Adapter

Each configuration was tested at three different load points: 1 tile (the lowest load level), 7 tiles (an approximate mid-point), and 13 tiles, which was the maximum number of tiles that still met Quality of Service (QoS) requirements. All datapoints represent the mean of two tests of each configuration.

VMmark Results

RAID Group vs. Storage Pool Performance comparison using VMmark benchmark

Across all load levels tested, the VMmark performance score, which is a function of application throughput, was similar regardless of storage provisioning type. Neither the storage type used nor the capacity allocated affected throughput.

VMmark 2.5 performance scores are based on application and infrastructure workload throughput, while application latency reflects Quality of Service. For the Mail Server, Olio, and DVD Store 2 workloads, latency is defined as the application’s response time. We wanted to see how storage configuration affected application latency as opposed to the VMmark score. All latencies are normalized to the lowest 1-tile results.

Storage configuration did not affect VMmark application latencies.

Application Latency in VMmark Storage Comparison RAID Group vs Storage Pool

Lastly, we measured read and write I/O latencies: esxtop Average Guest MilliSec/Write and Average Guest MilliSec/Read. This is the round trip I/O latency as seen by the Guest operating system.

VMmark Storage Latency Storage Comparison RAID Group vs Storage Pool

No differences emerged in I/O latencies.

I/O Analyzer with Iometer Testing

In the second set of experiments, we wanted to see if we would find similar results while testing storage using a synthetic microbenchmark. I/O Analyzer is a tool which uses Iometer to drive load on a Linux-based virtual machine then collates the performance results. The benefit of using a microbenchmark like Iometer is that it places heavy load on just the storage subsystem, ensuring that no other subsystem is the bottleneck.

Configuration

Testing used a VNX5800 array and RAID 5 level as in the prior configuration, but all storage configurations spanned 9 SSDs, also a preferred drive count. In contrast to the prior test, the storage pool or RAID group spanned an identical number of disks, so that the number of disks per LUN was the same in both configurations. Testing used nine disks per LUN to achieve greater load on each disk.

The LUN was sized to either 50% or 100% of the storage group. The LUN capacity was fully occupied with the I/O Analyzer worker VM and randomized data.  The I/O Analyzer Controller VM, which initiates the benchmark, was located on a separate array and host.

Storage Configuration Iometer with Storage Pool and RAID Group

Testing used one I/O Analyzer worker VM. One Iometer worker thread drove storage load. The size of the VM’s virtual disk determines the size of the active dataset, so a 100GB thick-provisioned virtual disk on VMFS-5 was chosen to maximize I/O to the disk and minimize caching. We tested at a medium load level using a plausible datacenter I/O profile, understanding, however, that any static I/O profile will be a broad generalization of real-life workloads.

Iometer Configuration

  • 1 vCPU, 2GB memory
  • 70% read, 30% write
  • 100% random I/O to model the “I/O blender effect” in a virtualized environment
  • 4KB block size
  • I/O aligned to sector boundaries
  • 64 outstanding I/O
  • 60 minute warm up period, 60 minute measurement period
Systems Under Test: One Dell PowerEdge R720 server
Configuration Per Server:  
     Virtualization Platform: VMware vSphere 6.0. Worker VM used the I/O Analyzer default virtual hardware version 7.
     CPUs: Two 12-core Intel® Xeon® E5-2697 v2 @ 2.7 GHz, Turbo Boost Enabled, up to 3.5 GHz, Hyper-Threading enabled.
     Memory: 256GB ECC DDR3 @ 1866MHz
     Host Bus Adapter: QLogic ISP2532 DualPort 8Gb Fibre Channel to PCI Express

Iometer results

Iometer Latency Results Storage Comparison RAID Group vs Storage PoolIometer Throughput Results Storage Comparison RAID Group vs Storage Pool

In Iometer testing, the storage pool showed slightly improved performance compared to the RAID group, and the amount of capacity allocated also did not affect performance.

In both our multi-workload and synthetic microbenchmark scenarios, we did not observe any performance penalty of choosing storage pools over RAID groups on an all-SSD array, even when disparate workloads shared the same storage pool. We also did not find any performance benefit at the application or I/O level from leaving unallocated capacity, or overprovisioning, SSD RAID groups or storage pools. Given the ease of management and feature-based benefits of storage pools, including automated storage tiering, compression, deduplication, and thin provisioning, storage pools are an excellent choice in today’s datacenters.