Home > Blogs > VMware VROOM! Blog > Tag Archives: IOPS

Tag Archives: IOPS

Virtualized Storage Performance: RAID Groups versus Storage pools

RAID, a redundant array of independent disks, has traditionally been the foundation of enterprise storage. Grouping multiple disks into one logical unit can vastly increase the availability and performance of storage by protecting against disk failure, allowing greater I/O parallelism, and pooling capacity. Storage pools similarly increase the capacity and performance of storage, but are easier to configure and manage than RAID groups.

RAID groups have traditionally been regarded as offering better and more predictable performance than storage pools. Although both technologies were developed for magnetic hard disk drives (HDDs), solid-state drives (SSDs), which use flash memory, have become prevalent. Virtualized environments are also common and tend to create highly randomized I/O given the fact that multiple workloads are run simultaneously.

We set out to see how the performance of RAID group and storage pool provisioning methods compare in today’s virtualized environments.

First, let’s take a closer look at each storage provisioning type.

RAID Groups

A RAID group unifies a number of disks into one logical unit and distributes data across multiple drives. RAID groups can be configured with a particular protection level depending on the performance, capacity, and redundancy needs of the environment. LUNs are then allocated from the RAID group. RAID groups typically contain only identical drives, and the maximum number of disks in a RAID group varies by system model but is generally below fifty. Because drives typically have well defined performance characteristics, the overall RAID group performance can be calculated as the performance of all drives in the group minus the RAID overhead. To provide consistent performance, workloads with different I/O profiles (e.g., sequential vs. random I/O) or different performance needs should be physically isolated in different RAID groups so they do not share disks.

Storage Pools

Storage pools, or simply ‘pools’, are very similar to RAID groups in some ways. Implementation varies by vendor, but generally pools are made up of one or more private RAID groups, which are not visible to the user, or they are composed of user-configured RAID groups which are added manually to the pool. LUNs are then allocated from the pool. Storage pools can contain up to hundreds of drives, often all the drives in an array. As business needs grow, storage pools can be easily scaled up by adding drives or RAID groups and expanding LUN capacity. Storage pools can contain multiple types and sizes of drives and can spread workloads over more drives for a greater degree of parallelism.

Storage pools are usually required for array features like automated storage tiering, where faster SSDs can serve as a data cache among a larger group of HDDs, as well as other array-level data services like compression, deduplication, and thin provisioning. Because of their larger maximum size, storage pools, unlike RAID groups, can take advantage of vSphere 6 maximum LUN sizes of 64TB.

We used two benchmarks to compare the performance of RAID groups and storage pools: VMmark, which is a virtualization platform benchmark, and I/O Analyzer with Iometer, which is a storage microbenchmark.  VMmark is a multi-host virtualization benchmark that uses diverse application workloads as well as common platform level workloads to model the demands of the datacenter. VMs running a complete set of the application workloads are grouped into units of load called tiles. For more details, see the VMmark 2.5 overview. Iometer places high levels of load on the disk, but does not stress any other system resources. Together, these benchmarks give us both a ‘real-world’ and a more focused perspective on storage performance.

VMmark Testing

Array Configuration

Testing was conducted on an EMC VNX5800 block storage SAN with Fibre Channel. This was one of the many storage solutions which offered both RAID group and storage pool technologies. Disks were 200GB single-level cell (SLC) SSDs. Storage configuration followed array best practices, including balancing LUNs across Storage Processors and ensuring that RAID groups and LUNs did not span the array bus. One way to optimize SSD performance is to leave up to 50% of the SSD capacity unutilized, also known as overprovisioning. To follow this best practice, 50% of the RAID group or storage pool was not allocated to any LUN. Since overprovisioning SSDs can be an expensive proposition, we also tested the same configuration with 100% of the storage pool or RAID group allocated.

RAID Group Configuration

Four RAID 5 groups were used, each composed of 15 SSDs. RAID 5 was selected for its suitability for general purpose workloads. RAID 5 provides tolerance against a single disk failure. For best performance and capacity, RAID 5 groups should be sized to multiples of five or nine drives, so this group maintains a multiple of the preferred five-drive count. One LUN was created in each of the four RAID groups. The LUN was sized to either 50% of the RAID group (Best Practices) or 100% (Fully Allocated). For testing, the capacity of each LUN was fully utilized by VMmark virtual machines and randomized data.

            RAID Group Configuration VMmark Storage Comparison        VMmark Storage Pool Configuration Storage Comparison

Storage Pool Configuration

A single RAID 5 Storage Pool containing all 60 SSDs was used. Four thick LUNs were allocated from the pool, meaning that all of the storage space was reserved on the volume. LUNs were equivalent in size and consumed a total of either 50% (Best Practices) or 100% (Fully Allocated) of the pool capacity.

Storage Layout

Most of the VMmark storage load was created by two types of virtual machines: database (DVD Store) and mail server (Microsoft Exchange). These virtual machines were isolated on two different LUNs. The remaining virtual machines were spread across the remaining two LUNs. That is, in the RAID group case, storage-heavy workloads were physically isolated in different RAID groups, but in the storage pool case, all workloads shared the same pool.

Systems Under Test: Two Dell PowerEdge R720 servers
Configuration Per Server:  
     Virtualization Platform: VMware vSphere 6.0. VMs used virtual hardware version 11 and current VMware Tools.
     CPUs: Two 12-core Intel® Xeon® E5-2697 v2 @ 2.7 GHz, Turbo Boost Enabled, up to 3.5 GHz, Hyper-Threading enabled.
     Memory: 256GB ECC DDR3 @ 1866MHz
     Host Bus Adapter: QLogic ISP2532 DualPort 8Gb Fibre Channel to PCI Express
     Network Controller: One Intel 82599EB dual-port 10 Gigabit PCIe Adapter, one Intel I350 Dual-Port Gigabit PCIe Adapter

Each configuration was tested at three different load points: 1 tile (the lowest load level), 7 tiles (an approximate mid-point), and 13 tiles, which was the maximum number of tiles that still met Quality of Service (QoS) requirements. All datapoints represent the mean of two tests of each configuration.

VMmark Results

RAID Group vs. Storage Pool Performance comparison using VMmark benchmark

Across all load levels tested, the VMmark performance score, which is a function of application throughput, was similar regardless of storage provisioning type. Neither the storage type used nor the capacity allocated affected throughput.

VMmark 2.5 performance scores are based on application and infrastructure workload throughput, while application latency reflects Quality of Service. For the Mail Server, Olio, and DVD Store 2 workloads, latency is defined as the application’s response time. We wanted to see how storage configuration affected application latency as opposed to the VMmark score. All latencies are normalized to the lowest 1-tile results.

Storage configuration did not affect VMmark application latencies.

Application Latency in VMmark Storage Comparison RAID Group vs Storage Pool

Lastly, we measured read and write I/O latencies: esxtop Average Guest MilliSec/Write and Average Guest MilliSec/Read. This is the round trip I/O latency as seen by the Guest operating system.

VMmark Storage Latency Storage Comparison RAID Group vs Storage Pool

No differences emerged in I/O latencies.

I/O Analyzer with Iometer Testing

In the second set of experiments, we wanted to see if we would find similar results while testing storage using a synthetic microbenchmark. I/O Analyzer is a tool which uses Iometer to drive load on a Linux-based virtual machine then collates the performance results. The benefit of using a microbenchmark like Iometer is that it places heavy load on just the storage subsystem, ensuring that no other subsystem is the bottleneck.


Testing used a VNX5800 array and RAID 5 level as in the prior configuration, but all storage configurations spanned 9 SSDs, also a preferred drive count. In contrast to the prior test, the storage pool or RAID group spanned an identical number of disks, so that the number of disks per LUN was the same in both configurations. Testing used nine disks per LUN to achieve greater load on each disk.

The LUN was sized to either 50% or 100% of the storage group. The LUN capacity was fully occupied with the I/O Analyzer worker VM and randomized data.  The I/O Analyzer Controller VM, which initiates the benchmark, was located on a separate array and host.

Storage Configuration Iometer with Storage Pool and RAID Group

Testing used one I/O Analyzer worker VM. One Iometer worker thread drove storage load. The size of the VM’s virtual disk determines the size of the active dataset, so a 100GB thick-provisioned virtual disk on VMFS-5 was chosen to maximize I/O to the disk and minimize caching. We tested at a medium load level using a plausible datacenter I/O profile, understanding, however, that any static I/O profile will be a broad generalization of real-life workloads.

Iometer Configuration

  • 1 vCPU, 2GB memory
  • 70% read, 30% write
  • 100% random I/O to model the “I/O blender effect” in a virtualized environment
  • 4KB block size
  • I/O aligned to sector boundaries
  • 64 outstanding I/O
  • 60 minute warm up period, 60 minute measurement period
Systems Under Test: One Dell PowerEdge R720 server
Configuration Per Server:  
     Virtualization Platform: VMware vSphere 6.0. Worker VM used the I/O Analyzer default virtual hardware version 7.
     CPUs: Two 12-core Intel® Xeon® E5-2697 v2 @ 2.7 GHz, Turbo Boost Enabled, up to 3.5 GHz, Hyper-Threading enabled.
     Memory: 256GB ECC DDR3 @ 1866MHz
     Host Bus Adapter: QLogic ISP2532 DualPort 8Gb Fibre Channel to PCI Express

Iometer results

Iometer Latency Results Storage Comparison RAID Group vs Storage PoolIometer Throughput Results Storage Comparison RAID Group vs Storage Pool

In Iometer testing, the storage pool showed slightly improved performance compared to the RAID group, and the amount of capacity allocated also did not affect performance.

In both our multi-workload and synthetic microbenchmark scenarios, we did not observe any performance penalty of choosing storage pools over RAID groups on an all-SSD array, even when disparate workloads shared the same storage pool. We also did not find any performance benefit at the application or I/O level from leaving unallocated capacity, or overprovisioning, SSD RAID groups or storage pools. Given the ease of management and feature-based benefits of storage pools, including automated storage tiering, compression, deduplication, and thin provisioning, storage pools are an excellent choice in today’s datacenters.

vSphere 5.1 IOPS Performance Characterization on Flash-based Storage

At VMworld 2012 we demonstrated a single eight-way VM running on vSphere 5.1 exceeding one million IOPS.  This testing illustrated the high end IOPS performance of vSphere 5.1.

In a new series of tests we have completed some additional characterization of high I/O performance using a very similar environment. The only difference between the 1 million IOPS test environment and the one used for these tests is that the number of Violin Memory Arrays was reduced from two to one (one of the arrays was a short term loan).

Hypervisor: vSphere 5.1
Server: HP DL380 Gen8
CPU: Two Intel Xeon E5-2690, HyperThreading disabled
Memory: 256GB
HBAs: Five QLogic QLE2562
Storage: One Violin Memory 6616 Flash Memory Array
VM: Windows Server 2008 R2, 8 vCPUs and 48GB.
Iometer Configuration: Random, 4KB I/O size with 16 workers

We continued to characterize the performance of vSphere 5.1 and the Violin array across a wider range of configurations and workload conditions.

Based on the types of questions that we often get from customers, we focused on RDM versus VMFS5 comparisons and the usage of various I/O sizes.  In the first series of experiments we compared RDM versus VMFS5 backed datastores using 100% read workload mix while ramping up the I/O size.

click to enlarge

As you can see from the above graph, VMFS5 yielded roughly equivalent performance to that of RDM backed datastores.  Comparing the average of the deltas across all data points showed performance within 1% of RDM for both IOPS and MB/s.  As expected, the number of IOPS decreased after we exceed the default array block size of 4KB, but the throughput continued to scale, approaching 4500 MB/s at both 8KB and 16KB sizes.

For our second series of experiments, we continued to compare RDM versus VMFS5 backed datastores through a progression of block sizes, but this time we altered the workload mix to include 60% reads and 40% writes.

click to enlarge

Violin Memory arrays use a 4KB sector size and perform at their optimal level when managing 4KB blocks. This is very visible in the above IOPS results at the 4KB block size. In the above graph, comparing RDM and VMFS5 IOPS, you can see that VMFS5 performs very well with a 60% read, 40% write mix.  Throughputs continued to scale in a similar fashion as the read-only experimentation and VMFS5 performance for both IOPS and MB/s were within .01% of RDM performance when comparing the average of the deltas across all data points.

The amount of I/O, with just one eight-way VM running on one Violin storage array, is both considerable and sustainable at many I/O sizes.  It’s also noteworthy to point out that running a 60% read and 40% write I/O mix still generated substantial IOPs and bandwidth. While in most cases a single VM won’t need to drive nearly this much I/O traffic, these experiments show that vSphere 5.1 is more than capable of handling it.

1millionIOPS On 1VM

Last year at VMworld 2011 we presented one million I/O operations per second (IOPS) on a single vSphere 5 host (link).  The intent was to demonstrate vSphere 5’s performance by using mutilple VMs to drive an aggregate load of one million IOPS through a single server.   There has recently been some interest in driving similar I/O load through a single VM.  We used a pair of Violin Memory 6616 flash memory arrays, which we connected to a two-socket HP DL380 server, for some quick experiments prior to VMworld.  vSphere 5.1 was able to demonstrate high performance and I/O efficiency by exceeding one million IOPS, doing so with only a modest eight-way VM.  A brief description of our configuration and results is given below.

Hypervisor: vSphere 5.1
Server: HP DL380 Gen8
CPU: 2 x Intel Xeon E5-2690, HyperThreading disabled
Memory: 256GB
HBAs: 5 x QLE2562
Storage: 2 x Violin Memory 6616 Flash Memory Arrays
VM: Windows Server 2008 R2, 8 vCPUs and 48GB.
Iometer Config: 4K IO size w/ 16 workers

Using the above configuration we achieved 1055896 total sustained IOPS.  Check out the following short video clip from one of our latest runs.

Look out for a more thorough write-up after VMworld.