Tag Archives: ESXi

SQL Server Performance of VMware Cloud on AWS

In the past, I’ve always benchmarked performance of SQL Server VMs on vSphere with “on-premises” infrastructure.  Given the skyrocketing interest in the cloud, I was very excited to get my hands on VMware Cloud on AWS – just in time for Amazon’s AWS Summit!

A key question our customers have is: how well do applications (like SQL Server) perform in our cloud?  Well, I’m happy to report that the answer is great!

VMware Cloud on AWS Environment

First, here is a screenshot of what my vSphere-powered Software-Defined Data Center (SDDC) looks like:

vSphere Client - VMware Cloud on AWS

This screenshot shows several notable items:

  • The HTML5-based vSphere Client interface should be very familiar to vSphere administrators, making the move to the cloud extremely easy.
  • This SDDC instance was auto-provisioned with 4 ESXi hosts and 2TB of memory, all of which were pre-configured with vSAN storage and NSX networking.
    • Each host is configured with two CPUs (Intel Xeon Processor E5-2686 v4); each socket contains 18 cores running at 2.3GHz, resulting in 144 physical cores in the cluster. For more information, see the VMware Cloud on AWS Technical Overview.
  • Virtual machines are provisioned within the customer workload resource pool, and vSphere DRS automatically handles balancing the VMs across the compute cluster.
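
For readers who want to check the core count quoted above, here is a minimal sketch of the arithmetic. The constant and function names are made up for illustration; only the host, socket, and core counts come from the post.

```python
# Cluster inventory as described in the post (names are illustrative).
HOSTS = 4
SOCKETS_PER_HOST = 2
CORES_PER_SOCKET = 18


def cluster_cores(hosts=HOSTS, sockets=SOCKETS_PER_HOST, cores=CORES_PER_SOCKET):
    """Total physical cores across all hosts in the cluster."""
    return hosts * sockets * cores


print(cluster_cores())  # 144
```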

Benchmark Methodology

To measure SQL Server database performance, I used HammerDB, an open-source database load testing and benchmarking tool.  It implements a TPC-C like workload, and reports throughput in TPM (Transactions Per Minute).

To measure how well performance scaled in this cloud, I started with a single 8 vCPU, 32GB RAM VM for the SQL Server database.  To drive the workload, I created a 4 vCPU, 4GB RAM HammerDB driver VM.  I then cloned these VMs to measure 2 database VMs being driven simultaneously:

HammerDB and SQL Server VMs in VMware Cloud on AWS

I then doubled the number of VMs again to 4, 8, and finally 16.  As with any benchmark, these VMs were completely driven up to saturation (100% load) – “pedal to the metal”!
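
The doubling sequence can be laid out as a quick sketch to show how many vCPUs are configured at each step. It assumes each database VM is paired with its own cloned driver VM, which the post implies but does not state outright; the names below are hypothetical.

```python
# VM sizes from the post; the one-driver-per-database pairing is an assumption.
DB_VCPUS = 8
DRIVER_VCPUS = 4


def load_steps(start=1, stop=16):
    """VM counts used for each run: doubling from start up to stop."""
    counts = []
    n = start
    while n <= stop:
        counts.append(n)
        n *= 2
    return counts


for n in load_steps():
    print(f"{n:2d} VM pair(s): {n * DB_VCPUS:3d} database vCPUs, "
          f"{n * DRIVER_VCPUS:2d} driver vCPUs")
```

At the largest step this gives 128 database vCPUs, against the 144 physical cores in the cluster.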


So, how did the results look?  Well, here is a graph of each VM count and the resulting database performance:

As you can see, database performance scaled great; when running 16 8-vCPU VMs, VMware Cloud on AWS was able to sustain 6.7 million database TPM!

I’ll be detailing these benchmarks more in an upcoming whitepaper, but wanted to share these results right away.  If you have any questions or feedback, please leave me a comment!

Performance of SQL Server 2017 for Linux VMs on vSphere 6.5

Microsoft SQL Server has long been one of the most popular applications to run on vSphere virtual machines.  Last year there was quite a bit of excitement when Microsoft announced it was bringing SQL Server to Linux.  Over the last year SQL Server for Linux has attracted quite a bit of interest, and at Microsoft Ignite last month it was announced as officially launched and generally available.

VMware and Microsoft have collaborated to validate and support the functionality and performance scalability of SQL Server 2017 on vSphere-based Linux VMs.  The results of that work show SQL Server 2017 for Linux installs easily and has great performance within VMware vSphere virtual machines. VMware vSphere is a great environment in which to try out the new Linux version of SQL Server and still get great performance.

Using CDB, a cloud database benchmark developed by the Microsoft SQL Server team, we were able to verify that the performance of SQL Server for Linux in a vSphere virtual machine was similar to other non-virtualized and virtualized operating systems or platforms.

Our initial reference test size was relatively small, so we wanted to test larger sizes to see how well SQL Server 2017 for Linux performed as the VM size was scaled up.  For the test, we used a four-socket Intel Xeon E7-8890 v4 (Broadwell)-based server with 96 cores (24 cores per socket).  The initial test began with a 24 virtual CPU VM to match the number of physical cores of a single socket.  Additional tests were run by increasing the size of the VM by 24 vCPUs for each test until, in the final test, the VM had 96 total vCPUs.  We configured the virtual machine with 512 GB of RAM and separate log and data disks on an SSD-based Fibre Channel SAN.  We used the same best practices for SQL Server for Linux that we normally use for the Windows version, as documented in our published best practices guide for SQL Server on vSphere.

The results showed that SQL Server 2017 for Linux scaled very well as the additional vCPUs were added to the virtual machine. SQL Server 2017 for Linux is capable of scaling up to handle very large databases on VMware vSphere 6.5 Linux virtual machines.

Skylake Update – Oracle Database Performance on vSphere 6.5 Monster Virtual Machines

We were able to get one of the new four-socket Intel Skylake-based servers and run some more tests. Specifically, we used the Xeon Platinum 8180 processors with 28 cores each. The new data has been added to the Oracle Monster Virtual Machine Performance on VMware vSphere 6.5 whitepaper. Please check out the paper for the full details and context of these updates.

The generational testing in the paper now includes a fifth generation with a 112 vCPU virtual machine running on the Skylake-based server. The performance gain from the initial 40 vCPU VM on Westmere-EX to the Skylake-based 112 vCPU VM is almost 4x.

The performance gained from Hyper-Threading was also updated and shows a 27% performance gain from the use of Hyper-Threads. The test was conducted by running two 112 vCPU VMs at the same time so that all 224 logical threads are active. The total throughput from the two VMs is then compared with the throughput from a single VM.
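
The comparison described above boils down to a simple ratio. Here is a minimal sketch; the throughput values are hypothetical placeholders chosen only to reproduce the 27% figure, not measurements from the paper.

```python
def hyperthreading_gain(single_vm_throughput, two_vm_total_throughput):
    """Fractional gain when both hardware threads per core are busy.

    single_vm_throughput: one 112 vCPU VM running alone.
    two_vm_total_throughput: combined throughput of two 112 vCPU VMs
    running at the same time (all 224 logical threads active).
    """
    return two_vm_total_throughput / single_vm_throughput - 1.0


# Hypothetical throughput numbers, for illustration only.
single = 100_000.0
both = 127_000.0
print(f"{hyperthreading_gain(single, both):.0%}")  # 27%
```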

My colleague David Morse has also updated his SQL Server monster virtual machine whitepaper with Skylake data.

Updated – SQL Server VM Performance with vSphere 6.5, October 2017

Back in March, I published a performance study of SQL Server performance with vSphere 6.5 across multiple processor generations.  Since then, Intel has released a brand-new processor architecture: the Xeon Scalable platform, formerly known as Skylake.

Our team was fortunate enough to get early access to a server with these new processors inside – just in time for generating data that we presented to customers at VMworld 2017.

Each Xeon Platinum 8180 processor has 28 physical cores (pCores), and with four processors in the server, there were a whopping 112 pCores on one physical host!  As you can see, that extra horsepower provides nice database server performance scaling:

Generational SQL Server VM Database Performance

For more details and the test results, take a look at the updated paper:
Performance Characterization of Microsoft SQL Server on VMware vSphere 6.5

Oracle Database Performance on vSphere 6.5 Monster Virtual Machines

We have just published a new whitepaper on the performance of Oracle databases on vSphere 6.5 monster virtual machines. We took a look at the performance of the largest virtual machines possible on the previous four generations of four-socket Intel-based servers. The results show how performance of these large virtual machines continues to scale with the increases and improvements in server hardware.

Oracle Database Monster VM Performance on vSphere 6.5 across 4 generations of Intel-based four-socket servers

In addition to vSphere 6.5 and the four-socket Intel-based servers, the testing used an IBM FlashSystem A9000 high-performance all-flash array. The array's extremely low latency enabled the database virtual machines to reach these high levels of performance.

Please read the full paper, Oracle Monster Virtual Machine Performance on VMware vSphere 6.5, for details on hardware, software, test setup, results, and more cool graphs.  The paper also covers performance gain from Hyper-Threading, performance effect of NUMA, and best practices for Oracle monster virtual machines. These best practices are focused on monster virtual machines, and it is recommended to also check out the full Oracle Databases on VMware Best Practices Guide.

Some similar tests with Microsoft SQL Server monster virtual machines were also recently completed on vSphere 6.5 by my colleague David Morse. Please see his blog post and whitepaper for the full details.

This work on Oracle is in some ways a follow-up to Project Capstone from 2015 and the resulting whitepaper, Peeking at the Future with Giant Monster Virtual Machines. That project dealt with monster VM performance from a slightly different angle and might be interesting to those who are also interested in this paper and its results.


SQL Server VM Performance with VMware vSphere 6.5

Achieving optimal SQL Server performance on vSphere has been a constant focus here at VMware; I’ve published past performance studies with vSphere 5.5 and 6.0 which showed excellent performance up to the maximum VM size supported at the time.

Since then, there have been quite a few changes!  While this study uses a similar test methodology, it features an updated hypervisor (vSphere 6.5), database engine (SQL Server 2016), OLTP benchmark (DVD Store 3), and CPUs (Intel Xeon v4 processors with 24 cores per socket, codenamed Broadwell-EX).

VMware vCloud Air Database Performance Scalability with SQL Server

Previous posts have shown vSphere can easily handle running Microsoft SQL Server on four-socket servers with large numbers of cores—with vSphere 5.5 on Westmere-EX and more recently with vSphere 6 on Ivy Bridge-EX.  We recently ran similar tests on vCloud Air to measure how these enterprise databases with mission critical performance requirements perform in a cloud environment. The tests show that SQL Server databases scale very well on vCloud Air with a variety of virtual machine (VM) counts and virtual CPU (vCPU) sizes.

The benchmark tests were run on vCloud Air using its Virtual Private Cloud (VPC) subscription-based service.  This is a very compelling hybrid cloud service that allows for an on-premises vSphere infrastructure to be expanded into the public cloud in a secure and scalable way. The underlying host hardware consisted of two 8-core CPUs for a total of 16 physical cores, which meant that the maximum number of vCPUs was 16 (although additional logical processors were available via Hyper-Threading, they were not utilized).

Virtualized Storage Performance: RAID Groups versus Storage Pools

RAID, a redundant array of independent disks, has traditionally been the foundation of enterprise storage. Grouping multiple disks into one logical unit can vastly increase the availability and performance of storage by protecting against disk failure, allowing greater I/O parallelism, and pooling capacity. Storage pools similarly increase the capacity and performance of storage, but are easier to configure and manage than RAID groups.

RAID groups have traditionally been regarded as offering better and more predictable performance than storage pools. Although both technologies were developed for magnetic hard disk drives (HDDs), solid-state drives (SSDs), which use flash memory, have become prevalent. Virtualized environments are also common and tend to create highly randomized I/O given the fact that multiple workloads are run simultaneously.

We set out to see how the performance of RAID group and storage pool provisioning methods compare in today’s virtualized environments.

First, let’s take a closer look at each storage provisioning type.

RAID Groups

A RAID group unifies a number of disks into one logical unit and distributes data across multiple drives. RAID groups can be configured with a particular protection level depending on the performance, capacity, and redundancy needs of the environment. LUNs are then allocated from the RAID group. RAID groups typically contain only identical drives, and the maximum number of disks in a RAID group varies by system model but is generally below fifty. Because drives typically have well defined performance characteristics, the overall RAID group performance can be calculated as the performance of all drives in the group minus the RAID overhead. To provide consistent performance, workloads with different I/O profiles (e.g., sequential vs. random I/O) or different performance needs should be physically isolated in different RAID groups so they do not share disks.
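
The "all drives minus RAID overhead" calculation is often approximated with a write-penalty rule of thumb. The sketch below uses that common approximation, which is not taken from this study; it ignores controller cache, SSD behavior, and other array-specific effects. The per-drive IOPS figure is a hypothetical placeholder.

```python
# Commonly cited back-end I/Os per host write for each RAID level
# (rule-of-thumb values; real arrays differ).
RAID_WRITE_PENALTY = {"RAID0": 1, "RAID1": 2, "RAID5": 4, "RAID6": 6}


def raid_group_iops(drives, iops_per_drive, read_fraction, level="RAID5"):
    """Estimate host-visible IOPS for a RAID group of identical drives."""
    raw = drives * iops_per_drive
    write_fraction = 1.0 - read_fraction
    penalty = RAID_WRITE_PENALTY[level]
    # Each read costs one back-end I/O; each write costs `penalty` I/Os.
    return raw / (read_fraction + write_fraction * penalty)


# Hypothetical example: 15 drives at 5,000 IOPS each, 70% reads.
print(round(raid_group_iops(15, 5_000, 0.70)))
```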

Storage Pools

Storage pools, or simply ‘pools’, are very similar to RAID groups in some ways. Implementation varies by vendor, but generally pools are made up of one or more private RAID groups, which are not visible to the user, or they are composed of user-configured RAID groups which are added manually to the pool. LUNs are then allocated from the pool. Storage pools can contain up to hundreds of drives, often all the drives in an array. As business needs grow, storage pools can be easily scaled up by adding drives or RAID groups and expanding LUN capacity. Storage pools can contain multiple types and sizes of drives and can spread workloads over more drives for a greater degree of parallelism.

Storage pools are usually required for array features like automated storage tiering, where faster SSDs can serve as a data cache among a larger group of HDDs, as well as other array-level data services like compression, deduplication, and thin provisioning. Because of their larger maximum size, storage pools, unlike RAID groups, can take advantage of vSphere 6 maximum LUN sizes of 64TB.

We used two benchmarks to compare the performance of RAID groups and storage pools: VMmark, which is a virtualization platform benchmark, and I/O Analyzer with Iometer, which is a storage microbenchmark.  VMmark is a multi-host virtualization benchmark that uses diverse application workloads as well as common platform level workloads to model the demands of the datacenter. VMs running a complete set of the application workloads are grouped into units of load called tiles. For more details, see the VMmark 2.5 overview. Iometer places high levels of load on the disk, but does not stress any other system resources. Together, these benchmarks give us both a ‘real-world’ and a more focused perspective on storage performance.

VMmark Testing

Array Configuration

Testing was conducted on an EMC VNX5800 block storage SAN with Fibre Channel. This was one of the many storage solutions which offered both RAID group and storage pool technologies. Disks were 200GB single-level cell (SLC) SSDs. Storage configuration followed array best practices, including balancing LUNs across Storage Processors and ensuring that RAID groups and LUNs did not span the array bus. One way to optimize SSD performance is to leave up to 50% of the SSD capacity unutilized, also known as overprovisioning. To follow this best practice, 50% of the RAID group or storage pool was not allocated to any LUN. Since overprovisioning SSDs can be an expensive proposition, we also tested the same configuration with 100% of the storage pool or RAID group allocated.

RAID Group Configuration

Four RAID 5 groups were used, each composed of 15 SSDs. RAID 5 was selected for its suitability for general purpose workloads. RAID 5 provides tolerance against a single disk failure. For best performance and capacity, RAID 5 groups should be sized to multiples of five or nine drives, so each 15-drive group is a multiple of the preferred five-drive count. One LUN was created in each of the four RAID groups. The LUN was sized to either 50% of the RAID group (Best Practices) or 100% (Fully Allocated). For testing, the capacity of each LUN was fully utilized by VMmark virtual machines and randomized data.

RAID Group Configuration and Storage Pool Configuration for the VMmark Storage Comparison

Storage Pool Configuration

A single RAID 5 Storage Pool containing all 60 SSDs was used. Four thick LUNs were allocated from the pool, meaning that all of the storage space was reserved on the volume. LUNs were equivalent in size and consumed a total of either 50% (Best Practices) or 100% (Fully Allocated) of the pool capacity.
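
To see how the two layouts line up, here is a rough sizing sketch. It assumes a 4+1 RAID 5 stripe (suggested by the five-drive multiple, but not stated in the post) and uses nominal 200GB drive capacity, ignoring formatting and pool metadata overhead. The function names are hypothetical.

```python
DRIVE_GB = 200  # nominal SSD capacity from the post


def usable_gb(drives, drive_gb=DRIVE_GB, data_per_stripe=4, parity_per_stripe=1):
    """Nominal usable capacity under an assumed 4+1 RAID 5 layout."""
    stripe = data_per_stripe + parity_per_stripe
    return drives * drive_gb * data_per_stripe / stripe


def lun_gb(container_usable_gb, luns, allocated_fraction):
    """Size of each LUN when `luns` equal LUNs share the allocated capacity."""
    return container_usable_gb * allocated_fraction / luns


# RAID group case: one LUN in each 15-drive group.
print(lun_gb(usable_gb(15), luns=1, allocated_fraction=0.5))  # 1200.0
# Storage pool case: four LUNs carved from one 60-drive pool.
print(lun_gb(usable_gb(60), luns=4, allocated_fraction=0.5))  # 1200.0
```

Under these assumptions, each LUN ends up the same nominal size in both configurations, which is what makes the comparison like-for-like.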

Storage Layout

Most of the VMmark storage load was created by two types of virtual machines: database (DVD Store) and mail server (Microsoft Exchange). These virtual machines were isolated on two different LUNs. The remaining virtual machines were spread across the remaining two LUNs. That is, in the RAID group case, storage-heavy workloads were physically isolated in different RAID groups, but in the storage pool case, all workloads shared the same pool.

Systems Under Test: Two Dell PowerEdge R720 servers
Configuration Per Server:  
     Virtualization Platform: VMware vSphere 6.0. VMs used virtual hardware version 11 and current VMware Tools.
     CPUs: Two 12-core Intel® Xeon® E5-2697 v2 @ 2.7 GHz, Turbo Boost Enabled, up to 3.5 GHz, Hyper-Threading enabled.
     Memory: 256GB ECC DDR3 @ 1866MHz
     Host Bus Adapter: QLogic ISP2532 DualPort 8Gb Fibre Channel to PCI Express
     Network Controller: One Intel 82599EB dual-port 10 Gigabit PCIe Adapter, one Intel I350 Dual-Port Gigabit PCIe Adapter

Each configuration was tested at three different load points: 1 tile (the lowest load level), 7 tiles (an approximate mid-point), and 13 tiles, which was the maximum number of tiles that still met Quality of Service (QoS) requirements. All datapoints represent the mean of two tests of each configuration.

VMmark Results

RAID Group vs. Storage Pool Performance comparison using VMmark benchmark

Across all load levels tested, the VMmark performance score, which is a function of application throughput, was similar regardless of storage provisioning type. Neither the storage type used nor the capacity allocated affected throughput.

VMmark 2.5 performance scores are based on application and infrastructure workload throughput, while application latency reflects Quality of Service. For the Mail Server, Olio, and DVD Store 2 workloads, latency is defined as the application’s response time. We wanted to see how storage configuration affected application latency as opposed to the VMmark score. All latencies are normalized to the lowest 1-tile results.
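
One way to read "normalized to the lowest 1-tile results" is to divide every latency by the smallest latency seen at 1 tile across the configurations. The sketch below follows that interpretation, which is an assumption; the configuration labels and latency values are hypothetical.

```python
def normalize_to_lowest_one_tile(latencies):
    """Normalize latencies to the lowest 1-tile value.

    latencies: {configuration: {tile_count: latency_ms}}
    Returns the same structure with unitless ratios.
    """
    baseline = min(per_tile[1] for per_tile in latencies.values())
    return {
        config: {tiles: value / baseline for tiles, value in per_tile.items()}
        for config, per_tile in latencies.items()
    }


# Hypothetical latencies in milliseconds, for illustration only.
sample = {
    "RAID group": {1: 20.0, 7: 24.0, 13: 30.0},
    "Storage pool": {1: 25.0, 7: 25.0, 13: 35.0},
}
for config, ratios in normalize_to_lowest_one_tile(sample).items():
    print(config, ratios)
```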

Storage configuration did not affect VMmark application latencies.

Application Latency in VMmark Storage Comparison RAID Group vs Storage Pool

Lastly, we measured read and write I/O latencies: esxtop Average Guest MilliSec/Write and Average Guest MilliSec/Read. This is the round trip I/O latency as seen by the Guest operating system.

VMmark Storage Latency Storage Comparison RAID Group vs Storage Pool

No differences emerged in I/O latencies.

I/O Analyzer with Iometer Testing

In the second set of experiments, we wanted to see if we would find similar results while testing storage using a synthetic microbenchmark. I/O Analyzer is a tool which uses Iometer to drive load on a Linux-based virtual machine then collates the performance results. The benefit of using a microbenchmark like Iometer is that it places heavy load on just the storage subsystem, ensuring that no other subsystem is the bottleneck.


Testing used a VNX5800 array and RAID 5 level as in the prior configuration, but all storage configurations spanned 9 SSDs, also a preferred drive count. In contrast to the prior test, the storage pool or RAID group spanned an identical number of disks, so that the number of disks per LUN was the same in both configurations. Testing used nine disks per LUN to achieve greater load on each disk.

The LUN was sized to either 50% or 100% of the RAID group or storage pool. The LUN capacity was fully occupied with the I/O Analyzer worker VM and randomized data.  The I/O Analyzer Controller VM, which initiates the benchmark, was located on a separate array and host.

Storage Configuration Iometer with Storage Pool and RAID Group

Testing used one I/O Analyzer worker VM. One Iometer worker thread drove storage load. The size of the VM’s virtual disk determines the size of the active dataset, so a 100GB thick-provisioned virtual disk on VMFS-5 was chosen to maximize I/O to the disk and minimize caching. We tested at a medium load level using a plausible datacenter I/O profile, understanding, however, that any static I/O profile will be a broad generalization of real-life workloads.

Iometer Configuration

  • 1 vCPU, 2GB memory
  • 70% read, 30% write
  • 100% random I/O to model the “I/O blender effect” in a virtualized environment
  • 4KB block size
  • I/O aligned to sector boundaries
  • 64 outstanding I/O
  • 60 minute warm up period, 60 minute measurement period

Systems Under Test: One Dell PowerEdge R720 server
Configuration Per Server:  
     Virtualization Platform: VMware vSphere 6.0. Worker VM used the I/O Analyzer default virtual hardware version 7.
     CPUs: Two 12-core Intel® Xeon® E5-2697 v2 @ 2.7 GHz, Turbo Boost Enabled, up to 3.5 GHz, Hyper-Threading enabled.
     Memory: 256GB ECC DDR3 @ 1866MHz
     Host Bus Adapter: QLogic ISP2532 DualPort 8Gb Fibre Channel to PCI Express

Iometer Results

Iometer Latency Results Storage Comparison RAID Group vs Storage Pool

Iometer Throughput Results Storage Comparison RAID Group vs Storage Pool

In Iometer testing, the storage pool showed slightly better performance than the RAID group, and the amount of capacity allocated did not affect performance.
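
Because the test keeps a fixed 64 I/Os outstanding, the latency and throughput results are tied together by Little's Law (outstanding I/Os = IOPS × latency). A small sketch of that relationship, using the 4KB block size from the Iometer configuration, is below; the 1 ms latency is a hypothetical value, not a result from this study.

```python
OUTSTANDING_IOS = 64        # from the Iometer configuration
BLOCK_BYTES = 4 * 1024      # 4KB block size


def iops_from_latency(latency_ms, outstanding=OUTSTANDING_IOS):
    """Little's Law: IOPS sustained when `outstanding` I/Os are always in flight."""
    return outstanding * 1000.0 / latency_ms


def throughput_mb_s(iops, block_bytes=BLOCK_BYTES):
    """Convert IOPS to decimal megabytes per second."""
    return iops * block_bytes / 1_000_000


# Hypothetical average latency, for illustration only.
iops = iops_from_latency(1.0)
print(iops, throughput_mb_s(iops))
```

Read this way, a configuration with lower latency at the same queue depth necessarily shows higher IOPS, so the two Iometer graphs describe the same result from two angles.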

In both our multi-workload and synthetic microbenchmark scenarios, we did not observe any performance penalty from choosing storage pools over RAID groups on an all-SSD array, even when disparate workloads shared the same storage pool. We also did not find any performance benefit at the application or I/O level from overprovisioning, that is, leaving capacity unallocated in SSD RAID groups or storage pools. Given the ease of management and feature-based benefits of storage pools, including automated storage tiering, compression, deduplication, and thin provisioning, storage pools are an excellent choice in today’s datacenters.

SQL Server VM Performance on VMware vSphere 6

Last October, I blogged about SQL Server performance with vSphere 5.5 using a four-socket Intel Xeon processor E7 based host.  Now that vSphere 6 is available, I’ve run an updated set of tests using this new release, on an even more powerful host, with Xeon E7 v2 processors.  A variety of virtual CPU (vCPU) and virtual machine (VM) quantities were tested to show that vSphere can handle hundreds of thousands of online transaction processing (OLTP) database operations per minute.

DVD Store 2.1, an open-source OLTP database stress tool, was the workload used to stress the VMs.  The first experiment in the paper was a generational performance comparison between the old and new setups; as you can see, there is a dramatic increase in throughput, even though the size of each VM has doubled from 8 vCPUs per VM to 16:

Generational performance improvement from old study to new study

There are also tests that use CPU affinity to show the performance differences between physical cores and logical processors (Hyper-Threads), demonstrate the benefit of “right-sizing” virtual machines, and measure the impact of the advanced Latency Sensitivity setting.

For more details and the test results, please download the whitepaper: Performance Characterization of Microsoft SQL Server on VMware vSphere 6.

VMware vSphere 6 and Oracle 12c Scalability Study: Scaling Monster Virtual Machines

vSphere 6 introduces the ability to run virtual machines (VMs) with up to 128 virtual CPUs (vCPUs) and 4TB of RAM. This doubles the number of vCPUs supported from the previous version and increases the amount of RAM by four times. This new capability provides the potential for customers to run larger workloads than ever before in a virtual machine.

A series of tests were run with a virtual machine hosting Oracle 12c database instances. The DVD Store 2.1 open-source transactional workload was used to measure the performance of a large “Monster” VM on vSphere 6. The Oracle 12c database VM was scaled from 15 vCPUs all the way up to 120 vCPUs, and the maximum achieved throughput was measured. The full results and test details have been published in a white paper – VMware vSphere 6 and Oracle 12c Scalability Study: Scaling Monster Virtual Machines.

A four-socket Intel Xeon E7-4890 v2 processor-based server with 1TB of memory was used to host the virtual machine for the tests.  Each Xeon E7-4890 v2 processor has 15 cores / 30 threads with Hyper-Threading enabled, for a total of 60 cores / 120 threads for the system. The diagram below shows the basic test configuration.



In all tests Hyper-Threading was enabled on the server, but in configurations where 60 vCPUs or fewer are assigned to the VM, Hyper-Threads are not used by the VM. This is a result of the default scheduling policy where the preference is for vCPUs to be scheduled on one thread per core before using the second thread of any core. This first set of results, shown below, is focused on the tests that scale up to 60 vCPUs. These tests show the scaling for the virtual machine without the use of Hyper-Threads.
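
The stated preference (one thread per core first, second threads only afterwards) can be captured in a toy model. This is a simplified illustration of that preference for a single VM on an otherwise idle host, not a description of how the ESXi scheduler is implemented; the function name is hypothetical.

```python
def threads_in_use(vcpus, cores=60, threads_per_core=2):
    """Toy model: fill one thread per core first, then second threads.

    Returns (cores_running_one_vcpu_or_more, second_threads_in_use).
    """
    if vcpus > cores * threads_per_core:
        raise ValueError("more vCPUs than hardware threads")
    first_threads = min(vcpus, cores)
    second_threads = vcpus - first_threads
    return first_threads, second_threads


for size in (15, 30, 60, 120):
    print(size, threads_in_use(size))
```

In this model, every size up to 60 vCPUs uses no second threads, and only the 120-vCPU size occupies both threads on all 60 cores.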


While vSphere 6 supports up to 128 vCPUs per VM, these tests were limited to 120 vCPUs due to the number of threads available on the server. The largest VM configuration used both hardware execution threads (Hyper-Threads) on all the processor cores in order to reach 120 vCPUs. In this case, there is one vCPU per execution thread.

Hyper-Threading doubles the number of execution threads, but it does not double performance. In order to measure the scale-up performance of the 120-vCPU VM, a 60-vCPU VM was configured with CPU affinity so that it was limited to only two of the server’s four sockets. In this configuration the 60-vCPU VM has one vCPU per execution thread, which is the same as the 120-vCPU VM.  Configuring a 60-vCPU VM in this way makes it easy to see the scale-up performance at 120 vCPUs on this server with Hyper-Threads enabled.

The results of the scale-up testing using the 60-vCPU VM configured with CPU affinity to only 2 sockets and the 120-vCPU VM using all four sockets showed approximately linear scaling, as shown in the graph below.
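
"Approximately linear" can be made concrete as a scaling efficiency: the throughput ratio divided by the resource ratio, where 1.0 means perfectly linear. The sketch below shows the calculation; the throughput values are hypothetical and are not results from the white paper.

```python
def scaling_efficiency(base_throughput, scaled_throughput, resource_ratio):
    """Fraction of ideal linear scaling achieved (1.0 = perfectly linear)."""
    return (scaled_throughput / base_throughput) / resource_ratio


# Hypothetical throughputs: 60-vCPU VM on two sockets vs 120-vCPU VM on four.
print(scaling_efficiency(100.0, 196.0, resource_ratio=2))
```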


For full test details and more test results, please see the white paper that was recently published.

The new larger “Monster” VM support in vSphere 6 allows for virtual machines that can support larger workloads than ever before with excellent performance. These tests show that large virtual machines running on vSphere 6 can scale up as needed to meet extreme performance demands.