Home > Blogs > VMware VROOM! Blog > Author Archives: James Zubb

Author Archives: James Zubb

Using VMmark 3 as a Performance Analysis Tool

VMmark was originally developed to fill the need for a server consolidation benchmark for a rapidly changing datacenter that was becoming increasingly dominated by virtualization.  The design of VMmark, which is a collection of workloads, gives us the ability to quickly change workload parameters to modify the behavior of the entire benchmark. This allows us to use VMmark to exercise technologies that were not available at the time the benchmark was designed. The VMmark 3 run rules provide for academic or research results publication using a modified version of the benchmark.

VMmark 3 was designed in 2015 when the memory size of a typical high-end 2 socket server was 768 GB.  Each VMmark 3 tile was configured to use 156 GB of memory, allowing multiple tiles to be run on each server.  A new technology, Intel Optane DC Persistent Memory, now allows up to 3 TB of memory in a 2 socket server, with plans to increase that even further.  Testing the performance of this technology with an unmodified version of VMmark 3 wouldn’t be easy as we’d saturate CPU resources long before we could fully exercise this large amount of memory.  Thankfully the flexible nature of VMmark allows us to modify it to consume significantly more memory with minimal changes in CPU usage.

The two primary VMmark workloads are Weathervane and DVD Store.  Each can be modified to consume more memory.  Weathervane, as configured for VMmark 3, uses 14 VMs.  Thus while it would be possible to modify this application, doing so would be a time-consuming process.  We therefore decided to look at DVD Store, which uses only four VMs.  Most of the work is done in the DVD Store database VM which was our target for modification.

Determining the best configuration for DVD Store to utilize a larger amount of memory required multiple iterations of testing.   We modified one test parameter of the DVD Store workload, and then examined the results to determine the effect on the VMmark tile. We were looking for larger memory usage with a minimal increase in CPU usage so that we could exercise the larger memory configuration without requiring additional CPUs. The following table lists the default configuration and the variables we changed:

Parameter Default Configurations Tried
VM Memory Size 32 GB 128, 250 and 385 GB
Think Time 1 second 0.5, 0.9, 1.25, and 1.5 seconds
Number of Threads 24 36 and 48
Number of Searches 3 5, 7, and 9
Batch Search Size 3 5, 7, and 9
Database Size 100 GB 300 and 500 GB

The final configuration that we determined to have the most increased memory usage while keeping the CPU usage moderate was 250 GB DS3DB VM memory size, 1.5 seconds think time, and 300 GB database size.  All other parameters were kept at the default.

The following table lists the CPU and memory utilization of the default configuration and the “increased memory” configuration.

Configuration CPU Utilization Memory Utilization
Default 26.3 126 GB
Increased Memory 24.1 350 GB

We were able to almost triple the memory consumption of a single VMmark tile without increasing the CPU usage. Using this “increased memory” configuration for VMmark we can now see the effect of the additional memory provided by Intel Optane DC Persistent Memory in Memory Mode.

More detailed information about this configuration and the methodology used to refine it can be found in the Intel Optane DC Persistent Memory whitepaper.  Detailed instructions to configure VMmark 3 to increase the memory footprint can be obtained by emailing the VMmark team at vmmark-info@vmware.com.  We encourage you to experiment with VMmark under academic rules for your own studies and to let us know if you have any questions.

 

Virtual SAN 6.0 Performance with VMware VMmark

Virtual SAN is a storage solution that is fully integrated with VMware vSphere. Virtual SAN leverages flash technology to cache data and improve its access time to and from the disks. We used VMware’s VMmark 2.5 benchmark to evaluate the performance of running a variety of tier-1 application workloads together on Virtual SAN 6.0.

VMmark is a multi-host virtualization benchmark that uses varied application workloads and common datacenter operations to model the demands of the datacenter. Each VMmark tile contains a set of virtual machines running diverse application workloads as a unit of load. For more details, see the VMmark 2.5 overview.

 

Testing Methodology

VMmark 2.5 requires two datastores for its Storage vMotion workload, but Virtual SAN creates only a single datastore. A Red Hat Enterprise Linux 7 virtual machine was created on a separate host to act as an iSCSI target to serve as the secondary datastore. Linux-IO Target (LIO) was used for this.

 

Configuration

Systems Under Test 8x Supermicro SuperStorage SSG-2027R-AR24 servers
CPUs (per server) 2x Intel Xeon E5-2670 v2 @ 2.50 GHz
Memory (per server) 256 GiB
Hypervisor VMware vSphere 5.5 U2 and vSphere 6.0
Local Storage (per server) 3x 400GB Intel SSDSC2BA4012x 900GB 10,000 RPM WD Xe SAS drives
Benchmarking Software VMware VMmark 2.5.2

 

Workload Characteristics

Storage performance is often measured in IOPS, or I/Os per second. Virtual SAN is a storage technology, so it is worthwhile to look at how many IOPS VMmark is generating.  The most disk-intensive workloads within VMmark are DVD Store 2 (also known as DS2), an E-Commerce workload, and the Microsoft Exchange 2007 mail server workload. The graphs below show the I/O profiles for these workloads, which would be identical regardless of storage type.

 Figure1

The DS2 database virtual machine shows a fairly balanced I/O profile of approximately 55% reads and 45% writes.

Microsoft Exchange, on the other hand, has a very write-intensive load, as shown below.

Figure2

Exchange sees nearly 95% writes, so the main benefit the SSDs provide is to serve as a write buffer.

The remaining application workloads have minimal disk I/Os, but do exert CPU and networking loads on the system.

 

Results

VMmark measures both the total throughput of each workload as well as the response time.  The application workloads consist of Exchange, Olio (a Java workload that simulates Web 2.0 applications and measures their performance), and DVD Store 2. All workloads are driven at a fixed throughput level.  A set of workloads is considered a tile.  The load is increased by running multiple tiles.  With Virtual SAN 6.0, we could run up to 40 tiles with acceptable quality of service (QoS). Let’s look at how each workload performed with increasing the number of tiles.

DVD Store

There are 3 webserver frontends per DVD Store tile in VMmark.  Each webserver is loaded with a different profile.  One is a steady-state workload, which runs at a set request rate throughout the test, while the other two are bursty in nature and run a 3-minute and 4-minute load profile every 5 minutes.  DVD Store throughput, measured in orders per minute, varies depending on the load of the server. The throughput will decrease once the server becomes saturated.

Figure3

For this configuration, maximum throughput was achieved at 34 tiles, as shown by the graph above.  As the hosts become saturated, the throughput of each DVD Store tile falls, resulting in a total throughput decrease of 4% at 36 tiles. However, the benchmark still passes QoS at 40 tiles.

Olio and Exchange

Unlike DVD Store, the Olio and Exchange workloads operate at a constant throughput regardless of server load, shown in the table below:

Workload Simulated Users Load per Tile
Exchange 1000 320-330 Sendmail actions per minute
Olio 400 4500-4600 operations per minute

 

At 40 tiles the VMmark clients are sending over ~12,000 mail messages per minute and the Olio webservers served ~180,000 requests per minute.

As the load increases, the response time of Exchange and Olio increases, which makes them a good demonstration of the end-user experience at various load levels. A response time of over 500 milliseconds is considered to be an unacceptable user experience.

Figure4

As we saw with DVD Store, performance begins to dramatically change after 34 tiles as the cluster becomes saturated.  This is mostly seen in the Exchange response time.  At 40 tiles, the response time is over 300 milliseconds for the mailserver workload, which is still within the 500 millisecond threshold for a good user experience. Olio has a smaller increase in response time, since it is more processor intensive.  Exchange has a dependence on both CPU and disk performance.

Looking at Virtual SAN performance, we can get a picture of how much I/O is served by the storage at these load levels.  We can see that reads average around 2000 read I/Os per second:

Figure5

The Read Cache hit rate is 98-99% on all the hosts, so most of these reads are being serviced by the SSDs. Write performance is a bit more varied.

Figure6

We see a range of 5,000-10,000 write IOPS per node due to the write-intensive Exchange workload. Storage is nowhere close to saturation at these load levels. The magnetic disks are not seeing much more than 100 I/Os per second, while the SSDs are seeing about 3,000 – 6,000 I/Os per second. These disks should be able to handle at least 10x this load level. The real bottleneck is in CPU usage.

Looking at the CPU usage of the cluster, we can see that the usage levels out at 36 tiles at about 84% used.  There is still some headroom, which explains why the Olio response times are still very acceptable.

Figure7

As mentioned above, Exchange performance is dependent on both CPU and storage. The additional CPU requirements that Virtual SAN imposes on disk I/O causes Exchange to be more sensitive to server load.

 

Performance Improvements in Virtual SAN 6.0 (vs. Virtual SAN 5.5)

The Virtual SAN 6.0 release incorporates many improvements to CPU efficiency, as well as other improvements. This translates to increased performance for VMmark.

VMmark performance increased substantially when we ran the tests with Virtual SAN 6.0 as opposed to Virtual SAN 5.5. The Virtual SAN 5.5 tests failed to pass QoS beyond 30 tiles, meaning that at least one workload failed to meet the application latency requirement.  During the Virtual SAN 5.5 32-tile tests, one or more Exchange clients would report a Sendmail latency of over 500ms, which is determined to be a QoS failure.  Version 6.0 was able to achieve passing QoS at up to 40 tiles.

Figure8

Not only were more virtual machines able to be supported on Virtual SAN 6.0, but the throughput of the workloads increased as well.  By comparing the VMmark score (normalized to 20-tile Virtual SAN 5.5 results) we can see the performance improvement of Virtual SAN 6.0.

Figure9

Virtual SAN 6.0 achieved a performance improvement of 24% while supporting 33% more virtual machines.

 

Conclusion

Using VMmark, we are able to run a variety of workloads to simulate applications in a production environment.  We were able to demonstrate that Virtual SAN is capable of achieving good performance running heterogeneous real world applications.  The cluster of 8 hosts presented here show good performance in VMmark through 40 tiles.  This is ~12,000 mail messages per minute sent through Exchange, ~180,000 requests per minute served by the Olio webservers, and over 200,000 orders per minute processed on the DVD Store database.  Additionally, we were able to measure substantial performance improvements over Virtual SAN 5.5 using Virtual SAN 6.0.

 

Reducing Power Consumption in the vSphere 5.5 Datacenter

Today’s virtualized datacenters consist of several servers connected to shared storage, and this configuration has been necessary to enable the flexibility that virtualization provides and still allow for high performance. However, the power consumption of this setup is a major concern because shared storage can consume as much as 2-3x the power of a single, mid-ranged server. In this blog, we look at the performance impact of replacing shared storage with local disks and PCIe flash storage in a vSphere 5.5 datacenter to save power.

We leverage two innovative vSphere features in this performance test:

  • Unified live migration, first introduced with vSphere 5.1, removes the shared storage requirement for vMotion and allows combining traditional vMotion and Storage vMotion into one operation. This combined live migration copies both the virtual machine’s memory and storageover the network to the destination vSphere host. This feature offers administrators significantly more simplicity and flexibility in managing and moving virtual machines across their virtual infrastructures compared to the traditional vMotion and Storage vMotion migration solutions. More information about vMotion can be found in the VMware vSphere 5.1 vMotion Architecture, Performance, and Best Practices white paper.
  • vSphere 5.5 improves server power management by enabling processor C-states, in addition to the previously-used P-states, to improve power savings in the Balanced policy setting. More information about these improvements can be found in the Host Power Management in vSphere 5.5 white paper.

We measure the performance and power savings of these features when replacing shared storage with local disks and PCIe flash storage using a modified version of VMware VMmark 2.5. VMmark is a multi-host virtualization benchmark that uses varied application workloads, as well as common datacenter operations to model the demands of the datacenter. Each VMmark tile contains a set of VMs running diverse application workloads as a unit of load. For more details, see the VMmark 2.5 overview. The benchmark was modified to replace the traditional vMotion workload component with the new shared-nothing, unified live migration.

Testing Methodology

VMmark 2.5 was modified to convert the vMotion workload into a migration without shared storage. All other workloads were unchanged. This allowed a comparison of local, direct attached storage to a traditional Fibre Channel SAN. We measured the power consumption of each configuration using a pair of Yokogawa WT210 power meters, one attached to the servers and the other attached to the external storage.

Configuration

  • Systems Under Test: 2x Dell PowerEdge R710 servers
  • CPUs (per server): 2x Intel Xeon X5670 @ 2.93 GHz
  • Memory (per server): 96 GiB
  • Hypervisor: VMware vSphere 5.5
  • Local Storage (per server): 1x 785GB Fusion-io ioDrive2, 2x 300GB 10K RPM SAS drives in RAID 0
  • SAN: 8Gb Fibre Channel, 30x 200GB SATA Flash drives, 30x 600GB 15K RPM SAS drives
  • Benchmarking software: VMware VMmark 2.5

All I/O-intensive virtual disks were stored on the Fusion-io devices for local storage tests or the SATA flash drives for the SAN tests.  This included the DVD Store database files, the mail server database, and the Olio database.  All remaining virtual machine data was stored on the local SAS drives for the local storage tests and the SAN SAS drives for the SAN tests.

Results
 
VMmark performance using shared-nothing, unified live migration backed by fast local storage showed only minor differences compared to the results with shared storage.  The largest variance was seen in the infrastructure operations, which was expected as the vMotion workload was modified to include a storage migration.  The chart below shows the scores normalized to the 3-tile SAN test results.

scores

When we add the power data to these results, and compare the Performance Per Killowatt (PPKW), we see a much different picture.  The local storage-based PPKW score is much higher than shared storage due to higher power efficiency.

ppkw

We can see the reason for this difference is due to the power consumption of each configuration.  The SAN is consuming over 1000 watts, which is typical of this storage solution.  Replacing that power-hungry component with local storage greatly reduces vSphere datacenter power consumption while maintaining good performance.

power

This SAN should be able to support approximately 25 VMmark tiles (based on the storage capacity of the SSDs), roughly five times the load being supported by the two servers we had available for testing in our lab. However, it should be noted that these servers are two generations old. Current-generation two-socket servers with a comparable power usage can support 2-3x the number of tiles based on published VMmark results. This would imply that the SAN could support at most four current-generation servers. While an additional two servers will further amortize the power cost of the SAN, significant power savings would still be achieved with an all-local storage architecture.

This is not without a cost.  Removing shared storage reduces the functionality of the datacenter because there are a number of vSphere features which will no longer function, such as DRS and traditional vMotion. The reduction in the infrastructure performance due to no shared storage will limit the workloads that can be run in this manner to virtual machines with smaller disks which can be moved between hosts without shared storage fairly quickly. Virtual machines with large disks would take much longer to move and would be better suited to a shared storage environment.

We have shown that it is possible to significantly reduce datacenter power consumption without significantly reducing performance by replacing shared storage with local storage solutions.  Unified live migration enables the use of local storage without a significant infrastructure performance penalty while maintaining application performance comparable to traditional environments using shared storage for the server workloads represented in VMmark.  The resulting elimination of shared storage creates significant power savings and lower operations costs.