Project Capstone Shows Monster VM Performance

Project Capstone was put together a few weeks before VMworld 2015 with the goal of showing what is possible with Monster VMs today.  VMware worked with HP and IBM to build an impressive setup using vSphere 6.0, an HP Superdome X, and an IBM FlashSystem array that was able to support running four 120 vCPU VMs simultaneously.  When we put these massive virtual machines under load, we found that performance was excellent, with great scalability and high throughput.

vSphere 6 was launched earlier this year and includes support for virtual machines with up to 128 virtual CPUs, a big increase from the 64 vCPUs supported in vSphere 5.5.  This new upper limit for "Monster" virtual machines allows customers to virtualize even the largest, most CPU-hungry systems.

The HP Superdome X used for the testing is an impressive system.  It has 16 Intel Xeon E7-2890 v2 2.8 GHz processors, each with 15 cores and 30 logical threads when Hyper-Threading is enabled.  In total this is 240 cores / 480 threads.
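As a quick sanity check on the headline numbers, the core and thread counts, and the way four 120 vCPU VMs exactly fill the box, fall out of simple arithmetic:

```python
# Capacity math for the Superdome X configuration described above.
sockets = 16
cores_per_socket = 15          # Intel Xeon E7-2890 v2
threads_per_core = 2           # Hyper-Threading enabled

total_cores = sockets * cores_per_socket        # 240 physical cores
total_threads = total_cores * threads_per_core  # 480 logical threads

# Four 120 vCPU VMs consume every logical thread on the host.
vms, vcpus_per_vm = 4, 120
print(total_cores, total_threads, vms * vcpus_per_vm)  # 240 480 480
```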

An IBM FlashSystem array with 20TB of super-fast, low-latency storage was used for the Project Capstone configuration.  It delivered extremely low latency throughout all testing, with performance so good that storage was never a concern or issue.  The FlashSystem was also extremely easy to set up and use: within 24 hours of it arriving in the lab, we were actively running four 120 vCPU VMs with sub-millisecond latency.

Large Oracle 12c database virtual machines running on Red Hat Enterprise Linux 6.5 were created and configured with 256GB of RAM, pvSCSI virtual disk adapters, and vmxnet3 virtual NICs.  The number of VMs and the number of vCPUs per VM were varied across the tests.

The workload used for the testing was DVD Store 3 (github.com/dvdstore/ds3).  DVD Store simulates a real online store with customers logging onto the site, browsing products and product reviews, rating products, and ultimately purchasing those products.  The benchmark is measured in Orders Per Minute, with each order representing a complete login, browsing, and purchasing process that includes many individual SQL operations against the database.
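Since each order is a complete multi-step transaction, the Orders Per Minute metric is simply the count of completed orders normalized to a one-minute rate. A minimal sketch of that calculation, using made-up numbers rather than anything from the tests:

```python
# Hedged sketch: converting a completed-order count into an OPM figure.
# The order count and run duration below are hypothetical, for illustration only.
def orders_per_minute(completed_orders: int, run_seconds: float) -> float:
    """Normalize completed DVD Store orders to a per-minute rate."""
    return completed_orders * 60.0 / run_seconds

# e.g. 150,000 orders completed over a 10-minute measurement interval
print(orders_per_minute(150_000, 600))  # 15000.0
```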

This large system with 240 cores / 480 threads, an extremely fast and large storage system, and vSphere 6 showed that excellent performance and scalability are possible even with many monster VMs.  Each configuration was first stressed by increasing the DVD Store workload until maximum throughput was achieved for a single virtual machine; in all cases this occurred at near CPU saturation.  The number of VMs was then increased until the entire system was fully committed.  A whitepaper to be published soon will have the full set of test results, but here we show the results for four 120 vCPU VMs and sixteen 30 vCPU VMs.

[Figure: capstonePerf4x120graph]

[Figure: capstonePerf16x30graph]

In both cases, the performance of the fully loaded system, whether with 4 or with 16 virtual machines, achieves about 90% of perfect linear scalability relative to the performance of a single virtual machine.
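The 90% figure corresponds to a scaling-efficiency calculation along these lines (the throughput numbers in the example are hypothetical, not taken from the tests):

```python
def scaling_efficiency(single_vm_opm: float, aggregate_opm: float, num_vms: int) -> float:
    """Achieved aggregate throughput as a fraction of perfect linear scaling,
    where perfect scaling would be num_vms times the single-VM throughput."""
    return aggregate_opm / (single_vm_opm * num_vms)

# Hypothetical example: one VM achieves 10,000 OPM alone; four VMs running
# together achieve 36,000 OPM in aggregate -> 90% of linear scaling.
print(scaling_efficiency(10_000, 36_000, 4))  # 0.9
```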

To drive CPU usage to such high levels, all disk I/O must be very fast so that the system is never waiting on storage.  The IBM FlashSystem provided 0.3 ms average disk latency across all tests.  Total disk I/O was minimized for these tests, to maximize CPU usage and throughput, by configuring the database cache size to be equal to the database size.  Total disk I/O per second (IOPS) peaked at about 50k and averaged 20k while maintaining the extremely low latency during tests.

These test results show that it is possible to use vSphere 6 to successfully virtualize even the largest systems with excellent performance.