We were able to get one of the new four-socket Intel Skylake based servers and run some more tests. Specifically we used the Xeon Platinum 8180 processors with 28 cores each. The new data has been added to the Oracle Monster Virtual Machine Performance on VMware vSphere 6.5 whitepaper. Please check out the paper for the full details and context of these updates.
The generational testing in the paper now includes a fifth generation with a 112 vCPU virtual machine running on the Skylake based server. Performance gain from the initial 40 vCPU VM on Westmere-EX to the Skylake based 112 vCPU VM is almost 4x.
The performance gained from Hyper-Threading was also updated and shows a 27% performance gain from the use of Hyper-Threads. The test was conducted by running two 112 vCPU VMs at the same time so that all 224 logical threads are active. The total throughput from the two VMs is then compared with the throughput from a single VM.
My colleague David Morse has also updated his SQL Server monster virtual machine whitepaper with Skylake data as well.
We have just published a new whitepaper on the performance of Oracle databases on vSphere 6.5 monster virtual machines. We took a look at the performance of the largest virtual machines possible on the previous four generations of four-socket Intel-based servers. The results show how performance of these large virtual machines continues to scale with the increases and improvements in server hardware.
Oracle Database Monster VM Performance on vSphere 6.5 across 4 generations of Intel-based four-socket servers
In addition to vSphere 6.5 and the four-socket Intel-based servers used in the testing, an IBM FlashSystem A9000 high performance all flash array was used. This array provided extreme low latency performance that enabled the database virtual machines to perform at the achieved high levels of performance.
Some similar tests with Microsoft SQL Server monster virtual machines were also recently completed on vSphere 6.5 by my colleague David Morse. Please see his blog post and whitepaper for the full details.
Remember that cool project with VMware, HP Enterprise, and IBM where four super huge monster virtual machines (VMs) of 120 vCPUs each were all running at the same time on a single server with great performance?
In addition to the four 120 vCPU VMs test, additional configurations were also run with eight 60 vCPU VMs and sixteen 30 vCPU VMs. This shows that plenty of large VMs can be run on a single host with excellent performance when using a solution that supports tons of CPU capacity and cutting edge flash storage.
The whitepaper not only contains all of the test results from the original presentation, but also includes additional details around the performance of CPU Affinity vs PreferHT and under-provisioning. There is also a best practices section that if focused on running monster VMs.
Performance studies have previously shown that there is no doubt virtualized servers can run a variety of applications near, or in some cases even above, that of software running natively (on bare metal). In a new white paper, we raise the bar higher and test “monster” vSphere virtual machines loaded with CPU and running the most taxing databases and transaction processing applications.
The benchmark workload, which we call Order-Entry, is based on an industry-standard online transaction processing (OLTP) benchmark called TPC-C. Both rigorous and demanding, the Order-Entry workload pushes virtual machine performance.
Note: The Order Entry benchmark is derived from the TPC-C workload, but is not compliant with the TPC-C specification, and its results are not comparable to TPC-C results.
The white paper quantifies the:
Performance differential between ESXi 6.0 and native
Performance differential between ESXi 6.0 and ESXi 5.1
Performance gains due to enhancements built into ESXi 6.0
Oracle Real Application Clusters (RAC) is used to run critical databases with stringent performance requirements. A series of tests recently were run in the VMware performance lab to determine how an Oracle RAC database performs when running on vSphere. The test results showed that the application performed within 11 to 13 percent of physical when running in a virtualized environment.
Two servers were used for both physical and virtual tests. Two Dell PowerEdge R710s with 2x Intel Xeon x5680 six-core processors and 96GB of RAM were connected via Fibre Channel to a NetApp FAS6030 array. The servers were dual booted between Red Hat Enterprise Linux 5.5 and vSphere ESXi 4.1. Each server was connected via three gigabit Ethernet NICs to a shared switch. One NIC was used for the public network and the other two were used for interconnect and cluster traffic.
The NetApp storage array had a total of 112 10K RPM 274GB Fibre Channel disks. Two 200GB LUNs, backed by a total of 80 disks, were used to create a data volume in Oracle ASM. Each data LUN was backed by a 40 disk RAID DP aggregate on the storage array. A 100GB log LUN was created on another volume that was backed by a 26 disk RAID DP aggregate. An additional small 2GB LUN was created to be used as the voting disk for the RAC cluster.
Each VM was configured with 32GB of RAM, three VMXNET3 virtual NICs, and a PVSCSI adapter for all the LUNs used except the OS disk. In order for the VMs to be able to share disks with physical hosts, it was necessary to mount the disks as RDMs and put the virtual SCSI adapter into physical compatibility mode. Additionally, to achieve the best performance for the Oracle RAC interconnect, the VMXNET3 NICs were configured with ethernetX.intrmode =1 in the vmx file. This option is a work around for an ESX performance bug that is specific to RHEL 5.5 VMs and to extremely latency sensitive workloads. The additional configuration option is no longer needed starting with ESX 4.1u1 because the bug is fixed starting with that version.
A four node Oracle RAC cluster was created with two virtual nodes and two physical nodes. The virtual nodes were hosted on a third server when the two servers used for testing were booted to the native RHEL environment. RHEL 5.5 x64 and Oracle 11gR2 were installed on all nodes. During tests the two servers used for testing were booted either to native RHEL or ESX for the physical or virtual tests respectively. This meant that only the two virtual nodes or the two native nodes were powered on during a physical or virtual test. The diagrams below show the same test environment when setup for the two node physical or virtual test.
Physical Test Diagram:
Virtual Test Diagram:
The servers used in testing have a total of 12 physical cores and 24 logical threads if hyperthreading is enabled. The maximum number of vCPUs per VM supported by ESXi 4.1 is eight. This made it necessary to limit the physical server to a smaller number of cores to enable a performance comparison. Using the BIOS settings of the server, hyperthreading was disabled and the number of cores limited to two and four per socket. This resulted in four and eight core physical server configurations that were compared with VM configurations of four and eight vCPUs. Limiting the physical server configurations was only done to enable a direct performance comparison and is clearly not a good way to configure a system for performance normally.
Open source DVD Store 2.1 was used as the workload for the test. DVD Store is an OLTP database workload that simulates customers logging on, browsing, and purchasing DVDs from an online store. It includes database build scripts, load files, and driver programs. For these tests, the database driver was used to directly load the database without a need to have the Web tier installed. Using the new DVD Store 2.1 functionality, two custom-size databases of 50GB each with a 12GB SGA were created as two different instances named DS2 and DS2B. Both instances were running on both nodes of the cluster and were accessed equally on each node.
Running an equal amount of load against each instance on each node was done with both the four CPU and eight CPU test cases. DS2 and DS2B instances spanned all nodes and were actively used on all nodes. An equal amount of threads were connected for each instance on each node. The amount of work was scaled up with the number of processors: twice as many DVD Store driver threads were used in the eight CPU case as compared with the four CPU case. For example, a total of 40 threads were running against node one in the four CPU test with 20 accessing DS2 and 20 accessing DS2B. Another 40 threads were accessing DS2 and DS2B on node two at the same time during that test. CPU utilization of the physical hosts and VMs were above 95% in all tests. Results are reported in terms of Orders Per Minute (OPM) and Average Response Time (RT) in milliseconds.
In both the OPM and RT measurements, the virtual RAC performance was within 11 to 13 percent of the physical RAC performance. In an intensive test running on Oracle RAC, the CPU, disk, and network were heavily utilized, but virtual performance was close to native performance. This result removes a barrier from considering virtualizing one of the more performance-intensive tier-one applications in the datacenter.