Two servers were used for both physical and virtual tests. Two Dell PowerEdge R710s with 2x Intel Xeon x5680 six-core processors and 96GB of RAM were connected via Fibre Channel to a NetApp FAS6030 array. The servers were dual booted between Red Hat Enterprise Linux 5.5 and vSphere ESXi 4.1. Each server was connected via three gigabit Ethernet NICs to a shared switch. One NIC was used for the public network and the other two were used for interconnect and cluster traffic.
The NetApp storage array had a total of 112 10K RPM 274GB Fibre Channel disks. Two 200GB LUNs, backed by a total of 80 disks, were used to create a data volume in Oracle ASM. Each data LUN was backed by a 40 disk RAID DP aggregate on the storage array. A 100GB log LUN was created on another volume that was backed by a 26 disk RAID DP aggregate. An additional small 2GB LUN was created to be used as the voting disk for the RAC cluster.
Each VM was configured with 32GB of RAM, three VMXNET3 virtual NICs, and a PVSCSI adapter for all the LUNs used except the OS disk. In order for the VMs to be able to share disks with physical hosts, it was necessary to mount the disks as RDMs and put the virtual SCSI adapter into physical compatibility mode. Additionally, to achieve the best performance for the Oracle RAC interconnect, the VMXNET3 NICs were configured with ethernetX.intrmode =1 in the vmx file. This option is a work around for an ESX performance bug that is specific to RHEL 5.5 VMs and to extremely latency sensitive workloads. The additional configuration option is no longer needed starting with ESX 4.1u1 because the bug is fixed starting with that version.
A four node Oracle RAC cluster was created with two virtual nodes and two physical nodes. The virtual nodes were hosted on a third server when the two servers used for testing were booted to the native RHEL environment. RHEL 5.5 x64 and Oracle 11gR2 were installed on all nodes. During tests the two servers used for testing were booted either to native RHEL or ESX for the physical or virtual tests respectively. This meant that only the two virtual nodes or the two native nodes were powered on during a physical or virtual test. The diagrams below show the same test environment when setup for the two node physical or virtual test.
Physical Test Diagram:
Virtual Test Diagram:
The servers used in testing have a total of 12 physical cores and 24 logical threads if hyperthreading is enabled. The maximum number of vCPUs per VM supported by ESXi 4.1 is eight. This made it necessary to limit the physical server to a smaller number of cores to enable a performance comparison. Using the BIOS settings of the server, hyperthreading was turned off and the number of cores limited to two and four per socket. This resulted in four and eight core physical server configurations that were compared with VM configurations of four and eight vCPUs. Limiting the physical server configurations was only done to enable a direct performance comparison and is clearly not a good way to configure a system for performance normally.
Open source DVD Store 2.1 was used as the workload for the test. DVD Store is an OLTP database workload that simulates customers logging on, browsing, and purchasing DVDs from an online store. It includes database build scripts, load files, and driver programs. For these tests, the database driver was used to directly load the database without a need to have the Web tier installed. Using the new DVD Store 2.1 functionality, two custom-size databases of 50GB each with a 12GB SGA were created as two different instances named DS2 and DS2B. Both instances were running on both nodes of the cluster and were accessed equally on each node.
Running an equal amount of load against each instance on each node was done with both the four CPU and eight CPU test cases. DS2 and DS2B instances spanned all nodes and were actively used on all nodes. An equal amount of threads were connected for each instance on each node. The amount of work was scaled up with the number of processors: twice as many DVD Store driver threads were used in the eight CPU case as compared with the four CPU case. For example, a total of 40 threads were running against node one in the four CPU test with 20 accessing DS2 and 20 accessing DS2B. Another 40 threads were accessing DS2 and DS2B on node two at the same time during that test. CPU utilization of the physical hosts and VMs were above 95% in all tests. Results are reported in terms of Orders Per Minute (OPM) and Average Response Time (RT) in milliseconds.
In both the OPM and RT measurements, the virtual RAC performance was within 11 to 13 percent of the physical RAC performance. In an intensive test running on Oracle RAC, the CPU, disk, and network were heavily utilized, but virtual performance was close to native performance. This result removes a barrier from considering virtualizing one of the more performance-intensive tier-one applications in the datacenter.