A performance study shows that VMware vSphere 5.5 with Virtual SAN as the storage backend provides an excellent platform for virtualized deployments of SAP IQ Multiplex Servers.
We created four virtual machines running RHEL 6.3. Together they formed the SAP IQ Multiplex Server, which used Virtual SAN as its storage backend. To measure performance, we looked at the distributed query processing (DQP) modes of SAP IQ. In DQP, work is performed by threads running on both leader and worker nodes, and intermediate results are transmitted between these nodes either through a shared disk space or over an inter-node network. In the paper, we refer to these modes as storage-transfer and network-transfer, respectively.
In a test consisting of concurrent streams of queries designed to emulate a multi-user scenario, we found that the read-heavy I/O profile of this workload takes full advantage of Virtual SAN’s flash acceleration layer. Data read from the magnetic disks in each disk group is cached on the SSD in that disk group. Since 70% of SSD capacity is reserved for the read cache, a significant amount of data is quickly placed in very low latency storage. Once the cache is warmed up, I/O requests are served from the read cache, leading to fast query response times. Add to this SAP IQ’s ability to use network resources to transfer intermediate results, and we get an additional bump in throughput because we no longer have the overhead of writing shared intermediate results to disk.
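To make the 70% figure concrete, here is a minimal sketch that estimates how much flash ends up as read cache. The SSD size, the number of disk groups per host, and the host count in the usage lines are hypothetical values chosen for illustration; the study does not state them here.

```python
# Share of each disk group's SSD reserved for read cache (from the text above).
READ_CACHE_PERCENT = 70


def read_cache_gb(ssd_capacity_gb: float) -> float:
    """Read-cache capacity of one disk group's SSD, in GB."""
    return ssd_capacity_gb * READ_CACHE_PERCENT / 100


def cluster_read_cache_gb(ssd_capacity_gb: float,
                          disk_groups_per_host: int,
                          hosts: int) -> float:
    """Total read cache across all disk groups in the cluster, in GB."""
    return read_cache_gb(ssd_capacity_gb) * disk_groups_per_host * hosts


# Hypothetical example: 400 GB SSDs, one disk group per host, four hosts.
per_group = read_cache_gb(400)
total = cluster_read_cache_gb(400, disk_groups_per_host=1, hosts=4)
print(f"read cache per disk group: {per_group} GB, cluster total: {total} GB")
```

A working set that fits within the cluster-wide read cache is the case in which the warmed-up behavior described above would be expected.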
Better performance, lower latency, and streamlined statistics are just some of the improvements you can expect to find in vCenter Server 5.1. The VMware performance team has published a paper about vCenter Server 5.1 database performance in large environments. The paper shows that statistics collection has the biggest performance impact on the vCenter Server database. In vSphere 5.1, several aspects of statistics collection have been changed to improve the overall performance of the database. There are three sources of I/O to the statistics tables in vCenter Server: inserting statistics, rolling up statistics between different intervals, and deleting statistics when they expire. These activities have been improved by changing the way the relevant data is persisted to the tables: the tables are now partitioned, and staging tables are no longer used. In addition, removing the staging tables makes statistics collection more robust, resolving the issues described in KB 2011523 and KB 1003878. Scalability also improves: larger inventories can be supported because reading and writing their data no longer takes as long as it did with the old staging tables. The paper also includes best practices to take advantage of these changes in environments where vCenter Server has a large inventory. For more details, see vCenter Server 5.1 Database Performance in Large-Scale Environments.
Here are the URLs for the paper, “VMware vCenter Server 5.1 Database Performance Improvements and Best Practices for Large-Scale Environments”:
Oracle Real Application Clusters (RAC) is used to run critical databases with stringent performance requirements. A series of tests was recently run in the VMware performance lab to determine how an Oracle RAC database performs when running on vSphere. The test results showed that the application performed within 11 to 13 percent of physical when running in a virtualized environment.
Two servers were used for both physical and virtual tests. Two Dell PowerEdge R710s, each with 2x Intel Xeon X5680 six-core processors and 96GB of RAM, were connected via Fibre Channel to a NetApp FAS6030 array. The servers were dual-booted between Red Hat Enterprise Linux 5.5 and vSphere ESXi 4.1. Each server was connected via three gigabit Ethernet NICs to a shared switch. One NIC was used for the public network and the other two were used for interconnect and cluster traffic.
The NetApp storage array had a total of 112 10K RPM 274GB Fibre Channel disks. Two 200GB LUNs, backed by a total of 80 disks, were used to create a data volume in Oracle ASM. Each data LUN was backed by a 40 disk RAID DP aggregate on the storage array. A 100GB log LUN was created on another volume that was backed by a 26 disk RAID DP aggregate. An additional small 2GB LUN was created to be used as the voting disk for the RAC cluster.
Each VM was configured with 32GB of RAM, three VMXNET3 virtual NICs, and a PVSCSI adapter for all the LUNs used except the OS disk. In order for the VMs to be able to share disks with physical hosts, it was necessary to mount the disks as RDMs and put the virtual SCSI adapter into physical compatibility mode. Additionally, to achieve the best performance for the Oracle RAC interconnect, the VMXNET3 NICs were configured with ethernetX.intrmode = 1 in the vmx file. This option is a workaround for an ESX performance bug that is specific to RHEL 5.5 VMs and to extremely latency-sensitive workloads. The option is no longer needed as of ESX 4.1u1, which fixes the bug.
A four-node Oracle RAC cluster was created with two virtual nodes and two physical nodes. The virtual nodes were hosted on a third server when the two servers used for testing were booted to the native RHEL environment. RHEL 5.5 x64 and Oracle 11gR2 were installed on all nodes. During tests, the two servers used for testing were booted either to native RHEL or to ESX for the physical or virtual tests, respectively. This meant that only the two native nodes were powered on during a physical test, and only the two virtual nodes during a virtual test. The diagrams below show the same test environment when set up for the two-node physical or virtual test.
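As an illustration of the interrupt-mode setting, the following sketch generates one vmx entry per virtual NIC. It assumes the three VMXNET3 NICs are numbered ethernet0 through ethernet2 and that values are written in the usual quoted key = "value" vmx form; the helper name intrmode_lines is hypothetical, and the setting itself only applies to ESX versions before 4.1u1, as noted above.

```python
def intrmode_lines(nic_count: int, mode: int = 1) -> list[str]:
    """Build one 'ethernetN.intrmode' vmx line per virtual NIC.

    Assumes NICs are numbered from ethernet0 upward and that the vmx file
    uses the quoted key = "value" form (an assumption of this sketch).
    """
    return [f'ethernet{i}.intrmode = "{mode}"' for i in range(nic_count)]


# Three VMXNET3 NICs per VM, as in the test configuration.
for line in intrmode_lines(3):
    print(line)
```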
Physical Test Diagram:
Virtual Test Diagram:
Each server used in testing has a total of 12 physical cores, and 24 logical threads if hyperthreading is enabled. The maximum number of vCPUs per VM supported by ESXi 4.1 is eight. This made it necessary to limit the physical server to a smaller number of cores to enable a performance comparison. Using the BIOS settings of the server, hyperthreading was disabled and the number of cores was limited to two and four per socket. This resulted in four- and eight-core physical server configurations that were compared with VM configurations of four and eight vCPUs. Limiting the physical server configurations was done only to enable a direct performance comparison and is not how a system would normally be configured for performance.
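The core-count arithmetic behind these configurations can be sketched as follows. The constant and function names are illustrative, not taken from any tool used in the tests.

```python
SOCKETS = 2             # each R710 has 2x Xeon X5680 processors
MAX_VCPUS_ESXI_41 = 8   # per-VM vCPU limit in ESXi 4.1 (from the text)


def physical_cores(cores_per_socket: int, sockets: int = SOCKETS) -> int:
    """Cores available to the native OS with hyperthreading disabled."""
    return cores_per_socket * sockets


# Cores per socket enabled in the BIOS: 2, 4, or the full 6.
configs = {cps: physical_cores(cps) for cps in (2, 4, 6)}

# Only configurations a single VM can match are usable for comparison.
comparable = {cps: n for cps, n in configs.items() if n <= MAX_VCPUS_ESXI_41}
print(comparable)
```

With all six cores per socket enabled, the server would have 12 cores, more than one VM could be given, which is why only the four- and eight-core cases were tested.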
Open source DVD Store 2.1 was used as the workload for the test. DVD Store is an OLTP database workload that simulates customers logging on, browsing, and purchasing DVDs from an online store. It includes database build scripts, load files, and driver programs. For these tests, the database driver was used to directly load the database without a need to have the Web tier installed. Using the new DVD Store 2.1 functionality, two custom-size databases of 50GB each with a 12GB SGA were created as two different instances named DS2 and DS2B. Both instances were running on both nodes of the cluster and were accessed equally on each node.
An equal amount of load was run against each instance on each node in both the four-CPU and eight-CPU test cases. The DS2 and DS2B instances spanned all nodes and were actively used on all nodes. An equal number of threads was connected to each instance on each node. The amount of work was scaled up with the number of processors: twice as many DVD Store driver threads were used in the eight-CPU case as in the four-CPU case. For example, a total of 40 threads were running against node one in the four-CPU test, with 20 accessing DS2 and 20 accessing DS2B. Another 40 threads were accessing DS2 and DS2B on node two at the same time during that test. CPU utilization of the physical hosts and VMs was above 95% in all tests. Results are reported in terms of Orders Per Minute (OPM) and Average Response Time (RT) in milliseconds.
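The load distribution described above can be laid out in a short sketch. The node labels and the thread_plan helper are hypothetical; the 20-threads figure comes from the four-CPU example, and the 40-threads figure for the eight-CPU case is derived from the statement that twice as many driver threads were used.

```python
def thread_plan(threads_per_instance_per_node: int,
                nodes=("node1", "node2"),
                instances=("DS2", "DS2B")) -> dict:
    """Map each (node, instance) pair to its number of driver threads."""
    return {(node, inst): threads_per_instance_per_node
            for node in nodes
            for inst in instances}


def threads_on_node(plan: dict, node: str) -> int:
    """Total driver threads connected to one node across all instances."""
    return sum(count for (n, _), count in plan.items() if n == node)


four_cpu = thread_plan(20)    # 20 threads per instance per node
eight_cpu = thread_plan(40)   # doubled for the eight-CPU case

print(threads_on_node(four_cpu, "node1"), sum(four_cpu.values()))
print(threads_on_node(eight_cpu, "node1"), sum(eight_cpu.values()))
```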
In both the OPM and RT measurements, the virtual RAC performance was within 11 to 13 percent of the physical RAC performance. In an intensive test running on Oracle RAC, the CPU, disk, and network were heavily utilized, but virtual performance was close to native performance. This result removes a barrier from considering virtualizing one of the more performance-intensive tier-one applications in the datacenter.
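A figure such as "within 11 to 13 percent of physical" can be computed from raw measurements as sketched below. The OPM and RT numbers in the usage lines are made-up placeholders, not results from these tests, and the function names are illustrative.

```python
def throughput_gap_percent(physical_opm: float, virtual_opm: float) -> float:
    """How far virtual throughput falls below physical, as a percentage."""
    return (physical_opm - virtual_opm) * 100 / physical_opm


def response_time_gap_percent(physical_rt_ms: float,
                              virtual_rt_ms: float) -> float:
    """How far virtual response time rises above physical, as a percentage."""
    return (virtual_rt_ms - physical_rt_ms) * 100 / physical_rt_ms


# Placeholder measurements for illustration only.
opm_gap = throughput_gap_percent(physical_opm=100_000, virtual_opm=88_000)
rt_gap = response_time_gap_percent(physical_rt_ms=50, virtual_rt_ms=56)
print(f"OPM gap: {opm_gap:.1f}%, RT gap: {rt_gap:.1f}%")
```

Throughput is better when higher and response time is better when lower, so the two helpers subtract in opposite directions to keep a positive gap meaning "virtual is behind physical".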