
Virtual SAP HANA Achieves Production Level Performance

VMware CEO Pat Gelsinger announced production support for SAP HANA on VMware vSphere 5.5 during his keynote at EMC World this week. This is the end result of a thorough, year-long joint testing project between VMware and SAP.

HANA is an in-memory platform (including database capabilities) from SAP that has enabled huge gains in performance for customers and has been a high priority for SAP over the past few years.  In order for HANA to be supported in a virtual machine on vSphere 5.5 for production workloads, we worked closely with SAP to enable, design, and measure in-depth performance tests.

In order to enable the testing and ongoing production support of SAP HANA on vSphere, two HANA appliance servers were ordered, shipped, and installed in SAP's labs in Walldorf, Germany.  These systems are dedicated to running SAP HANA on vSphere onsite at SAP.  Each is an Intel Xeon E7-8870 (Westmere-EX) based four-socket server with 1TB of RAM.  They are used for performance testing and for ongoing support of HANA on vSphere.  Additionally, VMware has support engineers onsite to assist with the testing and support.

SAP designed an extensive performance test suite that used a large number of test scenarios to stress all functions and capabilities of HANA running on vSphere 5.5.  The scenarios included OLAP and OLTP workloads with a wide range of data sizes and query functions. In all, over one thousand individual test cases were used in this comprehensive test suite.  These same tests were run on identical native HANA systems, and the difference between native and virtual results was used as the key performance indicator.
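To make that key performance indicator concrete, here is a minimal sketch of how paired native-versus-virtual results can be summarized; the test names and timings below are invented for illustration and are not SAP's actual data:

```python
# Sketch: summarize virtual-vs-native performance deltas across a test suite.
# Test names and numbers are illustrative placeholders, not SAP's results.

def relative_overhead(native_ms: float, virtual_ms: float) -> float:
    """Percentage by which the virtual response time exceeds native."""
    return (virtual_ms - native_ms) / native_ms * 100.0

# Paired results: test case -> (native response time ms, virtual response time ms)
results = {
    "olap_query_small":  (120.0, 123.5),
    "olap_query_large":  (950.0, 978.0),
    "oltp_insert_batch": (310.0, 318.0),
}

deltas = [relative_overhead(n, v) for n, v in results.values()]
print(f"average overhead: {sum(deltas) / len(deltas):.1f}%")
print(f"worst case:       {max(deltas):.1f}%")
```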

In addition, we also tested vSphere features including vMotion, DRS, and VMware HA with virtual machines running HANA.  These tests were done with the HANA virtual machine under heavy stress.

The test results have been extremely positive and are one of the key factors in the announcement of production support.  The difference between virtual and native HANA across all the performance tests was on average within a few percentage points.

The vMotion, DRS, and VMware HA tests were all completed without issues.  Even with the large memory sizes of HANA virtual machines, we were able to successfully migrate them with vMotion while under load.

One of the results of the extensive testing is a best practices guide for HANA on vSphere 5.5. It includes performance guidance based on the testing, information about how to size a virtual HANA instance, and a description of how VMware HA can be used in conjunction with HANA's own replication technology for high availability.

VDI Benchmarking Using View Planner on VMware Virtual SAN – Part 3

In part 1 and part 2 of the VDI/VSAN benchmarking blog series, we presented the VDI benchmark results on VSAN for 3-node, 5-node, 7-node, and 8-node cluster configurations. In this blog, we compare the VDI benchmarking performance of VSAN with an all-flash storage array. The intent of this experiment is not to compare the maximum IOPS achievable on these storage solutions; instead, we show how VSAN scales as we add more heavy VDI users. We found that VSAN can support a similar number of users to an all-flash array, even though VSAN uses host resources.

VDI workloads are CPU bound but sensitive to I/O, which makes View Planner a natural fit for this comparative study. We used VMware View Planner 3.0 for both VSAN and the all-flash SAN and consolidated as many heavy users as we could on a particular cluster configuration while meeting the quality of service (QoS) criteria. Then we found the difference in the number of users each solution could support before running out of CPU, since I/O is not a bottleneck here. Although VSAN runs in the kernel and uses CPU on the host for its operation, its CPU usage is quite minimal, and we see no more than a 5% difference in consolidation for a heavy-user run on VSAN compared to the all-flash array.

As discussed in the previous blog, we used the same experimental setup, in which each VSAN host has two disk groups and each disk group has one 200GB PCIe solid-state drive (SSD) and six 300GB 15K RPM SAS disks. We built a 7-node and an 8-node cluster and ran View Planner to get the VDImark™ score for both VSAN and the all-flash array. VDImark signifies the number of heavy users you can successfully run while meeting the QoS criteria for a system under test. The VDImark for both VSAN and the all-flash array is shown in the figure below, after a quick capacity aside.
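As a minimal sketch, the raw per-host capacity of this layout follows directly from the disk counts and sizes quoted above:

```python
# Raw per-host capacity implied by the disk-group layout described above.
disk_groups_per_host = 2
ssd_gb_per_group = 200          # one PCIe SSD per disk group
sas_disks_per_group = 6         # 15K RPM SAS disks per disk group
sas_gb_per_disk = 300

flash_gb = disk_groups_per_host * ssd_gb_per_group
magnetic_gb = disk_groups_per_host * sas_disks_per_group * sas_gb_per_disk
print(f"flash per host: {flash_gb} GB")        # 400 GB
print(f"magnetic per host: {magnetic_gb} GB")  # 3600 GB
```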

View Planner QoS (VDImark)

From the above chart, we see that VSAN can consolidate 677 heavy users (VDImark) in the 7-node cluster and 767 heavy users in the 8-node cluster. Compared to the all-flash array, we see no more than a 5% difference in user consolidation. To further illustrate the Group-A and Group-B response times, we show the average response time of individual operations for these runs for both groups, as follows.

Group-A Response Times

As seen in the figure above, for both VSAN and the all-flash array, the average response times of the most interactive operations are less than one second, which is needed to provide a good end-user experience.  As with user consolidation, the response times of Group-A operations on VSAN are similar to those on the all-flash array.

Group-B Response Times

Group-B operations are sensitive to both CPU and I/O, and 95% of them must complete in less than six seconds to meet the QoS criteria. From the above figure, we see that the average response time of most operations is within the threshold, and response times on VSAN are similar to those on the all-flash array.
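To make the QoS check concrete, here is a minimal sketch that computes a 95th-percentile response time and compares it against the six-second Group-B threshold quoted above; the latency samples are invented for illustration:

```python
# Sketch: check a View Planner-style Group-B QoS criterion from raw latencies.
# Threshold follows the text above (95% under six seconds); data is made up.
import math

def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile of a list of latencies (seconds)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]

group_b_latencies = [2.1, 3.4, 1.8, 5.2, 4.9, 2.7, 3.1, 5.8, 2.2, 4.4]

p95 = percentile(group_b_latencies, 95)
print(f"Group-B 95th percentile: {p95:.1f}s ->",
      "PASS" if p95 < 6.0 else "FAIL")
```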

To see the other parts of the VDI/VSAN benchmarking blog series, check the links below:
VDI Benchmarking Using View Planner on VMware Virtual SAN – Part 1
VDI Benchmarking Using View Planner on VMware Virtual SAN – Part 2
VDI Benchmarking Using View Planner on VMware Virtual SAN – Part 3


VDI Benchmarking Using View Planner on VMware Virtual SAN – Part 2

In part 1, we presented the VDI benchmark results on VSAN for 3-node and 7-node configurations. In this blog, we update the results for 5-node and 8-node VSAN configurations and show how VSAN scales for these configurations.

The View Planner benchmark was run again to find the VDImark for different numbers of nodes (5 and 8) in a VSAN cluster, as described in the previous blog, and the results are shown in the following figure.

View Planner QoS (VDImark)


In the 5-node cluster, a VDImark score of 473 was achieved, and in the 8-node cluster, a VDImark score of 767 was achieved. These results are similar to the ones we saw earlier on the 3-node and 7-node clusters (about 95 VMs per host), so the maximum number of supported VMs scales nicely as nodes are added to the VSAN cluster; a quick check of the arithmetic follows.
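As a sanity check on the "about 95 VMs per host" figure, this short sketch divides the VDImark scores reported across this series by their node counts:

```python
# Sanity check: per-host VM density implied by the VDImark scores in this series.
vdimark_by_nodes = {3: 286, 5: 473, 7: 677, 8: 767}  # nodes -> VDImark

for nodes, score in vdimark_by_nodes.items():
    print(f"{nodes}-node cluster: {score / nodes:.1f} VMs per host")
```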

To further illustrate the Group-A and Group-B response times, we show the average response time of individual operations for these runs for both Group-A and Group-B, as follows.

Group-A Response Times

As seen in the figure above, the average response times of the most interactive operations are less than one second, which is needed to provide a good end-user experience. Looking at the new results for the 5-node and 8-node VSAN clusters, we see that for most operations the response time remains roughly the same across the different node configurations.

Group-B Response Times

Since Group-B is more sensitive to I/O and CPU usage, the above chart for Group-B operations is more important for seeing how View Planner scales. The chart shows that there is not much difference in the response times as the number of VMs was increased from 286 on a 3-node cluster to 767 on an 8-node cluster. Hence, storage-sensitive VDI operations also scale well as we scale VSAN from 3 to 8 nodes, and user experience expectations are met.

To see the other parts of the VDI/VSAN benchmarking blog series, check the links below:
VDI Benchmarking Using View Planner on VMware Virtual SAN – Part 1
VDI Benchmarking Using View Planner on VMware Virtual SAN – Part 2
VDI Benchmarking Using View Planner on VMware Virtual SAN – Part 3


Deploying Extremely Latency-Sensitive Applications in VMware vSphere 5.5

VMware vSphere minimizes virtualization overhead so that it is not noticeable for a wide range of applications, including most business-critical applications such as database systems, Web applications, and messaging systems. vSphere also does well with applications that have millisecond-level latency constraints, including VoIP services. However, the performance demands of latency-sensitive applications with very low latency requirements, such as distributed in-memory data management, stock trading, and high-performance computing, have long been thought to be incompatible with virtualization.

vSphere 5.5 includes a new feature for setting latency sensitivity in order to support virtual machines with strict latency requirements. This per-VM feature allows virtual machines to exclusively own physical cores, thus avoiding overhead related to CPU scheduling and contention. A recent performance study shows that using this feature combined with pass-through mechanisms such as SR-IOV and DirectPath I/O helps to achieve near-native performance in terms of both response time and jitter.
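As an illustration of how this per-VM setting can be applied programmatically, here is a minimal pyVmomi sketch; the vCenter address, credentials, and VM name are placeholders, and error handling is omitted:

```python
# Sketch: set a VM's latency sensitivity to "high" through the vSphere API
# using pyVmomi. Host, credentials, and VM name below are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVim.task import WaitForTask
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab use only; verify certs in production
si = SmartConnect(host="vcenter.example.com", user="administrator",
                  pwd="secret", sslContext=ctx)

# Find the target VM by name (simple linear search for illustration).
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.VirtualMachine], True)
vm = next(v for v in view.view if v.name == "latency-sensitive-vm")

# latencySensitivity is the per-VM setting introduced with vSphere 5.5.
spec = vim.vm.ConfigSpec()
spec.latencySensitivity = vim.LatencySensitivity(level="high")
WaitForTask(vm.ReconfigVM_Task(spec))

Disconnect(si)
```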

The paper explains the major sources of latency increase due to virtualization in vSphere and presents details of how the latency-sensitivity feature improves performance, along with evaluation results. It also presents best practices distilled from the performance evaluation.

For more information, please read the full paper: Deploying Extremely Latency-Sensitive Applications in VMware vSphere 5.5.


VMware Horizon View 5.2 Performance & Best Practices and A Performance Deep Dive on Hardware Accelerated 3D Graphics

VMware Horizon View 5.2 simplifies desktop and application management while increasing security and control, and it delivers a personalized, high-fidelity experience for end-users across sessions and devices. It enables availability and agility of desktop services unmatched by traditional PCs while reducing the total cost of desktop ownership, and it gives end-users new levels of productivity and the freedom to access desktops from more devices and locations while giving IT greater policy control.

Recently, we published two whitepapers that provide a deep dive on Horizon View 5.2 performance and the hardware-accelerated 3D graphics (vSGA) feature. The links to these whitepapers are as follows:

* VMware Horizon View 5.2 Performance and Best Practices
* VMware Horizon View 5.2 and Hardware Accelerated 3D Graphics

The first whitepaper describes the new features in View 5.2, including access to View desktops through Horizon, space-efficient sparse (SEsparse) disks, hardware-accelerated 3D graphics, and full support for Windows 8 desktops. View 5.2 performance improvements in PCoIP and View management are highlighted. In addition, this paper presents View 5.2 PCoIP performance results, a Windows 8 and RDP 8 performance analysis, and a vSGA performance analysis, including how vSGA compares to the software renderer support introduced in View 5.1.

The second whitepaper goes in-depth on the support for hardware-accelerated 3D graphics that debuted with VMware vSphere 5.1 and VMware Horizon View 5.2, and it presents performance and consolidation results for a number of different workloads, ranging from knowledge workers using 3D desktops to performance-intensive CAD-based workloads. Because the intensity of a 3D workload varies greatly from user to user and application to application, rather than highlighting specific case studies, we demonstrate how the solution efficiently scales for both light- and heavyweight 3D workloads until GPU or CPU resources are fully utilized. This paper also presents key best practices for extracting peak performance from a 3D View 5.2 deployment.

1 Million IOPS on 1 VM

Last year at VMworld 2011 we presented one million I/O operations per second (IOPS) on a single vSphere 5 host (link).  The intent was to demonstrate vSphere 5's performance by using multiple VMs to drive an aggregate load of one million IOPS through a single server.  There has recently been some interest in driving a similar I/O load through a single VM.  For some quick experiments prior to VMworld, we used a pair of Violin Memory 6616 flash memory arrays connected to a two-socket HP DL380 server.  vSphere 5.1 demonstrated high performance and I/O efficiency by exceeding one million IOPS with only a modest eight-way VM.  A brief description of our configuration and results is given below.

Configuration:
Hypervisor: vSphere 5.1
Server: HP DL380 Gen8
CPU: 2 x Intel Xeon E5-2690, HyperThreading disabled
Memory: 256GB
HBAs: 5 x QLE2562
Storage: 2 x Violin Memory 6616 Flash Memory Arrays
VM: Windows Server 2008 R2, 8 vCPUs and 48GB of memory
Iometer Config: 4KB I/O size with 16 workers

Results:
Using the above configuration, we achieved 1,055,896 total sustained IOPS.  Check out the following short video clip from one of our latest runs.
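For context, the bandwidth implied by that result follows directly from the 4KB I/O size; a quick back-of-the-envelope sketch:

```python
# Sketch: back-of-the-envelope bandwidth implied by the result above.
iops = 1_055_896          # sustained 4KB IOPS measured with Iometer
io_size = 4 * 1024        # bytes per I/O

bytes_per_sec = iops * io_size
print(f"throughput: {bytes_per_sec / 1e9:.2f} GB/s "
      f"({bytes_per_sec / 2**30:.2f} GiB/s)")
# => roughly 4.3 GB/s (about 4.0 GiB/s) of sustained read/write bandwidth
```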

Look out for a more thorough write-up after VMworld.


Performance Implications of Storage I/O Control-Enabled NFS Datastores

Storage I/O Control (SIOC) allows administrators to control the amount of access virtual machines have to the I/O queues on a shared datastore. With this feature, administrators can ensure that a virtual machine running a business-critical application has a higher priority to access the I/O queue than other virtual machines sharing the same datastore. In vSphere 4.1, SIOC was supported on VMFS-based datastores backed by iSCSI and Fibre Channel SANs. In vSphere 5, SIOC support has been extended to NFS-based datastores.

Recent tests conducted in the VMware Performance Engineering lab studied the following aspects of SIOC:

  • The performance impact of SIOC: Fine-grained management of access to the I/O queues resulted in a 10% improvement in the response time of the workload used for the tests.
  • SIOC's ability to isolate the performance of applications with a smaller request size: Some applications, like Web and media servers, use I/O patterns with a large request size (for example, 32KB), while others, like OLTP databases, issue smaller requests (8KB or less). Test findings show that SIOC helped an OLTP database workload achieve higher performance when it shared the underlying datastore with a workload that used large I/O requests.
  • The intelligent prioritization of I/O resources: SIOC monitors virtual machines' usage of the I/O queue at the host and dynamically redistributes any unutilized queue slots to the virtual machines that need them. Tests show that this process happens consistently and reliably; a sketch of the per-VM prioritization setting follows this list.
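As an illustration of the prioritization knob involved, here is a minimal pyVmomi sketch that raises the I/O shares on a VM's first virtual disk; it assumes an already-resolved vim.VirtualMachine object named vm, and the "high" shares level is an example, not a recommendation:

```python
# Sketch: raise a business-critical VM's disk I/O shares so SIOC prioritizes
# it under datastore congestion. Assumes `vm` is a resolved VM object.
from pyVim.task import WaitForTask
from pyVmomi import vim

# Pick the VM's first virtual disk (illustrative; choose yours deliberately).
disk = next(d for d in vm.config.hardware.device
            if isinstance(d, vim.vm.device.VirtualDisk))

dev_spec = vim.vm.device.VirtualDeviceSpec()
dev_spec.operation = vim.vm.device.VirtualDeviceSpec.Operation.edit
dev_spec.device = disk
dev_spec.device.storageIOAllocation = vim.StorageResourceManager.IOAllocationInfo(
    shares=vim.SharesInfo(level=vim.SharesInfo.Level.high))

WaitForTask(vm.ReconfigVM_Task(vim.vm.ConfigSpec(deviceChange=[dev_spec])))
```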

For the full paper, see Performance Implications of Storage I/O Control–Enabled NFS Datastores in VMware vSphere 5.


VMware View & PCoIP at VMworld

In recent weeks there's been growing excitement about the PCoIP enhancements coming to VMware View. For instance, Warren Ponder discussed here how these enhancements reduce bandwidth consumption by up to 75%. Engineers from VMware's performance team (and Warren) will be talking more about these enhancements and how they translate into real-world performance at the rapidly approaching VMworld 2011 in Las Vegas:

EUC1987: VMware View PC-over-IP Performance and Best Practices
Tuesday, August 30th – 12:00
Wednesday, August 31st – 1:00

EUC3163: VMware View Performance and Best Practices
Tuesday, August 30th – 4:30
Wednesday, August 31st – 4:00

We will also be blogging additional details and performance results as VMworld progresses, followed by a performance whitepaper.

Stay tuned!

Troubleshooting Performance Related Problems in vSphere 4.1 Environments

The hugely popular Performance Troubleshooting for VMware vSphere 4 guide has now been updated for vSphere 4.1. This document provides step-by-step approaches for troubleshooting the most common performance problems in vSphere-based virtual environments. The steps discussed in the document use performance data and charts readily available in the vSphere Client and esxtop to aid the troubleshooting flows. Each performance troubleshooting flow has two parts:

  1. How to identify the problem using specific performance counters.
  2. Possible causes of the problem and solutions to resolve it.

New sections added to the document include troubleshooting performance problems in resource pools on standalone hosts and in DRS clusters; additional troubleshooting steps for environments experiencing memory pressure (hosts with compressed and swapped memory); high CPU ready time on hosts that are not CPU saturated; environments sharing resources such as storage and network; and environments using snapshots.
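As one concrete example from those flows, high CPU ready time is usually read from the vSphere Client's CPU Ready counter, which reports milliseconds of ready time per sample interval; a minimal sketch of the conversion to a percentage (real-time charts use a 20-second interval):

```python
# Sketch: convert a vSphere Client "CPU Ready" summation value (milliseconds
# per sample interval) into the percentage figure the flows reference.
def cpu_ready_pct(ready_ms: float, interval_s: int = 20, vcpus: int = 1) -> float:
    """Average %RDY per vCPU over one sample interval."""
    return ready_ms / (interval_s * 1000.0 * vcpus) * 100.0

# Example: 2,000 ms of ready time on a 2-vCPU VM in a 20 s real-time sample.
print(f"{cpu_ready_pct(2000, vcpus=2):.1f}% ready per vCPU")  # 5.0%
```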

The troubleshooting guide can be found here. Readers are encouraged to provide feedback and comments on the performance community site at this link.


Performance Scaling of an Entry-Level Cluster

Performance benchmarking is often conducted on top-of-the-line hardware, including hosts that typically have a large number of cores, maximum memory, and the fastest disks available. Hardware of this caliber is not always accessible to small or medium-sized businesses with modest IT budgets. As part of our ongoing investigation of different ways to benchmark the cloud using the newly released VMmark 2.0, we set out to determine whether a cluster of less powerful hosts could be a viable alternative for these businesses. We used VMmark 2.0 to see how a four-host cluster with a modest hardware configuration would scale under increasing load.

Workload throughput is often limited by disk performance, so the tests were repeated with two different storage arrays to show the effect that upgrading the storage would offer in terms of performance improvement. We tested two disk arrays that varied in both speed and number of disks, an EMC CX500 and an EMC CX3-20, while holding all other characteristics of the testbed constant.

To review, VMmark 2.0 is a next-generation, multi-host virtualization benchmark that models application performance and the effects of common infrastructure operations such as vMotion, Storage vMotion, and virtual machine deployment. Each tile contains Microsoft Exchange 2007, DVD Store 2.1, and Olio application workloads, which run in a throttled fashion. The Storage vMotion and VM deployment infrastructure operations require the user to specify a LUN as the storage destination. The VMmark 2.0 score is computed as a weighted average of application workload throughput and infrastructure operation throughput. For more details about VMmark 2.0, see the VMmark 2.0 website or Joshua Schnee's description of the benchmark.
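To illustrate the flavor of that scoring (not the benchmark's published methodology), here is a minimal sketch of a weighted average of normalized throughputs; the weights and reference values are placeholders:

```python
# Sketch: a VMmark-style weighted score. Weights and reference throughputs
# below are illustrative placeholders, not VMmark 2.0's actual methodology.
def weighted_score(app_tput: float, infra_tput: float,
                   app_ref: float, infra_ref: float,
                   app_weight: float = 0.8) -> float:
    """Weighted average of throughputs normalized to a reference system."""
    app_norm = app_tput / app_ref
    infra_norm = infra_tput / infra_ref
    return app_weight * app_norm + (1.0 - app_weight) * infra_norm

print(f"score: {weighted_score(220.0, 12.0, 100.0, 10.0):.2f}")  # 2.00
```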

Configuration
All tests were conducted on a cluster of four Dell PowerEdge R310 hosts running VMware ESX 4.1 and managed by VMware vCenter Server 4.1.  These are typical of today's entry-level servers; each contained a single quad-core 2.80 GHz Intel Xeon X3460 processor (with hyperthreading enabled) and 32 GB of RAM.  The servers also used two 1Gbit NICs for VM traffic and a third 1Gbit NIC for vMotion activity.

To determine the relative impact of different storage solutions on benchmark performance, runs were conducted on two existing storage arrays, an EMC CX500 and an EMC CX3-20. For details on the array configurations, refer to Table 1 below. VMs were stored on identically configured ‘application’ LUNs, while a designated ‘maintenance’ LUN was used for the Storage vMotion and VM deployment operations.

Table 1. Disk Array Configuration

Results
To measure the cluster's performance scaling under increasing load, we started by running one tile, then increased the number of tiles until the run failed to meet Quality of Service (QoS) requirements. As load is increased on the cluster, the application throughput, CPU utilization, and VMmark 2.0 scores are expected to increase; the VMmark score increases as a function of throughput. By scaling out the number of tiles, we hoped to determine the maximum load our four-host cluster of entry-level servers could support.  VMmark 2.0 scores will not scale linearly from one to three tiles because, in this configuration, the infrastructure operations load remained constant; infrastructure load increases primarily as a function of cluster size. Although it shows only a two-host cluster, this figure from Joshua Schnee's recent blog article demonstrates the relationship between application throughput, infrastructure operations throughput, and the number of tiles more clearly. We also expected to see improved performance when running on the CX3-20 versus the CX500 because the CX3-20 has a larger number of disks per LUN as well as faster individual drives. Figure 1 below details the scale-out performance on the CX500 and CX3-20 disk arrays using VMmark 2.0.

Figure 1. VMmark 2.0 Scale Out On a Four-Host Cluster


Both configurations saw improved throughput from one to three tiles, but at four tiles both failed to meet at least one QoS requirement.  These results show that a user wanting to maintain an average cluster CPU utilization of 50% on their four-host cluster could count on it to support a two-tile load.  Note that in this experiment, increased scores across tiles are largely due to increased workload throughput rather than an increased number of infrastructure operations.

As expected, runs using the CX3-20 showed consistently higher normalized scores than those on the CX500.  Runs on the CX3-20 outperformed the CX500 by 15%, 14%, and 12% on the one-, two-, and three-tile runs, respectively.  The increased performance of the CX3-20 over the CX500 was accompanied by approximately 10% higher CPU utilization, which indicated that the faster CX3-20 disks allowed the CPU to stay busier, increasing total throughput.

The results show that our cluster of entry-level servers with a modest disk array supported approximately 220 DVD Store 2.1 operations per second, 16 send-mail actions, and 235 Olio updates per second. A more robust disk array supported 270 DVD Store 2.1 operations per second, 16 send-mail actions, and 235 Olio updates per second with 20% lower latencies on average and a correspondingly slightly higher CPU utilization.

Note that this type of experiment is possible for the first time with VMmark 2.0; VMmark 1.x was limited to benchmarking a single host, and the entry-level servers under test in this study would not have been able to support even a single VMmark 2.0 tile on an individual server. By spreading the load of one tile across a cluster of servers, however, it becomes possible to quantify the load that the cluster as a whole is capable of supporting.  Benchmarking our cluster with VMmark 2.0 has shown that even modest clusters running vSphere can deliver an enormous amount of computing power to run complex multi-tier workloads.

Future Directions
In this study, we scaled out VMmark 2.0 on a four-host entry-level cluster to measure performance scaling and the maximum supported number of tiles. This placed a much higher load on the cluster than would be typical for a small or medium business, giving such businesses confidence that they can deploy their application workloads on similar clusters.  An alternate experiment would be to run fewer tiles while measuring the performance of other enterprise-level features, such as VMware High Availability. This ability to benchmark the cloud in many different ways is one benefit of a well-designed multi-host benchmark. Keep watching this blog for more interesting studies in benchmarking the cloud with VMmark 2.0.