
Category Archives: Uncategorized

First Certified SAP BW-EML Benchmark on Virtual HANA

The first certified SAP Business Warehouse-Enhanced Mixed Workload (BW-EML) standard application benchmark based on a virtual HANA database was recently published by HP.  We worked with HP to configure and run this benchmark using a virtual HANA database running on vSphere 5.5 in a monster VM of 64 vCPUs and almost 1TB of RAM.  The test was run with a total of 2 billion records and achieved a throughput of 111,850 ad-hoc navigation steps per hour.

The same hardware configuration was used by HP to publish a native-only benchmark with the same number of records. In that test, the result was 126,980 ad-hoc navigation steps per hour, meaning the virtual HANA result came within approximately 12% of the native throughput.

[Figure: BW-EML benchmark throughput, native vs. virtual HANA]

Although the hardware setup was the same, this comparison between native and virtual performance has one wrinkle that gave the native system a slight advantage, estimated to be about 5%.

The estimated 5% advantage for the native system comes down to the difference between cores and threads, and to the maximum number of vCPUs per virtual machine. In the native test, the BW-EML workload was able to exercise all 120 hardware threads of the 60-core physical server. The number of threads is twice the number of physical cores because these processors use Intel Hyper-Threading technology.

In vSphere 5.5 (the current version), the maximum number of vCPUs in a single VM is 64. Each vCPU is mapped to a hardware thread when scheduled to run, which limits a single VM to 64 hardware threads. For this test, that meant only slightly more than half of the server's 120 hardware threads could be used by the HANA virtual machine. The virtual machine was therefore not able to benefit directly from Hyper-Threading, although it was able to use all 60 cores.

The benefit of Hyper-Threading can be as much as 20% to 30% for some applications, but for the BW-EML benchmark it is estimated to be about 5%.  This estimate was obtained by running the native BW-EML benchmark system with and without Hyper-Threading enabled.  Because the virtual machine was not able to use Hyper-Threading, the native system is estimated to have had a 5% advantage from its ability to use all 120 threads of the physical server.

In theory, the native system's advantage could be reduced by either creating a bigger virtual machine or running the native system without Hyper-Threading.  In either case, the gap between native and virtual should shrink by about 5 percentage points, bringing the difference down to single digits (approximately 7%).
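
To make the arithmetic behind these estimates explicit, here is a back-of-the-envelope sketch (in Python) that reproduces the numbers quoted above; the 5% Hyper-Threading factor is the estimate from the native runs, not an exact measurement.

```python
# Back-of-the-envelope arithmetic for the native vs. virtual BW-EML comparison.
native_result = 126_980   # ad-hoc navigation steps/hour, native (cert. 2014009)
virtual_result = 111_850  # ad-hoc navigation steps/hour, virtual (cert. 2014021)

# Native advantage: how far the virtual result is below native.
gap = (native_result - virtual_result) / native_result
print(f"Native advantage: {gap:.1%}")                  # ~11.9%, i.e. about 12%

# Estimated share of that gap attributable to Hyper-Threading, which the
# 64-vCPU virtual machine could not use (from native runs with and without HT).
ht_benefit = 0.05

# Hypothetical gap if the Hyper-Threading benefit were taken out of the picture.
print(f"Gap excluding Hyper-Threading: {gap - ht_benefit:.1%}")  # ~7%
```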

Additional details about the certified SAP BW-EML benchmark configurations used in the tests: SAP HANA 1.0 on HP DL580 Gen8, 4 processors with 60 cores / 120 threads using Intel Xeon E7-4880 v2 running at 2.5 GHz and 1TB of main memory (each processor has 15 cores / 30 threads).  The application servers were SAP NetWeaver 7.30 on HP BL680 G7, 4 processors with 40 cores / 80 threads using Intel Xeon E7-4870 running at 2.4 GHz and 1TB of main memory (each processor has 10 cores / 20 threads). The OS used for all servers was SUSE Linux Enterprise Server 11 SP2.  The certification number for the native test is 2014009 and the certification number for the virtual test is 2014021.

Virtual SAP HANA Achieves Production Level Performance

VMware CEO Pat Gelsinger announced production support for SAP HANA on VMware vSphere 5.5 at EMC World this week during his keynote. This is the end result of a very thorough joint testing project over the past year between VMware and SAP.

HANA is an in-memory platform (including database capabilities) from SAP that has enabled huge gains in performance for customers and has been a high priority for SAP over the past few years.  In order for HANA to be supported in a virtual machine on vSphere 5.5 for production workloads, we worked closely with SAP to design and run an in-depth set of performance tests.

In order to enable the testing and ongoing production support of SAP HANA on vSphere, two HANA appliance servers were ordered, shipped, and installed in SAP's labs in Walldorf, Germany.  These systems are dedicated to running SAP HANA on vSphere onsite at SAP.  Each system is an Intel Xeon E7-8870 (Westmere-EX) based four-socket server with 1TB of RAM.  They are used for performance testing and also for ongoing support of HANA on vSphere.  Additionally, VMware has onsite support engineers to assist with the testing and support.

SAP designed an extensive performance test suite that used a large number of test scenarios to stress all functions and capabilities of HANA running on vSphere 5.5.  They included OLAP and OLTP with a wide range of data sizes and query functions. In all, over one thousand individual test cases were used in this comprehensive test suite.  These same tests were run on identical native HANA systems and the difference between native and virtual tests was used as the key performance indicator.

We also tested vSphere features, including vMotion, DRS, and VMware HA, with virtual machines running HANA.  These tests were done with the HANA virtual machine under heavy stress.

The test results have been extremely positive and are one of the key factors in the announcement of production support.  The difference between virtual and native HANA across all the performance tests was on average within a few percentage points.

The vMotion, DRS, and VMware HA tests all completed without issues.  Even with the large memory sizes of HANA virtual machines, we were able to successfully migrate them with vMotion while under load.

One of the results of the extensive testing is a best practices guide for HANA on vSphere 5.5. This document includes a performance guide for running HANA on vSphere 5.5 based on this extensive testing.  The document also includes information about how to size a virtual HANA instance and how VMware HA can be used in conjunction with HANA’s own replication technology for high availability.

SEsparse Shows Significant Improvements over VMFSsparse

Limited amounts of physical resources can make large-scale virtual infrastructure deployments challenging. Provisioning dedicated storage space to hundreds of virtual machines can become particularly expensive. To address this, VMware vSphere 5.5 provides two sparse storage techniques, namely VMFSsparse and SEsparse. Running multiple VMs from sparse delta disks that share a common parent virtual disk reduces the amount of physical storage required, making large-scale deployments manageable. SEsparse was introduced in VMware vSphere 5.1 and, in vSphere 5.5, became the default virtual disk snapshotting technique for VMDKs greater than 2TB. Various enhancements were made to SEsparse technology in the vSphere 5.5 release, which make SEsparse perform mostly on par with or better than the VMFSsparse format. In addition, dynamic space reclamation gives SEsparse a significant advantage over the VMFSsparse virtual disk format. This feature makes SEsparse the format of choice for VMware® Horizon View™ environments, where space reclamation is critical due to the large number of tenants sharing the underlying storage.
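
As a rough illustration of the space savings described above, the sketch below estimates the provisioned capacity for a pool of VMs cloned from a common parent disk; the VM count, parent disk size, and per-clone growth are hypothetical numbers, not figures from the paper.

```python
# Hypothetical illustration of why sparse delta disks sharing a common
# parent virtual disk reduce physical storage requirements.
num_vms = 200
parent_disk_gb = 40      # shared parent (replica) virtual disk
delta_growth_gb = 3      # assumed unique data written by each clone

full_clones_gb = num_vms * parent_disk_gb
sparse_deltas_gb = parent_disk_gb + num_vms * delta_growth_gb

print(f"Full clones:        {full_clones_gb:,} GB")    # 8,000 GB
print(f"Sparse delta disks: {sparse_deltas_gb:,} GB")  # 640 GB
print(f"Storage reduction:  {1 - sparse_deltas_gb / full_clones_gb:.0%}")  # ~92%
```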


A recently published paper reports the results from a series of performance studies of SEsparse and VMFSsparse, using thin virtual disks as the baseline. The performance was evaluated using a comprehensive set of Iometer workloads along with workloads from two real-world application domains: Big Data analytics and Virtual Desktop Infrastructure (VDI). Overall, SEsparse performs significantly better than the VMFSsparse format for random write workloads and is mostly on par with or better than VMFSsparse for the other analyzed workloads, depending on workload type.

Read the full performance study, “SEsparse in VMware vSphere 5.5.”

VMware vFabric Postgres 9.2 Performance and Best Practices

VMware vFabric Postgres (vPostgres) 9.2 improves vertical scalability over the previous version by 300% for pgbench SELECT-only (a common read-only OLTP workload) and by 100% for pgbench (a common read/write OLTP workload). vPostgres 9.2 on vSphere 5.1 achieves equal-to-native vertical scalability on a 32-core machine.

Using out-of-the-box settings for both vPostgres and vSphere, virtual machine (VM)-based database consolidation performs on par with the alternative approaches (such as consolidating databases onto one vPostgres server instance, or onto multiple vPostgres server instances within a single operating system instance) in a baseline, memory-undercommitted situation for a standard OLTP workload (the dbt2 benchmark, an open-source fair-use implementation of TPC-C). It also becomes increasingly more robust as memory overcommitment escalates, performing 200% better than the alternatives in a 55% memory-overcommitted situation.
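
For readers unfamiliar with the overcommitment terminology, the small sketch below shows one common way such a figure is computed; the host and VM sizes are hypothetical, and the whitepaper's exact definition may differ.

```python
# One common way to express memory overcommitment:
# (total configured VM memory - host physical memory) / host physical memory.
host_memory_gb = 256   # hypothetical host size
vm_count = 12          # hypothetical consolidation scenario
vm_memory_gb = 33      # memory configured per database VM

total_vm_memory_gb = vm_count * vm_memory_gb
overcommit = (total_vm_memory_gb - host_memory_gb) / host_memory_gb
print(f"Memory overcommitment: {overcommit:.0%}")   # ~55%
```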

By using unconventionally large database shared buffers (75% of memory rather than the conventional 25%), vPostgres can attain both better performance (12% better) and more consistent performance (70% less temporal variation).
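
As a concrete example of this sizing rule, the snippet below computes the shared_buffers value for both the conventional 25% guideline and the 75% setting discussed above; the 32GB VM size is a hypothetical example.

```python
# Illustrative shared_buffers sizing for a vPostgres VM.
vm_memory_gb = 32  # hypothetical VM memory size

conventional_gb = vm_memory_gb * 0.25   # conventional guideline
large_gb = vm_memory_gb * 0.75          # setting discussed above

print(f"shared_buffers at 25% of memory: {conventional_gb:.0f}GB")  # 8GB
print(f"shared_buffers at 75% of memory: {large_gb:.0f}GB")         # 24GB
# e.g., in postgresql.conf:  shared_buffers = 24GB
```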

When using unconventionally large database shared buffers, the vPostgres database memory ballooning technique can further enhance the robustness of VM-based database consolidation: in a 55% memory-overcommitted situation, it increases the performance advantage of VM-based consolidation over the alternatives from 60% to 140%.

For more details, including the experimental methodology and references, please read the whitepaper of the same name.

Performance Best Practices for vSphere 5.5 is Available

We are pleased to announce the availability of Performance Best Practices for vSphere 5.5. This is a book designed to help system administrators obtain the best performance from vSphere 5.5 deployments.

The book addresses many of the new features in vSphere 5.5 from a performance perspective. These include:

  • vSphere Flash Read Cache, a new feature in vSphere 5.5 allowing flash storage resources on the ESXi host to be used for read caching of virtual machine I/O requests.
  • VMware Virtual SAN (VSAN), a new feature (in beta for vSphere 5.5) allowing storage resources attached directly to ESXi hosts to be used for distributed storage and accessed by multiple ESXi hosts.
  • The VMware vFabric Postgres database (vPostgres).

We’ve also updated and expanded on many of the topics in the book. These include:

  • Running storage latency and network latency sensitive applications
  • NUMA and Virtual NUMA (vNUMA)
  • Memory overcommit techniques
  • Large memory pages
  • Receive-side scaling (RSS), both in guests and on 10 Gigabit Ethernet cards
  • VMware vMotion, Storage vMotion, and Cross-host Storage vMotion
  • VMware Distributed Resource Scheduler (DRS) and Distributed Power Management (DPM)
  • VMware Single Sign-On Server

The book can be found here.

VMware Horizon View 5.2 Performance & Best Practices and A Performance Deep Dive on Hardware Accelerated 3D Graphics

VMware Horizon View 5.2 simplifies desktop and application management while increasing security and control, and it delivers a personalized, high-fidelity experience for end-users across sessions and devices. It enables availability and agility of desktop services unmatched by traditional PCs, reduces the total cost of desktop ownership, and gives end-users new levels of productivity and the freedom to access desktops from more devices and locations, all while giving IT greater policy control.

Recently, we published two whitepapers that provide a performance deep dive on Horizon View 5.2 and on its hardware-accelerated 3D graphics (vSGA) feature. The links to these whitepapers are as follows:

* VMware Horizon View 5.2 Performance and Best Practices
* VMware Horizon View 5.2 and Hardware Accelerated 3D Graphics

The first whitepaper describes the new features in View 5.2, including access to View desktops through Horizon, space-efficient sparse (SEsparse) disks, hardware-accelerated 3D graphics, and full support for Windows 8 desktops. View 5.2 performance improvements in PCoIP and View management are highlighted. In addition, this paper presents View 5.2 PCoIP performance results, a Windows 8 and RDP 8 performance analysis, and a vSGA performance analysis, including how vSGA compares to the software renderer support introduced in View 5.1.

The second whitepaper goes in-depth on the support for hardware accelerated 3D graphics that debuted with VMware vSphere 5.1 and VMware Horizon View 5.2 and presents performance and consolidation results for a number of different workloads, ranging from knowledge workers using 3D desktops to performance-intensive CAD-based workloads. Because the intensity of a 3D workload will vary greatly from user to user and application to application, rather than highlighting specific case studies, we demonstrate how the solution efficiently scales for both light- and heavy-weight 3D workloads, until GPU or CPU resources are fully utilized. This paper also presents key best practices to extract peak performance from a 3D View 5.2 deployment.

Performance Enhancements in View 5.2

View 5.2 became generally available today, and we wanted to take this opportunity to present a high-level overview of some of the performance enhancements that debut with View 5.2 and PCoIP. In this release, PCoIP's image cache has been significantly improved, allowing users on memory-constrained devices to run with much smaller cache sizes. First, support was introduced to efficiently handle situations where image content is shifted vertically, as occurs during scroll operations. Second, View 5.2 debuts improved cache compression algorithms that provide significant additional compression of the View client's image cache. Finally, the cache's handling of progressive build operations has been made significantly more efficient. Together, these enhancements allow users to achieve significant bandwidth reductions with considerably smaller cache sizes than was possible with View 5.1:

The above figure illustrates that, for typical office workflows, View 5.2 running with a cache up to 5X smaller can still deliver significant bandwidth savings: a 90MB View 5.2 cache was found to deliver performance comparable to View 5.1 configured with a 250MB cache, and even a 50MB View 5.2 cache delivered the majority of the bandwidth-reduction benefits observed with View 5.1's 250MB cache. This up-to-5X reduction in cache size can be a compelling option for memory-constrained thin clients or tablet devices. The maximum image cache size can be configured via GPOs or set on the client device.
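
To make the caching idea concrete, here is a minimal, purely conceptual sketch of a size-capped, least-recently-used cache of image blocks; it is not how the PCoIP codec is actually implemented (real block hashing, compression, and eviction are far more sophisticated), but it shows why a cache hit means a block does not need to be retransmitted.

```python
from collections import OrderedDict
import hashlib

class ImageBlockCache:
    """Toy LRU cache of image blocks keyed by content hash.
    Conceptual illustration only; not the PCoIP implementation."""

    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.used = 0
        self.blocks = OrderedDict()  # content hash -> block bytes

    def lookup_or_store(self, block: bytes) -> bool:
        """Return True on a cache hit (the block need not be retransmitted)."""
        key = hashlib.sha1(block).hexdigest()
        if key in self.blocks:
            self.blocks.move_to_end(key)          # mark as most recently used
            return True
        # Miss: store the block, evicting least-recently-used blocks as needed.
        while self.used + len(block) > self.capacity and self.blocks:
            _, evicted = self.blocks.popitem(last=False)
            self.used -= len(evicted)
        self.blocks[key] = block
        self.used += len(block)
        return False

# A smaller cache (e.g., 50MB instead of 250MB) trades some hit rate
# for a much smaller memory footprint on the client device.
cache = ImageBlockCache(capacity_bytes=50 * 1024 * 1024)
```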

Alternatively, users can continue to leverage the default 250MB cache size in View 5.2 and will see reduced bandwidth utilization in comparison with View 5.1:

The above figure illustrates the average bandwidth utilization observed for View 5.2 during a VMware View Planner run in two different WAN environments with out-of-the-box PCoIP configurations. The results are normalized to the View 5.1 baseline and show that in the 2 Mb/s environment, the average session bandwidth is reduced by around 6%. Moreover, in the “extreme WAN” environment, View 5.2 delivers almost a 10% reduction in bandwidth utilization compared with View 5.1. These reductions can be compelling when consolidating View sessions from a branch office onto a limited-capacity link, or when users are connecting over congested WiFi connections. Furthermore, as would be expected, reducing the number of image blocks being encoded not only reduces bandwidth utilization, but also improves interactivity (faster transmission of updates and the opportunity for higher frame rates, given the reduced bandwidth utilization) and reduces CPU consumption (less encoding work being done).

Finally, other PCoIP enhancements that debut with View 5.2 include:

1. GPO settings take immediate effect: many of the performance-oriented GPO settings now take effect immediately, allowing users or administrators to closely customize the behavior of their PCoIP sessions.

2. Relative mouse support: previously, only absolute mouse mode was supported. However, certain 3D applications require relative mouse mode, and support for it is introduced in View 5.2.

We will cover all of these optimizations in greater detail in an upcoming View 5.2 Performance and Best Practices Whitepaper.

Technical Deep Dive on VMware View Planner

In our prior VMworld sessions and performance white papers, we have presented user-experience performance results based on VMware View® Planner, a tool that generates workloads representative of many user-initiated operations in VDI environments. While we have briefly discussed this tool on prior occasions, there have been many requests for the architectural details and inner workings of the tool. To provide a deeper technical dive into View Planner, we have recently published an article in the latest release of the VMware Technical Journal (VMTJ, Winter 2012), which can be found here: VMware View Planner: Measuring True Virtual Desktop Experience at Scale.

View Planner supports typical VDI user operations as well as administrators' management operations, and it can be configured to allow VDI evaluators to more accurately represent their particular environments. In the paper, we describe the challenges in building such a workload generator and the platform around it, as well as the View Planner architecture and its use cases. We also explain how we used View Planner to perform platform characterization and consolidation studies, identify potential performance optimizations, and address several other use cases.

vCloud Director 5.1 Performance and Best Practices

VMware vCloud Director 5.1 gives enterprise organizations the ability to build secure private clouds that dramatically increase datacenter efficiency and business agility. Coupled with VMware vSphere, vCloud Director delivers cloud computing for existing datacenters by pooling virtual infrastructure resources and delivering them to users as catalog-based services.  vCloud Director 5.1 helps IT professionals build agile infrastructure-as-a-service (IaaS) cloud environments that greatly accelerate the time-to-market for applications and the responsiveness of IT organizations.

This white paper addresses three areas regarding vCloud Director performance:

  • vCloud Director sizing guidelines and software requirements
  • Performance characterization and best practices for key vCloud Director operations and new features
  • Best practices in improving performance and tuning vCloud Director architecture

For more details and performance tips, please refer to VMware vCloud Director 5.1 Performance and Best Practices.

vSphere 5.1 IOPS Performance Characterization on Flash-based Storage

At VMworld 2012 we demonstrated a single eight-way VM running on vSphere 5.1 exceeding one million IOPS.  That testing illustrated the high-end IOPS performance of vSphere 5.1.

In a new series of tests we have completed some additional characterization of high I/O performance using a very similar environment. The only difference between the 1 million IOPS test environment and the one used for these tests is that the number of Violin Memory Arrays was reduced from two to one (one of the arrays was a short term loan).

Configuration:
Hypervisor: vSphere 5.1
Server: HP DL380 Gen8
CPU: Two Intel Xeon E5-2690 processors, Hyper-Threading disabled
Memory: 256GB
HBAs: Five QLogic QLE2562
Storage: One Violin Memory 6616 Flash Memory Array
VM: Windows Server 2008 R2, 8 vCPUs and 48GB.
Iometer Configuration: Random, 4KB I/O size with 16 workers

We continued to characterize the performance of vSphere 5.1 and the Violin array across a wider range of configurations and workload conditions.

Based on the types of questions that we often get from customers, we focused on RDM versus VMFS5 comparisons and on the use of various I/O sizes.  In the first series of experiments, we compared RDM-backed and VMFS5-backed datastores using a 100% read workload mix while ramping up the I/O size.

[Figure: IOPS and throughput, RDM vs. VMFS5-backed datastores, 100% read workload, increasing I/O size]

As you can see from the above graph, VMFS5 yielded roughly equivalent performance to that of RDM-backed datastores.  Comparing the average of the deltas across all data points showed VMFS5 performance within 1% of RDM for both IOPS and MB/s.  As expected, the number of IOPS decreased once we exceeded the default array block size of 4KB, but throughput continued to scale, approaching 4500 MB/s at both the 8KB and 16KB sizes.
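
The relationship between the IOPS and MB/s curves in these graphs is simple to verify: throughput is just IOPS multiplied by the I/O size. The sketch below uses hypothetical IOPS figures (not the measured values from these tests) to show the conversion and the "average of the deltas" comparison described above.

```python
# Throughput in MB/s is IOPS multiplied by the I/O size.
def mb_per_sec(iops: int, io_size_kb: int) -> float:
    return iops * io_size_kb / 1024

# Hypothetical data points (illustration only, not the measured results).
vmfs5_iops = {4: 600_000, 8: 560_000, 16: 285_000}   # I/O size (KB) -> IOPS
rdm_iops   = {4: 603_000, 8: 562_000, 16: 287_000}

for size_kb, iops in vmfs5_iops.items():
    print(f"{size_kb}KB: {mb_per_sec(iops, size_kb):,.0f} MB/s")
# 4KB: ~2,344 MB/s; 8KB: ~4,375 MB/s; 16KB: ~4,453 MB/s

# "Average of the deltas across all data points": the mean relative
# difference between the VMFS5 and RDM results.
deltas = [(rdm_iops[s] - vmfs5_iops[s]) / rdm_iops[s] for s in rdm_iops]
print(f"Average delta vs. RDM: {sum(deltas) / len(deltas):.2%}")   # well under 1%
```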

For our second series of experiments, we continued to compare RDM-backed and VMFS5-backed datastores through a progression of block sizes, but this time we altered the workload mix to include 60% reads and 40% writes.

[Figure: IOPS and throughput, RDM vs. VMFS5-backed datastores, 60% read / 40% write mix, increasing I/O size]

Violin Memory arrays use a 4KB sector size and perform at their optimal level when managing 4KB blocks. This is clearly visible in the above IOPS results at the 4KB block size. Comparing the RDM and VMFS5 IOPS in the above graph, you can see that VMFS5 performs very well with a 60% read, 40% write mix.  Throughput continued to scale in a similar fashion to the read-only experiments, and VMFS5 performance for both IOPS and MB/s was within 0.01% of RDM performance when comparing the average of the deltas across all data points.

The amount of I/O, with just one eight-way VM running on one Violin storage array, is both considerable and sustainable at many I/O sizes.  It is also worth noting that a 60% read, 40% write I/O mix still generated substantial IOPS and bandwidth. While in most cases a single VM won't need to drive nearly this much I/O traffic, these experiments show that vSphere 5.1 is more than capable of handling it.