Enhanced vMotion Compatibility (EVC) is a feature that allows vMotion to proceed even when the destination cluster contains ESXi hosts with CPUs of different generations. EVC applies a common baseline to all ESXi hosts in the cluster so that every host is compatible for vMotion. For example, a Nehalem baseline might be applied to a cluster that mixes hosts with Westmere and Nehalem processors. In this case, the CPU features unique to Westmere would be hidden, because Westmere is a newer processor than Nehalem, and all ESXi hosts would "broadcast" that they have Nehalem features.
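As a quick way to see which baseline a cluster is advertising, here is a minimal pyVmomi sketch that reads each cluster's current EVC mode key. The vCenter address and credentials are placeholders, and the unverified SSL context is for lab use only.

```python
# Minimal pyVmomi sketch: report the EVC baseline ("mode key") of each cluster.
# Connection details below are placeholders for illustration.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()   # lab use only; validate certificates in production
si = SmartConnect(host="vcenter.example.com",
                  user="administrator@vsphere.local",
                  pwd="password",
                  sslContext=ctx)

content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.ClusterComputeResource], True)

for cluster in view.view:
    # currentEVCModeKey is None when EVC is disabled on the cluster
    print(cluster.name, "EVC baseline:", cluster.summary.currentEVCModeKey or "EVC disabled")

Disconnect(si)
```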
Tests showed how utilizing EVC with different applications affected their performance. Several workloads were chosen to represent typical applications running in enterprise datacenters. The applications represented included database, Java, encryption, and multimedia. To see the results and learn some best practices for performance with EVC, read Impact of Enhanced vMotion Compatibility on Application Performance.
We’re pleased to announce the availability of Performance Best Practices for vSphere 5.1. This is a book designed to help system administrators obtain the best performance from vSphere 5.1 deployments.
The book addresses many of the new features in vSphere 5.1 from a performance perspective. These include:
Use of a system swap file to reduce VMkernel and related memory usage
Flex SE linked clones that can relinquish storage space when it’s no longer needed
Use of jumbo frames for hardware iSCSI
Single Root I/O virtualization (SR-IOV), allowing direct guest access to hardware devices
Enhancements to SplitRx mode, a feature allowing network packets received in a single network queue to be processed on multiple physical CPUs
Enhancements to the vSphere Web Client
VMware Cross-Host Storage vMotion, which allows virtual machines to be moved simultaneously across both hosts and datastores
We’ve also updated and expanded on many of the topics in the book.
These topics include:
Choosing hardware for a vSphere deployment
Configuring ESXi for best performance
Guest operating system performance
vCenter and vCenter database performance
vMotion and Storage vMotion performance
Distributed Resource Scheduler (DRS), Distributed Power Management (DPM), and Storage DRS performance
High Availability (HA), Fault Tolerance (FT), and VMware vCenter Update Manager performance
VMware vSphere Storage Appliance (VSA) and vCenter Single Sign on Server performance
Around the vSphere 5.1 release time frame, 16Gb Fibre Channel fabrics and 16Gb FC cards became generally available. With the release of the 16Gb FC driver, the vSphere platform can now take full advantage of the new 16Gb FC HBAs for better storage I/O performance.
As described in the paper "Storage I/O Performance on vSphere 5.1 over 16Gb Fibre Channel", storage I/O throughput for larger block sizes doubles compared to 8Gb FC. Using a single storage I/O worker, the paper shows that throughput improves while CPU efficiency per I/O gets better. For random I/O at small block sizes, 16Gb FC can attain much higher I/Os per second than an 8Gb FC connection.
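To put those link speeds in perspective, here is a back-of-the-envelope Python sketch (my own rough numbers, not figures from the paper) showing the IOPS ceiling each link can sustain once the wire itself becomes the bottleneck at a given block size.

```python
# Back-of-the-envelope sketch (not figures from the paper): approximate usable
# per-port payload rates and the IOPS each link can carry at a given block size.
LINK_MBPS = {"8Gb FC": 800.0, "16Gb FC": 1600.0}   # rough usable MB/s per port

def max_iops(link_mbps: float, block_kb: int) -> float:
    """IOPS ceiling when the link itself is the bottleneck."""
    return link_mbps * 1024 / block_kb

for name, mbps in LINK_MBPS.items():
    for block_kb in (4, 64, 512):
        print(f"{name}: {block_kb:>3}KB blocks -> ~{max_iops(mbps, block_kb):,.0f} IOPS max")
```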
Better performance, lower latency, and streamlined statistics are just some of the improvements you can expect to find in vCenter Server 5.1. The VMware performance team has published a paper about vCenter Server 5.1 database performance in large environments. The paper shows that statistics collection creates the biggest performance impact on the vCenter Server database. In vSphere 5.1, several aspects of statistics collection have been changed to improve the overall performance of the database. There are three sources of I/O to the statistics tables in vCenter Server: inserting statistics, rolling up statistics between different intervals, and deleting statistics when they expire. These activities have been improved by changing the way the relevant data is persisted to the tables, partitioning the tables instead of using staging tables. Removing the staging tables also makes statistics collection more robust, resolving the issues described in KB 2011523 and KB 1003878. Scalability is improved as well: larger inventories can be supported because reads and writes no longer have to go through the old staging tables. The paper also includes best practices for taking advantage of these changes in environments where vCenter Server has a large inventory. For more details, see vCenter Server 5.1 Database Performance in Large-Scale Environments.
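To make the "rolling up" activity concrete, here is a toy Python sketch, not vCenter's actual implementation, that collapses 5-minute samples into 30-minute averages; this is the kind of work that generates rollup I/O against the statistics tables.

```python
# Toy illustration of a statistics rollup (not vCenter's actual code):
# collapse 5-minute samples into 30-minute averages.
from statistics import mean

def roll_up(samples, src_interval_min=5, dst_interval_min=30):
    """samples: list of (minute_offset, value) pairs at the source interval."""
    group = dst_interval_min // src_interval_min
    rolled = []
    for i in range(0, len(samples), group):
        chunk = samples[i:i + group]
        rolled.append((chunk[0][0], mean(v for _, v in chunk)))
    return rolled

# One hour of fake CPU% samples at 5-minute granularity -> two 30-minute averages
five_min = [(m, 40 + (m % 15)) for m in range(0, 60, 5)]
print(roll_up(five_min))
```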
The VMware vSphere I/O performance team has published a paper that shows VXLAN performance on vSphere 5.1. Virtual extensible LAN (VXLAN) is a network encapsulation mechanism that enables virtual machines to be deployed on any physical host, regardless of the host’s network configuration.
The paper shows how VXLAN is competitive in terms of performance when compared to a configuration without VXLAN enabled. The paper describes the test results for three experiments: throughput for large and small message sizes, CPU utilization for large and small message sizes, and throughput and CPU utilization for 16 virtual machines with various message sizes. Results show that a virtual machine configured with VXLAN achieved similar networking performance to a virtual machine without VXLAN configured, both in terms of throughput and CPU cost. Additionally, vSphere 5.1 scales well as more virtual machines are added to the VXLAN network. Read the full paper here.
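For a concrete view of what the encapsulation adds on the wire, the following sketch builds the 8-byte VXLAN header from the VXLAN specification; the field layout is from the spec, while the function name and sample VNI are just for illustration. The outer UDP/IP/Ethernet headers that carry it are omitted.

```python
# Illustrative sketch of the 8-byte VXLAN header: a flags byte with the I bit set,
# 24 reserved bits, the 24-bit VXLAN Network Identifier (VNI), and 8 reserved bits.
# This header is prepended to the original Ethernet frame, which is then carried
# inside an outer UDP/IP/Ethernet header.
import struct

def vxlan_header(vni: int) -> bytes:
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!BBBB", 0x08, 0, 0, 0) + struct.pack("!I", vni << 8)

hdr = vxlan_header(5001)
print(hdr.hex(), len(hdr), "bytes")   # 8 bytes of encapsulation overhead per frame
```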
Hadoop provides a platform for building distributed systems for massive data storage and analysis. It has internal mechanisms to replicate user data and to tolerate many kinds of hardware and software failures. However, like many other distributed systems, Hadoop has a small number of Single Points of Failure (SPoFs). These include the NameNode (which manages the Hadoop Distributed Filesystem namespace and keeps track of storage blocks), and the JobTracker (which schedules jobs and the map and reduce tasks that make up each job). VMware vSphere Fault Tolerance (FT) can be used to protect virtual machines that run these vulnerable components of a Hadoop cluster. Recently a cluster of 24 hosts was used to run three different Hadoop applications to show that such protection has only a small impact on application performance. Various Hadoop configurations were employed to artificially create greater load on the NameNode and JobTracker daemons. With conservative extrapolation, these tests show that uniprocessor virtual machines with FT enabled are sufficient to run the master daemons for clusters of more than 200 hosts.
A new white paper, “Protecting Hadoop with VMware vSphere 5 Fault Tolerance,” is now available in which these tests are described in detail. CPU and network utilization of the protected VMs are given to enable comparisons with other distributed applications. In addition, several best practices are suggested to maximize the size of the Hadoop cluster that can be protected with FT.
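If you want to script that protection, a minimal pyVmomi sketch along the following lines could enable FT on the VM running the NameNode. The connection details and VM name are placeholders; with no host argument, vCenter chooses where to place the secondary.

```python
# Minimal pyVmomi sketch (connection details and VM name are placeholders):
# turn on Fault Tolerance for the VM running the Hadoop NameNode.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVim.task import WaitForTask
from pyVmomi import vim

ctx = ssl._create_unverified_context()   # lab use only
si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="password", sslContext=ctx)
content = si.RetrieveContent()

view = content.viewManager.CreateContainerView(content.rootFolder, [vim.VirtualMachine], True)
namenode_vm = next(vm for vm in view.view if vm.name == "hadoop-namenode")

# CreateSecondaryVM_Task starts the FT secondary VM; omitting the host argument
# lets vCenter pick a compatible host in the cluster.
WaitForTask(namenode_vm.CreateSecondaryVM_Task())

Disconnect(si)
```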
vMotion and Storage vMotion are key, widely adopted technologies which enable the live migration of virtual machines on the vSphere platform. vMotion provides the ability to live migrate a virtual machine from one vSphere host to another host, with no perceivable impact to the end user. Storage vMotion technology provides the ability to live migrate the virtual disks belonging to a virtual machine across storage elements on the same host. Together, vMotion and Storage vMotion technologies enable critical datacenter workflows, including automated load-balancing with DRS and Storage DRS, hardware maintenance, and the permanent migration of workloads.
Each vSphere release introduces new vMotion functionality, increased reliability and significant performance improvements. vSphere 5.1 continues this trend by offering new enhancements to vMotion that provide a new level of ease and flexibility for live virtual machine migrations. vSphere 5.1 vMotion now removes the shared storage requirement for live migration and allows combining traditional vMotion and Storage vMotion into one operation. The combined migration copies both the virtual machine memory and its disk over the network to the destination vSphere host. This shared-nothing unified live migration feature offers administrators significantly more simplicity and flexibility in managing and moving virtual machines across their virtual infrastructures compared to the traditional vMotion and Storage vMotion migration solutions.
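In API terms, the combined migration is a single relocate operation whose spec names both a destination host and a destination datastore. Here is a hedged pyVmomi sketch; the function and parameter names are my own, and it assumes you already hold references to the destination objects from the inventory.

```python
# Hedged pyVmomi sketch of a combined host + datastore (shared-nothing) migration:
# one RelocateVM_Task whose spec specifies both the destination host and datastore.
from pyVim.task import WaitForTask
from pyVmomi import vim

def migrate_vm(vm, dest_host, dest_resource_pool, dest_datastore):
    spec = vim.vm.RelocateSpec()
    spec.host = dest_host            # destination vSphere host (the vMotion part)
    spec.pool = dest_resource_pool   # resource pool on the destination host
    spec.datastore = dest_datastore  # destination datastore (the Storage vMotion part)
    WaitForTask(vm.RelocateVM_Task(spec, vim.VirtualMachine.MovePriority.highPriority))
```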
A new white paper, “VMware vSphere 5.1 vMotion Architecture, Performance and Best Practices”, is now available. In that paper, we describe the vSphere 5.1 vMotion architecture and its features. Following the overview and feature description of vMotion in vSphere 5.1, we provide a comprehensive look at the performance of live migrating virtual machines running typical Tier 1 applications using vSphere 5.1 vMotion, Storage vMotion, and vMotion. Tests measure characteristics such as total migration time and application performance during live migration. In addition, we examine vSphere 5.1 vMotion performance over a high-latency network, such as that in a metro area network. Test results show the following:
During storage migration, vSphere 5.1 vMotion maintains the same performance as Storage vMotion, even when using the network to migrate, due to the optimizations added to the vSphere 5.1 vMotion network data path.
During memory migration, vSphere 5.1 vMotion maintains nearly identical performance as the traditional vMotion, due to the optimizations added to the vSphere 5.1 vMotion memory copy path.
vSphere 5.1 vMotion retains the proven reliability, performance, and atomicity of the traditional vMotion and Storage vMotion technologies, even at metro area network distances.
Finally, we describe several best practices to follow when using vMotion.
Last year at VMworld 2011 we presented one million I/O operations per second (IOPS) on a single vSphere 5 host (link). The intent was to demonstrate vSphere 5's performance by using multiple VMs to drive an aggregate load of one million IOPS through a single server. There has recently been some interest in driving similar I/O load through a single VM. We used a pair of Violin Memory 6616 flash memory arrays, connected to a two-socket HP DL380 server, for some quick experiments prior to VMworld. vSphere 5.1 was able to demonstrate high performance and I/O efficiency by exceeding one million IOPS, doing so with only a modest eight-way VM. A brief description of our configuration and results is given below.
Hypervisor: vSphere 5.1
Server: HP DL380 Gen8
CPU: 2 x Intel Xeon E5-2690, HyperThreading disabled
HBAs: 5 x QLE2562
Storage: 2 x Violin Memory 6616 Flash Memory Arrays
VM: Windows Server 2008 R2, 8 vCPUs and 48GB of memory
Iometer Config: 4K IO size w/ 16 workers
Using the above configuration we achieved 1,055,896 total sustained IOPS. Check out the following short video clip from one of our latest runs.
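As a quick sanity check on what that number implies (my own arithmetic, not part of the test report), the result works out to roughly 4.3 GB/s of aggregate bandwidth at the 4K I/O size used:

```python
# Quick arithmetic implied by the result above: 1,055,896 IOPS at a 4KB I/O size.
iops = 1_055_896
io_size_kb = 4
throughput_gbs = iops * io_size_kb * 1024 / 1e9   # bytes/sec -> GB/s (decimal)
per_hba_iops = iops / 5                            # spread across the 5 QLE2562 HBAs
print(f"~{throughput_gbs:.1f} GB/s aggregate, ~{per_hba_iops:,.0f} IOPS per HBA")
```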
Look out for a more thorough write-up after VMworld.
Looking around the various VDI forums, there's usually some discussion of VDI clients that don't require any additional plug-ins, add-ons or installation of any kind on the end user device – a zero-install solution. While this discussion is usually focused on access via web browsers, I've recently been playing with ThinApp to deliver a zero-install VMware View client.
At the risk of this sounding overly like a sales pitch, I was pretty impressed by the elegance of this solution; I was able to easily pull my ThinApped View client from the web, a network share, or a USB drive and launch a remote desktop session using PCoIP on pretty much any Windows device with a simple double click. Maybe not as quick as double clicking on Firefox, but pretty close.
Are there any downsides? I found application launch time to be a little longer than achieved with a traditionally installed client. Also, USB redirection and virtual printing aren't supported, but then these problems plague web clients too. The main benefit I see is that I get away from the install/update hassles, but keep the performance, bandwidth and user experience benefits associated with PCoIP.
As a final note, I would add that I found the process associated with creating a ThinApped version of the View Client to be extremely straightforward and it took me less than 15 minutes.
We recently worked with EMC's internal IT organization to study the performance of their largest Oracle RAC instance on vSphere 5. The results revealed that Oracle RAC, when virtualized with vSphere 5, performed within 7% on average of a corresponding physical environment. The chart below highlights the results from these tests.
The blue bars are for tests done at a regular level of load and the green bars are for tests done at a 2x level of load. Results are reported in terms of sum total response time in seconds for the variety of different transactions in the tested workload.
This paper also provides the performance best practices developed and implemented for Oracle RAC on vSphere 5 for these tests. This information can be used to optimize existing and new installations of Oracle RAC on vSphere.