
Monthly Archives: November 2016

vCenter Server 6.5 High Availability Performance and Best Practices

High availability (aka HA) services are important in any platform, and VMware vCenter Server® is no exception. As the main administrative and management tool of vSphere, it is a critical element that requires HA. vCenter Server HA (aka VCHA) delivers protection against software and hardware failures with excellent performance for common customer scenarios, as shown in this paper.

Much work has gone into the high availability feature of VMware vCenter Server® 6.5 to ensure that this service and its operations minimally affect the performance of your vCenter Server and vSphere hosts. We thoroughly tested VCHA with a benchmark that simulates common vCenter Server activities in both regular and worst case scenarios. The result is solid data and a comprehensive performance characterization in terms of:

  • Performance of VCHA failover/recovery time objective (RTO): In case of a failure, vCenter Server HA (VCHA) provides failover/RTO such that users can continue with their work in less than 2 minutes through API clients and less than 4 minutes through UI clients. While failover/RTO depends on the vCenter Server configuration and the inventory size, in our tests it is within the target limit, which is 5 minutes.
  • Performance of enabling VCHA: We observed that enabling VCHA would take around 4 – 9 minutes depending on the vCenter Server configuration and the inventory size.
  • VCHA overhead: When VCHA is enabled, there is no significant performance impact on vCenter Server under typical load conditions. We observed a noticeable but small impact of VCHA when the vCenter Server was under extreme load; however, it is unlikely for customers to generate that much load on the vCenter Server for extended time periods.
  • Performance impact of vCenter Server statistics level: With an increasing statistics level, vCenter Server produces less throughput, as expected. When VCHA is enabled for various statistics levels, we observe a noticeable but small impact of 3% to 9% on throughput.
  • Performance impact of a private network: VCHA is designed to support LAN networks with up to 10 ms latency between VCHA nodes. However, this comes with a performance penalty. We study the performance impact of the private network in detail and provide further guidelines about how to configure VCHA for the best performance.
  • External Platform Services Controller (PSC) vs Embedded PSC: We compare VCHA performance across these two deployment modes and observe a minimal difference between them.

Throughout the paper, our findings show that vCenter Server HA performs well under a variety of circumstances. In addition to the performance study results, the paper describes the VCHA architecture and includes some useful performance best practices for getting the most from VCHA.

For the full paper, see VMware vCenter Server High Availability Performance and Best Practices.

vSphere 6.5 Update Manager Performance and Best Practices

vSphere Update Manager (VUM) is the patch management tool for VMware vSphere 6.5. IT administrators can use VUM to patch and upgrade ESXi hosts, VMware Tools, virtual hardware, and virtual appliances.

In the vSphere 6.5 release, VUM has been integrated into the vCenter Server appliance (VCSA) for the Linux platform. The integration eliminates remote data transfers between VUM and VCSA, and greatly simplifies the VUM deployment process. As a result, certain data-driven tasks achieve a considerable performance improvement over VUM for the Windows platform, as illustrated in the following figure:

[Figure: Performance improvement of data-driven VUM tasks in the vSphere 6.5 VCSA over VUM on Windows]

We have published a paper that presents the new performance characteristics of VUM in vSphere 6.5. In particular, the paper covers the following topics:

  • VUM server deployment
  • VUM operations including scan host, scan VM, stage host, remediate host, and remediate VM
  • Remediation concurrency
  • Resource consumption
  • Running VUM operations with vCenter Server provisioning operations

The paper also offers a number of performance tips and best practices for using VUM during patch maintenance. For the full details, read vSphere Update Manager Performance and Best Practices.

Whitepaper on vSphere Virtual Machine Encryption Performance

vSphere 6.5 introduces a feature called vSphere VM encryption.  When this feature is enabled for a VM, vSphere protects the VM data by encrypting all its contents.  Encryption is done both for already existing data and for newly written data. Whenever the VM data is read, it is decrypted within ESXi before being served to the VM.  Because of this, vSphere VM encryption can have a performance impact on application I/O and the ESXi host CPU usage.

We have published a whitepaper, VMware vSphere Virtual Machine Encryption Performance, to quantify this performance impact.  We focus on synthetic I/O performance on VMs, as well as VM provisioning operations like clone, snapshot creation, and power on.  From analysis of our experiment results, we see that while VM encryption consumes more CPU resources for encryption and decryption, its impact on I/O performance is minimal when using enterprise-class SSD or VMware vSAN storage.  However, when using ultra-high performance storage like locally attached NVMe drives capable of handling up to 750,000 IOPS, the minor increase in per-I/O latency due to encryption or decryption adds up quickly to have an impact on IOPS.
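
To see why very fast storage is more sensitive to encryption overhead, consider the back-of-the-envelope sketch below. The only number taken from the paper is the 750,000 IOPS capability of the NVMe drives; the SSD baseline, the assumed 1 µs per-I/O crypto cost, and the simple serial model are illustrative assumptions, not measurements.

```python
# Back-of-the-envelope arithmetic (illustrative only, not data from the paper):
# model each device as serial, so IOPS is simply the inverse of per-I/O latency,
# and see what adding a small fixed crypto cost per I/O does to throughput.

def iops_with_overhead(baseline_iops, crypto_overhead_us):
    """IOPS after adding a fixed per-I/O latency, under a simple serial model."""
    base_latency_us = 1e6 / baseline_iops          # per-I/O service time (microseconds)
    return 1e6 / (base_latency_us + crypto_overhead_us)

# Only the 750,000 IOPS NVMe figure comes from the paper; the SSD baseline and
# the 1 microsecond crypto overhead are assumed values for illustration.
for name, baseline in [("enterprise SSD", 100_000), ("local NVMe", 750_000)]:
    new_iops = iops_with_overhead(baseline, crypto_overhead_us=1.0)
    drop = 100 * (1 - new_iops / baseline)
    print(f"{name}: {baseline:,} -> {new_iops:,.0f} IOPS ({drop:.0f}% lower)")
```

Under this toy model, the same fixed per-I/O cost barely dents the slower device but removes a large fraction of the NVMe device's headroom, which matches the qualitative trend described above.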

For more detailed information and data, please refer to the whitepaper.

vSphere 6.5 DRS Performance – A new white-paper

VMware recently announced the general availability of vSphere 6.5. Among the many new features in this release are some DRS-specific ones like predictive DRS and network-aware DRS. In vSphere 6.5, DRS also comes with a host of performance improvements, like the all-new VM initial placement and the faster, more effective maintenance mode operation.

If you want to learn more about them, we have published a new white-paper on the new features and performance improvements of DRS in vSphere 6.5. Here are some highlights from the paper:

[Figures: Highlights from the vSphere 6.5 DRS performance white-paper]

Expandable Reservation for Resource Pools

One of the questions I am often asked about resource pools (RPs) concerns ‘Expandable reservation’. What is expandable reservation, and why should I care about it? Although it sounds intuitive, it can easily be misunderstood.

To put it simply, a resource pool with ‘expandable reservation’ can expand its reservation by requesting more resources from its parent.

The need to expand the reservation comes from an increase in the reservation demand of its child objects (VMs or resource pools). If the parent resource pool is itself short of resources, it in turn expands its own reservation by asking its parent (the grandparent) for resources.

Let us try to understand this with a simple example. Consider the following RP hierarchy. If RP-4 has to expand its reservation, it requests resources from its parent, RP-3, and if RP-3 in turn has to expand its reservation, it requests resources from the Root-RP.

[Figure: Example resource pool hierarchy with Root-RP, RP-3, and RP-4]

Resource pool with fixed reservation

A resource pool with a fixed reservation cannot expand its reservation, so any operation that needs reservation will fail in case of a resource shortage.
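
To make the two behaviors concrete, here is a toy Python model. It is purely illustrative, not the actual ESXi admission-control algorithm, and the pool capacities are made up; it only captures the idea that an expandable pool borrows any shortfall from its parent, while a fixed pool fails once its own reservation is exhausted.

```python
# Toy model of reservation admission (purely illustrative; not the actual ESXi
# admission-control algorithm). An expandable pool borrows any shortfall from its
# parent; a pool with a fixed reservation fails once its own reservation is used up.

class Pool:
    def __init__(self, name, reservation, expandable, parent=None):
        self.name = name
        self.reservation = reservation   # configured reservation, in MB
        self.expandable = expandable
        self.parent = parent
        self.used = 0                    # reservation currently in use, in MB

    def reserve(self, amount):
        """Try to reserve `amount` MB in this pool; return True on success."""
        free = max(self.reservation - self.used, 0)
        shortfall = max(amount - free, 0)
        if shortfall and not (self.expandable and self.parent
                              and self.parent.reserve(shortfall)):
            return False                 # fixed pool (or parent refused): admission fails
        self.used += amount              # any shortfall was borrowed from the parent
        return True

# The hierarchy from the figure above, with made-up capacities.
root = Pool("Root-RP", reservation=1000, expandable=False)
rp3  = Pool("RP-3",    reservation=0,    expandable=True, parent=root)
rp4  = Pool("RP-4",    reservation=300,  expandable=True, parent=rp3)

print(rp4.reserve(321))                  # True: 300 MB from RP-4, 21 MB borrowed via RP-3
rp4.expandable = False                   # switch RP-4 to a fixed reservation
print(rp4.reserve(50))                   # False: RP-4 is full and cannot expand
```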

In the RP hierarchy shown earlier, all the RPs have expandable reservation set. I then changed the reservation of RP-4 from “Expandable” to a fixed value of 300MB, as shown below.

[Screenshot: Changing the RP-4 reservation from “Expandable” to a fixed 300MB]

Now the “Resource settings” for RP-4 are as follows:

[Screenshot: RP-4 resource settings, showing zero VM reservations but 215MB of RP-level used reservation]

We can see that, although the reservation used by the VMs themselves is zero, the RP-level used reservation is shown as 215MB. This reservation comes from the VMs’ overhead memory (as computed by ESXi).

At this point, I added two more VMs to the resource pool and powered on one of them. The RP-level used reservation then changed as shown below.

[Screenshot: RP-4 used reservation after adding two VMs and powering one of them on]

When I tried to power on another VM in the RP, it failed with an error, as shown below.

[Screenshot: Power-on failure caused by insufficient memory reservation in RP-4]

This happened because there wasn’t enough reserved memory available to accommodate the new VM (its overhead memory needs to be reserved). I then changed the reservation type back to “Expandable”, and the power-on succeeded.

[Screenshot: RP-4 resource settings after switching back to “Expandable”, showing 321MB of used reservation]

As we can see from the above figure, although the configured reservation shows 300MB, the used reservation is 321MB, which is the result of expandable reservation: the extra 21MB of overhead reservation was borrowed from the parent resource pool.

This is just one example of how expandable reservation can be useful. If the VMs inside the RP have their own reservations, then those will also be accounted for in the used reservation of the RP.

Hence, it is always advisable to keep the RP reservation set to “Expandable”, as it can accommodate any increase in reservation demand by asking the parent resource pool for more resources.
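
If you manage many resource pools, the same setting can also be changed through the vSphere API. The following pyVmomi sketch is a minimal illustration with placeholder connection details and pool names; it is not the exact procedure used for the screenshots above.

```python
# Minimal pyVmomi sketch (placeholder host, credentials, and pool name):
# set a resource pool's memory reservation back to "Expandable".
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()                     # lab only: skip cert checks
si = SmartConnect(host="vcenter.example.com",
                  user="administrator@vsphere.local",
                  pwd="password", sslContext=ctx)

# Find the resource pool named "RP-4" with a simple inventory walk.
view = si.content.viewManager.CreateContainerView(
    si.content.rootFolder, [vim.ResourcePool], True)
rp4 = next(rp for rp in view.view if rp.name == "RP-4")
view.DestroyView()

# ResourceConfigSpec requires both CPU and memory allocation to be specified.
spec = vim.ResourceConfigSpec(
    cpuAllocation=vim.ResourceAllocationInfo(
        reservation=0, expandableReservation=True, limit=-1,
        shares=vim.SharesInfo(level=vim.SharesInfo.Level.normal)),
    memoryAllocation=vim.ResourceAllocationInfo(
        reservation=300, expandableReservation=True, limit=-1,   # 300 MB, expandable
        shares=vim.SharesInfo(level=vim.SharesInfo.Level.normal)))

rp4.UpdateConfig(name=None, config=spec)                   # name=None keeps the pool name
Disconnect(si)
```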

In the next post, we will look at how these reservations for a resource pool are different from the reservations for a VM and when to use them.

Latency Sensitive VMs and vSphere DRS

Some applications are inherently highly latency sensitive and cannot afford long vMotion times. VMs running such applications are termed ‘Latency Sensitive’. These VMs consume resources very actively, so vMotion of such VMs is often a slow process, and they require special care during cluster load balancing.

You can tag a VM as latency sensitive by setting the VM option through the vSphere web client, as shown below (VM → Edit Settings → VM Options → Advanced).

[Screenshot: Setting the latency sensitivity level in the vSphere web client (VM Options → Advanced)]
By default, the latency sensitivity value of a VM is set to ‘normal’. Changing it to ‘high’ makes the VM ‘Latency Sensitive’. There are other levels, like ‘medium’ and ‘low’, which are experimental right now. Once the value is set to high, 100% of the VM’s configured memory should be reserved. It is also recommended to reserve 100% of its CPU. This white paper talks more about the VM latency sensitivity feature in vSphere.
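
If you have to apply this setting to more than a handful of VMs, it can also be done programmatically. The sketch below is a minimal pyVmomi example that assumes `vm` is an existing vim.VirtualMachine object; computing the CPU reservation as vCPU count times per-core frequency is just one simple way to approximate a 100% CPU reservation.

```python
# Minimal pyVmomi sketch (assumes `vm` is an existing vim.VirtualMachine object):
# set latency sensitivity to 'high' and reserve 100% of configured memory and CPU.
from pyVmomi import vim

spec = vim.vm.ConfigSpec()

# Latency sensitivity level: low / normal / medium / high
spec.latencySensitivity = vim.LatencySensitivity(
    level=vim.LatencySensitivity.SensitivityLevel.high)

# Reserve all configured memory (equivalent to a 100% memory reservation).
spec.memoryReservationLockedToMax = True

# Approximate a 100% CPU reservation: configured vCPUs * host core frequency (MHz).
host = vm.runtime.host
cpu_mhz = host.hardware.cpuInfo.hz // 1000000
spec.cpuAllocation = vim.ResourceAllocationInfo(
    reservation=vm.config.hardware.numCPU * cpu_mhz)

task = vm.ReconfigVM_Task(spec=spec)   # apply the change
```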

DRS support

VMware vSphere DRS provides support for handling such special VMs. If a VM is part of a DRS cluster, tagging it as latency sensitive creates a VM-Host soft affinity rule. This ensures that DRS does not move the VM unless it is absolutely necessary; for example, in scenarios where the cluster is over-utilized, all the soft rules are dropped and such VMs can be moved.

To showcase how this option works, we ran a simple experiment with a four-host DRS cluster running a latency sensitive VM (10.156.231.165:VMZero-Latency-Sensitive-1) on one of its hosts (10.156.231.165).

[Screenshot: Cluster CPU utilization, with host 10.156.231.165 more heavily loaded than the others]

As we can see from the screenshot, the CPU usage of host ‘10.156.231.165’ is higher than that of the other hosts, and the cluster load is not balanced. So DRS migrates VMs from the highly utilized host (10.156.231.165) to distribute the load.

Since the latency sensitive VM is a heavy consumer of resources, it is the best possible candidate to migrate, as moving it distributes the load in one shot. So DRS migrated the latency sensitive VM to a different host in order to distribute the load.

[Screenshot: DRS migrating the latency sensitive VM off the heavily loaded host]

Then we put the cluster back in its original state and set the VM’s latency sensitivity value to ‘high’ using VM options (as mentioned earlier). We also set 100% memory and CPU reservations. This time, due to the associated soft-affinity rule, DRS completely avoided the latency sensitive VM and instead migrated other VMs from the same host to distribute the load.

[Screenshot: DRS migrating other VMs and leaving the latency sensitive VM in place]

Things to note:

  • A 100% memory reservation for the latency sensitive VM is a must. Without the memory reservation, vMotion will fail; if the VM is powered off, it cannot be powered on until the reservation is set.
  • Since DRS uses a soft-affinity rule, the cluster might sometimes become imbalanced due to these VMs.
  • If multiple VMs are latency sensitive, spread them across hosts and then tag them as latency sensitive. This avoids over-utilization of hosts and results in better resource distribution (see the sketch after this list).
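
As a starting point for the last item, the following pyVmomi sketch (which assumes an existing connected ServiceInstance `si`) reports which hosts currently run VMs with latency sensitivity set to ‘high’, so you can see how they are spread before tagging more of them.

```python
# Minimal pyVmomi sketch (assumes an existing connected ServiceInstance `si`):
# report which hosts run VMs whose latency sensitivity level is 'high'.
from collections import defaultdict
from pyVmomi import vim

view = si.content.viewManager.CreateContainerView(
    si.content.rootFolder, [vim.VirtualMachine], True)

high_by_host = defaultdict(list)
for vm in view.view:
    ls = vm.config.latencySensitivity if vm.config else None
    if ls and ls.level == "high":                      # enum values serialize as strings
        host = vm.runtime.host.name if vm.runtime.host else "unknown"
        high_by_host[host].append(vm.name)
view.DestroyView()

for host, vms in sorted(high_by_host.items()):
    print(f"{host}: {', '.join(vms)}")
```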