
DRS Keeps VMs Happy

Since the introduction of Distributed Resource Scheduler (DRS) almost 10 years ago, it has become the most trusted way to ensure virtual machines are running at their peak performance. Over 80% of customers that have introduced DRS use it in fully automated mode, which allows automatic placement and rebalancing operations that simplify capacity planning and reduce administrative overhead.

There has been quite a bit of inaccurate information floating around recently about vSphere DRS. The most common claim I hear is that “DRS is focused on balancing hosts in the cluster, and is not focused on workload performance.” Actually, nothing could be further from the truth, and hopefully this post will help explain how DRS works to keep your VMs performing optimally.

Workload Performance
Simply put, the main goal of DRS is to ensure that your VMs get the resources they demand in the most efficient way possible. vSphere clusters are composed of separate hosts, which ultimately provide the physical resources for the virtual machines. DRS constantly evaluates the performance of the virtual machines and decides which host within the cluster is best suited to meet their resource demand.

Placement Decisions
DRS constantly monitors the CPU and memory demand of the virtual machines and the capacity of the hosts that provide those resources, and it weighs more than 25 factors from the following categories when making placement decisions:

  • Estimated time to complete the vMotion
  • Estimated performance after the vMotion
  • VM resource demand
  • Host resource capacity
  • Resource reservations
  • Resource limits
  • VM/Host affinity/anti-affinity rules
  • Datastore connectivity
  • vSphere HA settings
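To make the idea of hard constraints versus preferences concrete, here is a minimal placement sketch that models only a few of the categories above: datastore connectivity, a VM/Host rule, VM resource demand, and host capacity. The `Host`/`VM` classes, `pick_host`, and the "most headroom wins" scoring are all hypothetical simplifications, not the DRS algorithm, which weighs many more factors such as vMotion time and post-move performance.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Host:
    name: str
    cpu_capacity_mhz: int
    mem_capacity_mb: int
    cpu_used_mhz: int = 0
    mem_used_mb: int = 0
    datastores: Set[str] = field(default_factory=set)


@dataclass
class VM:
    name: str
    cpu_demand_mhz: int
    mem_demand_mb: int
    datastore: str
    # Hypothetical "must run on" VM/Host rule; None means any host is allowed
    allowed_hosts: Optional[Set[str]] = None


def is_eligible(vm: VM, host: Host) -> bool:
    """Hard constraints: datastore connectivity, VM/Host rule, host capacity."""
    if vm.datastore not in host.datastores:
        return False
    if vm.allowed_hosts is not None and host.name not in vm.allowed_hosts:
        return False
    return (host.cpu_used_mhz + vm.cpu_demand_mhz <= host.cpu_capacity_mhz
            and host.mem_used_mb + vm.mem_demand_mb <= host.mem_capacity_mb)


def headroom_after(vm: VM, host: Host) -> float:
    """Smallest fraction of CPU or memory left free once the VM lands on the host."""
    cpu_free = 1 - (host.cpu_used_mhz + vm.cpu_demand_mhz) / host.cpu_capacity_mhz
    mem_free = 1 - (host.mem_used_mb + vm.mem_demand_mb) / host.mem_capacity_mb
    return min(cpu_free, mem_free)


def pick_host(vm: VM, hosts: List[Host]) -> Optional[Host]:
    """Among eligible hosts, prefer the one that leaves the most headroom."""
    candidates = [h for h in hosts if is_eligible(vm, h)]
    if not candidates:
        return None
    return max(candidates, key=lambda h: headroom_after(vm, h))


if __name__ == "__main__":
    hosts = [
        Host("esx01", 20000, 65536, cpu_used_mhz=18000, mem_used_mb=30000, datastores={"ds1"}),
        Host("esx02", 20000, 65536, cpu_used_mhz=6000, mem_used_mb=20000, datastores={"ds1"}),
        Host("esx03", 20000, 65536, datastores={"ds2"}),
    ]
    vm = VM("app01", cpu_demand_mhz=4000, mem_demand_mb=8192, datastore="ds1")
    print(pick_host(vm, hosts).name)  # esx02: esx01 lacks CPU, esx03 lacks the datastore
```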

One comment I often hear is “my cluster is not balanced.” Remember, balancing your vSphere hosts is not the primary objective DRS was designed for, and unlike other tools, it will not move a VM just for the sake of moving it when there is little or no performance benefit. If the host where the VM currently resides can provide the resources it needs, DRS will leave it alone.

On the other hand, a common situation occurs when an idle VM, or a group of VMs, suddenly becomes active. At this point a VM may be demanding more resources than its host can provide, and DRS has some decisions to make. One of them is whether a move is worth it. A vMotion operation always incurs some overhead: CPU cycles, memory resources, network traffic, and so on. If DRS determines that moving the VM would cost more than it gains, or would hurt the performance of other VMs, it will leave the VM where it is.
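The "is it worth moving?" reasoning in the last two paragraphs can be reduced to a simple gate: do nothing if the current host meets demand, otherwise move only when the expected gain beats the overhead. This is a minimal sketch assuming hypothetical scores in one common unit; `should_migrate` and its inputs are illustrative names, and DRS's real cost and benefit estimates are far more detailed.

```python
def should_migrate(host_meets_demand: bool,
                   expected_gain: float,
                   vmotion_cost: float,
                   impact_on_others: float) -> bool:
    """Return True only when a move is worth its overhead (illustrative model).

    All numeric inputs are hypothetical scores in one common unit:
    expected_gain: estimated performance improvement for the VM after the move
    vmotion_cost: estimated CPU, memory, and network overhead of the vMotion
    impact_on_others: estimated slowdown for VMs already on the destination host
    """
    if host_meets_demand:
        # The current host can supply what the VM demands: leave it alone
        return False
    return expected_gain > vmotion_cost + impact_on_others


if __name__ == "__main__":
    print(should_migrate(True, 10, 1, 0))   # False: no contention, no move
    print(should_migrate(False, 10, 3, 2))  # True: the gain outweighs the overhead
    print(should_migrate(False, 4, 3, 2))   # False: the move costs more than it gains
```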

However, there is more than one way to skin this cat. Chances are there are other VMs on this host that can move to another host with less overhead cost. If other virtual machines were to move to another host, it would free up resources for the busy VM. Perfect! So in some situations it’s best to move the active VM, but in other cases, it’s best to move the idle VMs away. But don’t worry about it. DRS weighs the pros and cons, and makes these intelligent placement decisions for you.
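One way to picture the "move the busy VM or move the idle ones" trade-off is to compare the cost of both plans. The sketch below assumes a hypothetical cost model in which vMotion cost is proportional to a VM's active memory, while leaving the host frees its consumed memory; it also assumes a destination with enough room exists. The greedy selection of idle VMs is a plain heuristic chosen for illustration, not how DRS searches.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Candidate:
    name: str
    consumed_mb: int  # memory released on the source host if the VM leaves
    active_mb: int    # memory vMotion must copy; used here as the cost proxy


def vmotion_cost(vm: Candidate) -> int:
    # Hypothetical cost model: proportional to the active memory to transfer
    return vm.active_mb


def cheapest_idle_set(idle: List[Candidate], shortfall_mb: int) -> Optional[List[Candidate]]:
    """Greedily pick idle VMs with the best freed-memory-per-cost ratio."""
    chosen: List[Candidate] = []
    freed = 0
    for vm in sorted(idle, key=lambda v: vmotion_cost(v) / max(v.consumed_mb, 1)):
        if freed >= shortfall_mb:
            break
        chosen.append(vm)
        freed += vm.consumed_mb
    return chosen if freed >= shortfall_mb else None


def plan_relief(busy: Candidate, idle: List[Candidate], shortfall_mb: int) -> Tuple[str, List[str]]:
    """Compare moving the busy VM against moving enough idle VMs off the host."""
    idle_set = cheapest_idle_set(idle, shortfall_mb)
    if idle_set is not None and sum(vmotion_cost(v) for v in idle_set) < vmotion_cost(busy):
        return "move idle VMs", [v.name for v in idle_set]
    return "move busy VM", [busy.name]


if __name__ == "__main__":
    busy = Candidate("db01", consumed_mb=32768, active_mb=24576)
    idle = [Candidate("web01", 8192, 512), Candidate("web02", 8192, 1024),
            Candidate("test01", 4096, 256)]
    print(plan_relief(busy, idle, shortfall_mb=12000))
    # ('move idle VMs', ['web01', 'test01'])
```

With these numbers, relocating two idle VMs copies far less memory than relocating the busy database VM, which mirrors the point made in the paragraph above.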

DRS Rules
DRS rules let you control the placement of your virtual machines. These are manually created rules that keep certain groups of VMs together on the same host, separate them across different hosts, or make them prefer or require certain hosts. The three rule types are:

  • Affinity Rules: Run a group of VMs on the same host
  • Anti-Affinity Rules: Run a group of VMs on separate hosts
  • VM/Host Rules: Force or restrict a group of VMs to a defined group of hosts
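The three rule types above can be expressed as checks against a VM-to-host placement. This is a minimal sketch with hypothetical names (`rule_violations` and its arguments); for VM/Host rules it models only the "must run on" variant, while vSphere also offers "should" and "must not"/"should not" forms.

```python
from typing import Dict, List, Set, Tuple


def rule_violations(placement: Dict[str, str],
                    affinity: List[Set[str]],
                    anti_affinity: List[Set[str]],
                    vm_host_must: List[Tuple[Set[str], Set[str]]]) -> List[str]:
    """Report which DRS-style rules a VM-to-host placement breaks."""
    problems = []
    for group in affinity:
        # Affinity: every VM in the group should share one host
        if len({placement[vm] for vm in group}) > 1:
            problems.append(f"affinity broken: {sorted(group)}")
    for group in anti_affinity:
        # Anti-affinity: no two VMs in the group may share a host
        hosts = [placement[vm] for vm in group]
        if len(set(hosts)) < len(hosts):
            problems.append(f"anti-affinity broken: {sorted(group)}")
    for vms, allowed in vm_host_must:
        # VM/Host "must run on": each VM must sit on a host in the host group
        for vm in sorted(vms):
            if placement[vm] not in allowed:
                problems.append(f"VM/Host rule broken: {vm} on {placement[vm]}")
    return problems


if __name__ == "__main__":
    placement = {"web01": "esx01", "web02": "esx01", "app01": "esx02", "db01": "esx03"}
    for problem in rule_violations(placement,
                                   affinity=[{"app01", "db01"}],
                                   anti_affinity=[{"web01", "web02"}],
                                   vm_host_must=[({"db01"}, {"esx03"})]):
        print(problem)
```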

These rules can be helpful to increase performance of a multi-tiered application, increase availability, or reduce the license impact for certain applications.

Resource Pools
Resource pools are available on stand-alone ESXi hosts, but DRS extends this powerful feature across all the hosts within the vSphere cluster, creating a larger pool of logical capacity. This lets you create pools of compute resources that apply limits, reservations, and shares across all the hosts within the cluster. The result is better use of the cluster's collective compute resources, while still providing a granular level of service for the virtual machines.
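To show how reservations, limits, and shares interact inside a pool, here is a deliberately simplified allocation sketch. It assumes every child is fully contended (demands as much as it can get), grants reservations first, splits the rest by share ratio, and redistributes anything above a child's limit. `divide_pool` is a hypothetical helper; the actual ESXi/DRS allocation also accounts for real demand, expandable reservations, and pool hierarchy.

```python
from typing import Dict, Tuple

INF = float("inf")


def divide_pool(capacity: float,
                children: Dict[str, Tuple[float, float, int]]) -> Dict[str, float]:
    """Split pool capacity among children given (reservation, limit, shares)."""
    alloc = {name: res for name, (res, _limit, _shares) in children.items()}
    remaining = capacity - sum(alloc.values())
    if remaining < 0:
        raise ValueError("reservations exceed pool capacity")
    active = {name for name, (_res, limit, _shares) in children.items() if alloc[name] < limit}
    while remaining > 1e-9 and active:
        total_shares = sum(children[name][2] for name in active)
        handed_out = 0.0
        for name in sorted(active):
            _res, limit, shares = children[name]
            headroom = limit - alloc[name]
            fair = remaining * shares / total_shares
            if fair >= headroom:
                # Capped by its limit: the excess returns to the pool next round
                alloc[name] = limit
                handed_out += headroom
                active.discard(name)
            else:
                alloc[name] += fair
                handed_out += fair
        remaining -= handed_out
    return alloc


if __name__ == "__main__":
    pools = {
        "prod": (2000, INF, 4000),  # reservation 2000 MHz, no limit, high shares
        "dev": (0, 1000, 1000),     # limited to 1000 MHz
        "test": (0, INF, 1000),
    }
    alloc = divide_pool(10000, pools)
    print({name: round(mhz) for name, mhz in alloc.items()})
    # {'prod': 7600, 'dev': 1000, 'test': 1400}
```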

vSphere Interoperability
vSphere HA is the service that automatically restarts VMs on the surviving hosts within the same cluster after a host failure. Its main job is to get virtual machines restarted as quickly as possible, but in heavily utilized environments there are situations where no single surviving host has enough free capacity to restart a VM from the failed host, even though the cluster as a whole does. This could be due to a reservation that is set on the VM. In this case, DRS defragments the cluster, migrating VMs between hosts to free up the resources needed to power on the VM.
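The defragmentation case can be illustrated with a small check plus a greedy plan: detect when no single host can honor the restarted VM's reservation although total free capacity can, then migrate small VMs off the host closest to fitting. Function names, the MB units, and the greedy strategy are assumptions for illustration, not how DRS and HA actually coordinate.

```python
from typing import Dict, List, Optional, Tuple


def needs_defragmentation(reservation_mb: int, host_free_mb: Dict[str, int]) -> bool:
    """True when no single host can honor the reservation, yet the cluster as a whole could."""
    fits_somewhere = any(free >= reservation_mb for free in host_free_mb.values())
    return not fits_somewhere and sum(host_free_mb.values()) >= reservation_mb


def defrag_plan(reservation_mb: int,
                host_free_mb: Dict[str, int],
                host_vms: Dict[str, List[Tuple[str, int]]]) -> Optional[List[Tuple[str, str, str]]]:
    """Greedy sketch: clear space on the host closest to fitting the restarted VM.

    host_vms maps a host to (vm_name, reserved_mb) pairs that could be migrated.
    Returns (vm, source, destination) moves, or None if the greedy pass fails.
    """
    free = dict(host_free_mb)
    target = max(free, key=free.get)
    moves = []
    for vm, reserved in sorted(host_vms.get(target, []), key=lambda item: item[1]):
        if free[target] >= reservation_mb:
            break
        dest = next((h for h in free if h != target and free[h] >= reserved), None)
        if dest is None:
            continue
        free[dest] -= reserved
        free[target] += reserved
        moves.append((vm, target, dest))
    return moves if free[target] >= reservation_mb else None


if __name__ == "__main__":
    free = {"esx01": 6144, "esx02": 5120}
    vms = {"esx01": [("vm-a", 4096), ("vm-b", 2048)]}
    print(needs_defragmentation(8192, free))  # True
    print(defrag_plan(8192, free, vms))       # [('vm-b', 'esx01', 'esx02')]
```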

Cross-vCenter and Long Distance vMotion were introduced in vSphere 6. DRS will preserve all the DRS rules and resource policies that have been applied to the virtual machines when they are moved into a new DRS cluster on another vCenter Server. This is great because you don’t have to recreate any rules when performing these new types of vMotion migrations.

Maintenance Mode
DRS simplifies routine maintenance by quickly evacuating hosts in preparation for updates. When a host is evacuated, all the DRS rules are honored to ensure the VM placement policies are always met. Additionally, when vSphere Update Manager is used to update the vSphere hosts, rolling upgrades of an entire cluster are performed automatically, without any intervention from the vSphere administrators.
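A rolling upgrade boils down to "evacuate one host while honoring the rules, update it, move on." The sketch below models only anti-affinity rules and uses a hypothetical "fewest VMs" destination policy; `evacuate`, `rolling_update`, and the `update` callback are illustrative names, and the callback merely stands in for the actual patching step.

```python
from typing import Callable, Dict, List, Set


def breaks_anti_affinity(vm: str, dest: str, placement: Dict[str, str],
                         anti_affinity: List[Set[str]]) -> bool:
    """True if dest already runs another member of one of the VM's anti-affinity groups."""
    return any(vm in group and any(placement[o] == dest for o in group if o != vm)
               for group in anti_affinity)


def evacuate(host: str, placement: Dict[str, str], hosts: List[str],
             anti_affinity: List[Set[str]]) -> Dict[str, str]:
    """Move every VM off `host` without breaking any anti-affinity group."""
    new = dict(placement)
    for vm in sorted(v for v, h in placement.items() if h == host):
        candidates = [h for h in hosts
                      if h != host and not breaks_anti_affinity(vm, h, new, anti_affinity)]
        if not candidates:
            raise RuntimeError(f"cannot evacuate {vm} without breaking a rule")
        # Hypothetical policy: send the VM to the host running the fewest VMs
        new[vm] = min(candidates, key=lambda h: sum(1 for x in new.values() if x == h))
    return new


def rolling_update(hosts: List[str], placement: Dict[str, str],
                   anti_affinity: List[Set[str]],
                   update: Callable[[str], None]) -> Dict[str, str]:
    """Patch hosts one at a time, evacuating each before its update."""
    for host in hosts:
        placement = evacuate(host, placement, hosts, anti_affinity)  # enter maintenance mode
        update(host)  # placeholder for the real patch or upgrade step
        # The host then exits maintenance mode and can receive VMs again
    return placement


if __name__ == "__main__":
    updated: List[str] = []
    final = rolling_update(["esx01", "esx02", "esx03"],
                           {"web01": "esx01", "web02": "esx02", "db01": "esx01"},
                           [{"web01", "web02"}], updated.append)
    print(updated, final)
```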

Power Efficiency
Distributed Power Management (DPM) is a feature included with DRS that delivers power savings by placing vSphere hosts in standby mode during periods when the current resource demand from the virtual machines can be met by fewer hosts. When DPM selects a host for standby, its virtual machines are automatically evacuated so the host can be safely placed in standby mode. When the resource demand for the cluster rises, DPM automatically takes hosts out of standby mode and rebalances the cluster to meet the current demand.
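The core arithmetic behind "demand can be met by fewer hosts" is a ceiling division. This sketch assumes identical hosts, demand that can be spread evenly, a hypothetical 80% target utilization, and a hypothetical floor of two powered-on hosts; DPM's real thresholds and its interaction with HA admission control are not modeled. Integer math avoids floating-point rounding surprises at the boundary.

```python
def ceil_div(a: int, b: int) -> int:
    # Integer ceiling division without floating point
    return -(-a // b)


def hosts_needed(total_demand_mhz: int, per_host_capacity_mhz: int,
                 target_utilization_pct: int = 80, min_hosts: int = 2) -> int:
    """Fewest hosts that keep each powered-on host at or below the target utilization."""
    usable_x100 = per_host_capacity_mhz * target_utilization_pct  # capacity scaled by 100
    return max(min_hosts, ceil_div(total_demand_mhz * 100, usable_x100))


def hosts_to_standby(num_hosts: int, total_demand_mhz: int, per_host_capacity_mhz: int,
                     target_utilization_pct: int = 80, min_hosts: int = 2) -> int:
    """How many hosts could be placed in standby under the assumptions above."""
    needed = hosts_needed(total_demand_mhz, per_host_capacity_mhz,
                          target_utilization_pct, min_hosts)
    return max(0, num_hosts - needed)


if __name__ == "__main__":
    print(hosts_to_standby(6, 50000, 40000))   # 4: two hosts cover a quiet period
    print(hosts_to_standby(6, 150000, 40000))  # 1: demand rose, more hosts stay on
```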

Balance Cluster Load
Although balancing the vSphere host utilization is not the main objective for DRS, it can be used to meet this requirement. This can be helpful to spread the number of VMs across the available hosts and prevent a scenario where “all your eggs are in one basket”. However, this may cause more frequent vMotions, and VMs might be placed on hosts where they could see little performance benefit.

Option A:
Move the migration threshold to the most aggressive setting (5 stars). The migration threshold determines the target balance value, and moving the slider to a more aggressive setting lowers that target. DRS calculates the standard deviation of the CPU and memory load across all the hosts within the cluster. If that standard deviation exceeds the target determined by the migration threshold, DRS recommends migrations to bring the hosts back within the target.
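Here is a small sketch of the standard-deviation comparison described in Option A, applied to one resource at a time. The normalized load (demand divided by capacity), the target values, and the function names are hypothetical; the exact mapping from slider position to target is not given here.

```python
from statistics import pstdev
from typing import Dict, Tuple


def host_loads(hosts: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Normalized load per host: demand divided by capacity for one resource."""
    return {name: demand / capacity for name, (demand, capacity) in hosts.items()}


def check_balance(hosts: Dict[str, Tuple[float, float]], target_stdev: float):
    """Return (current standard deviation, imbalanced flag, most-loaded host)."""
    loads = host_loads(hosts)
    current = pstdev(list(loads.values()))
    busiest = max(loads, key=loads.get)
    return current, current > target_stdev, busiest


if __name__ == "__main__":
    cpu = {"esx01": (18000, 20000), "esx02": (10000, 20000), "esx03": (8000, 20000)}
    current, imbalanced, busiest = check_balance(cpu, target_stdev=0.2)
    print(f"{current:.3f}", imbalanced, busiest)  # 0.216 True esx01
    print(check_balance(cpu, target_stdev=0.3)[1])  # False: a looser target tolerates this spread
```

A more aggressive threshold corresponds to a smaller `target_stdev` in this model, so the same cluster is flagged as imbalanced more readily.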

Option B:
LimitVMsPerESXHostPercent – This advanced setting is designed to restrict the number of VMs that can run per host. This number is dynamically calculated based on the number of VMs, vSphere hosts, and the value that you define.

The maximum number of VMs that will run per host is calculated using the following formula:
(Number of VMs)/(Number of vSphere hosts) + (Number of VMs)/(Number of vSphere hosts) * (LimitVMsPerESXHostPercent / 100)
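The formula can be checked with a quick calculation, treating LimitVMsPerESXHostPercent as a percentage (so 50 means 50%). The function name is hypothetical, and how DRS rounds a fractional result is not stated here, so the sketch returns the raw value.

```python
def max_vms_per_host(num_vms: int, num_hosts: int, limit_percent: float) -> float:
    """Per-host VM cap: the even share plus LimitVMsPerESXHostPercent percent of it."""
    even_share = num_vms / num_hosts
    return even_share + even_share * limit_percent / 100


if __name__ == "__main__":
    print(max_vms_per_host(96, 6, 50))  # 24.0: 16 per host plus 50% headroom
    print(max_vms_per_host(40, 4, 0))   # 10.0: a value of 0 means a strictly even spread
```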

Option C:
Set the MinGoodness and CostBenefit advanced options to 0.

  • MinGoodness – This value ensures that a VM is moved only if the move helps resolve a resource imbalance.
  • CostBenefit – This value ensures that a VM is moved only if there is a resource benefit.

Setting these values to 0 disables the DRS adaptive algorithm that determines whether moving a VM to another host will actually deliver any performance benefit for the virtual machine. Remember, the main objective of DRS is to make sure the VM resides on a host that provides the resources it needs. Disabling these checks gives preference to a balanced cluster, even when DRS determines there may be no performance benefit in doing so. These settings are not recommended in environments where resources are constrained.
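To see why zeroing these options lets more moves through, here is an illustrative gate loosely modeled on the two settings. The goodness, benefit, and cost scores are hypothetical placeholders; the real DRS calculations are internal and are not reproduced here.

```python
def move_passes_filters(goodness: float, benefit: float, cost: float,
                        min_goodness: float, cost_benefit: int) -> bool:
    """Illustrative gate loosely modeled on the MinGoodness and CostBenefit options.

    goodness: how much the move would reduce cluster imbalance (hypothetical score)
    benefit, cost: estimated resource gain vs. migration overhead (same unit)
    cost_benefit: 1 keeps the cost-benefit check, 0 skips it
    """
    if goodness < min_goodness:
        return False  # the move does not improve balance enough
    if cost_benefit and benefit <= cost:
        return False  # the move costs more than it returns
    return True


if __name__ == "__main__":
    # Small balance gain, poor payoff: filtered out with the checks enabled
    print(move_passes_filters(0.01, 1, 5, min_goodness=0.05, cost_benefit=1))  # False
    # Same move with both options set to 0: allowed purely for balance
    print(move_passes_filters(0.01, 1, 5, min_goodness=0, cost_benefit=0))     # True
```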

Summary
DRS is an instrumental feature for ensuring virtual machines run optimally with the physical resources available. It constantly monitors performance from the virtual machine's point of view to ensure that the host where each VM resides can meet its resource demand. Other benefits include resource pooling, increased efficiency, reduced administrative overhead, and improved availability. These capabilities help deliver the most cost-efficient and highly reliable infrastructure for your business applications.