
Elastic DRS for VMware Cloud on AWS

Elastic DRS lets you scale your VMware Cloud on AWS cluster according to demand by adding or removing hosts automatically based on specific policies. This feature further extends the availability and resiliency of the SDDC cluster and removes the infrastructure operations burden from the customer.

One of the great things about the VMware Cloud on AWS service is that it’s operated and managed by VMware, taking the infrastructure operations burden away from the customer. VMware does this not only by managing the hardware lifecycle and remediation, but also by managing certain aspects of the vSphere environment, such as availability and DRS. One feature that further extends the availability and resiliency of the SDDC cluster, and is exclusive to VMware Cloud on AWS, is Elastic DRS (eDRS).

Elastic DRS allows you to scale your cluster in response to demand, or lack of demand, by adding or removing hosts automatically based on specific policies that are configured. The eDRS algorithm runs every 5 minutes and looks at predefined resource thresholds for CPU, memory, and storage. The thresholds cannot be changed by the user and differ based on the policy configured. While the algorithm runs every 5 minutes, the scaling decisions also take into account trends that are tracked over time. If ANY of the resources consistently remain above the defined threshold, a scale-up recommendation alert is generated, and a host is added to the cluster. Conversely, a scale-down recommendation alert is only generated when ALL resources are consistently below the threshold, triggering the removal of a host.
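The ANY/ALL rule described above can be sketched in a few lines of Python. This is only an illustration and not the actual eDRS algorithm, which VMware does not publish: the function name `recommend`, the shape of the samples, and the choice to model "consistently" as "in every sample of the window" are all assumptions made for the example.

```python
from typing import Dict, List

RESOURCES = ("cpu", "memory", "storage")


def recommend(
    samples: List[Dict[str, float]],
    high: Dict[str, float],
    low: Dict[str, float],
) -> str:
    """Return 'scale-up', 'scale-down', or 'none' for a window of samples.

    Each sample is one hypothetical 5-minute reading of utilization in
    percent, keyed by resource name. The thresholds are passed in by the
    caller because they depend on the configured policy.
    """
    if not samples:
        return "none"
    # Scale up when ANY single resource stays above its high threshold
    # for the whole window (assumed meaning of "consistently").
    if any(all(s[r] > high[r] for s in samples) for r in RESOURCES):
        return "scale-up"
    # Scale down only when ALL resources stay below their low thresholds
    # for the whole window.
    if all(all(s[r] < low[r] for s in samples) for r in RESOURCES):
        return "scale-down"
    return "none"
```

Note the asymmetry: one hot resource is enough to add a host, while a single resource sitting between its thresholds is enough to block a removal.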


By default, the Scale Up for Storage Only policy is now configured for every cluster deployed within your SDDC. Previously, customers were simply advised to maintain at least 30% slack space in their SDDCs, but this is now being enforced. The maximum usable capacity of your vSAN datastore is 75%; when you reach that threshold, eDRS automatically starts the process of adding a host to your cluster and expanding your vSAN datastore. Please note that even if you free up enough storage to fall below the threshold, the cluster will not scale down automatically; you will need to manually remove hosts from the cluster.
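To get a feel for what one storage-triggered scale-up does to utilization, here is a rough back-of-the-envelope sketch. It assumes every host contributes the same raw capacity and ignores vSAN overhead, storage policies, and rebalancing time, so treat the numbers as an approximation. The function name `apply_storage_only_policy` is made up for this example.

```python
from typing import Tuple

STORAGE_SCALE_UP_PCT = 75.0  # threshold quoted in the text above


def apply_storage_only_policy(used_pct: float, hosts: int) -> Tuple[float, int]:
    """Return (new_used_pct, new_host_count) after one policy evaluation.

    Simplifying assumption: all hosts contribute equal capacity, so the
    same amount of data is spread over one more host after a scale-up.
    """
    if hosts < 1:
        raise ValueError("a cluster needs at least one host")
    if used_pct < STORAGE_SCALE_UP_PCT:
        # Below the threshold nothing happens. In particular, this policy
        # never removes a host, even if utilization drops sharply.
        return used_pct, hosts
    return used_pct * hosts / (hosts + 1), hosts + 1


print(apply_storage_only_policy(75.0, 3))  # (56.25, 4)
```

Under these assumptions, a 3-host cluster that hits 75% lands at roughly 56% once the fourth host is in place.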

The other policies available are Optimize for Best Performance and Optimize for Lowest Cost. With these policies, the eDRS algorithm takes the minimum and maximum cluster size you’ve specified into account alongside resource consumption. Optimizing for performance adds hosts quickly and removes them slowly to ensure the best possible performance, while optimizing for lowest cost removes hosts quickly and adds them more slowly to keep costs to a minimum.

The resource thresholds differ based on the policy you configure.

Performance Policy

Resource    High threshold    Low threshold
CPU         90%               50%
Memory      80%               50%
Storage     70%               20%

 

Cost Policy

Resource    High threshold    Low threshold
CPU         90%               60%
Memory      80%               60%
Storage     70%               20%
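The two threshold tables, combined with the minimum and maximum cluster size, can be captured as plain data. The sketch below is hypothetical: `POLICIES`, `Threshold`, and `next_host_count` are names invented for this example, and it looks at a single utilization reading rather than the trends the real algorithm tracks over time.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Threshold:
    high: float  # percent; above this a scale-up is considered
    low: float   # percent; below this a scale-down is considered


# Values copied from the Performance Policy and Cost Policy tables.
POLICIES: Dict[str, Dict[str, Threshold]] = {
    "performance": {
        "cpu": Threshold(90, 50),
        "memory": Threshold(80, 50),
        "storage": Threshold(70, 20),
    },
    "cost": {
        "cpu": Threshold(90, 60),
        "memory": Threshold(80, 60),
        "storage": Threshold(70, 20),
    },
}


def next_host_count(
    policy: str, usage: Dict[str, float], hosts: int, min_hosts: int, max_hosts: int
) -> int:
    """Return the host count after one evaluation, kept within [min, max]."""
    limits = POLICIES[policy]
    if any(usage[r] > t.high for r, t in limits.items()) and hosts < max_hosts:
        return hosts + 1
    if all(usage[r] < t.low for r, t in limits.items()) and hosts > min_hosts:
        return hosts - 1
    return hosts


usage = {"cpu": 55, "memory": 55, "storage": 15}
print(next_host_count("cost", usage, hosts=4, min_hosts=3, max_hosts=8))         # 3
print(next_host_count("performance", usage, hosts=4, min_hosts=3, max_hosts=8))  # 4
```

The same reading (55% CPU, 55% memory, 15% storage) removes a host under the cost policy but leaves the cluster alone under the performance policy, because the cost policy's low thresholds for CPU and memory are 60% instead of 50%.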

 

Safety Checks and Notifications

A built-in safety check prevents hosts from being added or removed continuously, giving the cluster time to “cool off” and the resources time to level out. There is a 30-minute delay between scale-up events, and a 3-hour delay before a scale-down event can be triggered after a scale-up event.
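The two delays can be modeled as a simple gate in front of any scaling action. This is a minimal sketch under the assumption that both delays are measured from the most recent scale-up event; `is_allowed` and the constant names are hypothetical and only serve to illustrate the timing.

```python
from datetime import datetime, timedelta
from typing import Optional

SCALE_UP_COOLDOWN = timedelta(minutes=30)       # delay between scale-up events
SCALE_DOWN_AFTER_UP_DELAY = timedelta(hours=3)  # delay before scale-down after scale-up


def is_allowed(action: str, now: datetime, last_scale_up: Optional[datetime]) -> bool:
    """Return True if the cool-off period for the given action has passed."""
    if action not in ("scale-up", "scale-down"):
        raise ValueError(f"unknown action: {action}")
    if last_scale_up is None:
        # No earlier scale-up event, so there is nothing to cool off from.
        return True
    elapsed = now - last_scale_up
    if action == "scale-up":
        return elapsed >= SCALE_UP_COOLDOWN
    return elapsed >= SCALE_DOWN_AFTER_UP_DELAY


last_up = datetime(2024, 1, 1, 12, 0)
print(is_allowed("scale-up", datetime(2024, 1, 1, 12, 20), last_up))    # False
print(is_allowed("scale-up", datetime(2024, 1, 1, 12, 30), last_up))    # True
print(is_allowed("scale-down", datetime(2024, 1, 1, 14, 0), last_up))   # False
```

The much longer wait before a scale-down keeps a freshly added host from being removed again as soon as the load dips.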

When scaling recommendations are generated, the multi-channel notification service sends automated notifications via email to organization members and via the console. Information is also tracked in the Activity Log, and more detailed tasks are tracked within the web client.


There’s certainly no shortage of notifications when it comes to scaling your clusters. Customers can also subscribe to the notification webhook for these events.

In the end, you have the scalability and flexibility you expect from a cloud service to maintain availability, capacity, and performance.

Resources

For other information related to VMware Cloud on AWS, here are some more learning resources for you: