Automating VMware Cloud Foundation: Auto-scale Workload Domain Clusters

In part three of our Automating VMware Cloud Foundation Series, we show you how to auto-scale workload domain clusters using vRealize Operations (vROps) to monitor your environment.

In our VMware Cloud Foundation (VCF) automation use cases, we wanted to make sure that a shortage of resources such as CPU and memory never caused outages in our workload environments. For the Auto-scale Workload Domain Clusters use case, we therefore use vRealize Operations (vROps) to monitor the environment and trigger an alert when remaining capacity hits a specified threshold. The alert, in turn, invokes a vRealize Orchestrator (vRO) workflow that adds an available host from the VCF inventory to the Workload Domain Cluster.

Auto-scale is becoming a common requirement for modern apps, data centers, and operations, and self-healing or proactive remediation is now common practice across the IT industry. Auto-scale functionality is not natively available in VCF today, but by leveraging the VMware management products we can achieve the desired outcome.

As you saw in the previous blog, “Automating VMware Cloud Foundation: Creating a Workload Domain”, automation makes creating a domain much easier for end users. Now that the Workload Domain is created, we want to proactively scale out the cluster by adding a host from the VCF inventory when the Workload Domain Cluster becomes capacity constrained.

Let’s first review the number of hosts in the Workload Domain, so we can understand the impact of the auto-scale out. As the image shows, there are three hosts in WLD02.

When the auto-scale capability is triggered by a vRealize Operations (vROps) alert, you will see an additional host added to the configuration. Since my lab is very small, I can induce capacity stress by adding a VM to the cluster, which should trigger an alert in vRealize Operations.

The image below shows the active alert in vRealize Operations, raised when the WLD02 cluster meets the condition “Capacity remaining drops to less than or equal to 10 %”.
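The alert condition itself is simple to express. As a minimal sketch, assuming the capacity-remaining value arrives as a percentage and using the 10 % threshold from the alert (the function and parameter names are hypothetical and not part of vROps):

```python
def should_scale_out(capacity_remaining_pct: float, threshold_pct: float = 10.0) -> bool:
    """Return True when remaining capacity is at or below the threshold."""
    if not 0.0 <= capacity_remaining_pct <= 100.0:
        raise ValueError("capacity_remaining_pct must be between 0 and 100")
    # "Less than or equal to" matches the wording of the alert condition.
    return capacity_remaining_pct <= threshold_pct
```

Note that the comparison is inclusive, so a cluster sitting at exactly 10 % remaining capacity would trigger the scale-out.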

When the alert goes active and you log in to the vRealize Orchestrator Web Client, you should see “VR20 Add host to WLD Cluster” under “Recent Workflow Runs” on the vRealize Orchestrator dashboard, as shown in the image below:

When vRealize Orchestrator finishes executing the workflow, it will have added a host to the affected VCF cluster, relieving the cluster stress.
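To make the workflow's job more concrete, here is a rough Python sketch of the two decisions it has to make: which free host to use, and what to ask SDDC Manager to do with it. This is not the vRO workflow from my lab. The host records, the `UNASSIGNED_USEABLE` status value, and the `clusterExpansionSpec` body shape are assumptions modeled on the SDDC Manager public API and should be checked against the API reference for your VCF version. The function names, host IDs, and FQDNs are hypothetical.

```python
from typing import List, Optional


def pick_free_host(hosts: List[dict]) -> Optional[dict]:
    """Return the first host not assigned to any domain, or None."""
    for host in hosts:
        # Assumed status value for a commissioned but unassigned host.
        if host.get("status") == "UNASSIGNED_USEABLE":
            return host
    return None


def build_expansion_request(host: dict, license_key: str) -> dict:
    """Build the (assumed) request body for adding one host to a cluster."""
    return {
        "clusterExpansionSpec": {
            "hostSpecs": [{"id": host["id"], "licenseKey": license_key}]
        }
    }


# Example inventory; IDs and FQDNs are made up for illustration.
inventory = [
    {"id": "host-1", "fqdn": "esxi-1.example.local", "status": "ASSIGNED"},
    {"id": "host-4", "fqdn": "esxi-4.example.local", "status": "UNASSIGNED_USEABLE"},
]

candidate = pick_free_host(inventory)
if candidate is None:
    print("No free host in inventory; cannot scale out")
else:
    body = build_expansion_request(candidate, license_key="XXXXX-XXXXX")
    print(body)
```

In practice the request would be sent to SDDC Manager and the resulting task polled until it completes. That part is left out here, as is the case where no free host is available and the alert can only be escalated.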

The alert shown in the image below is a custom alert I defined for VCF. It uses vRealize Orchestrator actions to call the “VR20 Add host to WLD Cluster” workflow when the alert goes active. The screenshot below shows the custom alert configuration in my lab:

The symptom definition in this custom VCF alert uses a custom super metric I created to monitor “capacity remaining”. By default, you cannot use an “Analytics generated – Capacity Remaining Metric” directly. To work around this, I created a super metric that copies the “Capacity Remaining metric”, as shown in the image below:

This alert is then tied to a custom vRealize Operations policy, WLD02, as shown in the image below:

Tying the alert to a custom policy lets you define the monitoring intent for each domain separately. This is especially useful for service providers, whose customers may each have different monitoring intents and goals.
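One way to picture per-domain monitoring intent is as a lookup from workload domain to threshold. The sketch below is purely illustrative: vROps stores this in policies, not in a Python dictionary, and the WLD03 entry and the default value are invented for the example.

```python
DEFAULT_THRESHOLD_PCT = 10.0

# Hypothetical per-domain thresholds; only WLD02 appears in this post.
DOMAIN_THRESHOLDS = {
    "WLD02": 10.0,
    "WLD03": 20.0,  # e.g. a customer who wants earlier scale-out
}


def threshold_for(domain: str) -> float:
    """Return the capacity-remaining threshold (%) for a workload domain."""
    return DOMAIN_THRESHOLDS.get(domain, DEFAULT_THRESHOLD_PCT)


for name in ("WLD02", "WLD03", "WLD99"):
    print(name, threshold_for(name))
```

A domain without its own entry falls back to the default, which mirrors how an object without a custom policy would be governed by the default policy.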

The key benefits of this automation capability are:

Auto Remediation
- Remediate VCF WLD cluster stress by adding a host automatically
Uptime
- Keep workloads running by avoiding outages caused by resource contention
Policy Driven
- Define the monitoring objective per VCF domain, giving more granularity and control over what is critical for each customer

I hope by now you are getting a better view of how automating VCF is going to benefit our customers.

The following prerequisites are assumed to be in place in the environment:

- Define a vRealize Operations custom group for each Workload Domain cluster that should use the auto-scale capability
- Associate a custom policy with each custom group if monitoring goals should differ between workload domains
- Enable the “VCF Custom alert” in the custom policy by setting it to “Local” under “Automate”
- Download and install the vRealize Orchestrator Management Pack 3.1
- Add vRealize Orchestrator under “Other Accounts”
- Import the vRO packages that contain the VCF workflows

Next up in the “VMware Cloud Foundation Automation series of blogs” we look at “Deleting a VCF VI Workload Domain”.
