
Capacity Management in SDDC – Part 7

In Part 5, I explained a new concept: using Contention as the basis of Capacity Management in SDDC. In Part 6, I provided the super metric equation for each chart. In this part, I provide examples of the super metric formulas, along with dashboard screenshots.

Tier 1 (Highest)

To recap, we need to create line charts showing the following:

  1. The total number of vCPUs left in the cluster.
  2. The total amount of vRAM left in the cluster.
  3. The total number of VMs left in the cluster.
  4. The maximum and average storage latency experienced by the VMs in the cluster.
  5. The disk capacity left in the datastore cluster.

The screenshot below shows the super metric formula for the total number of vCPUs left in the cluster.

Tier 1 - No of vCPU left in a cluster after HA

Copy-paste the formula below:

${this, metric=cpu|alloc|actual.capacity} *
(
( ${this, metric=summary|number_running_hosts} - 1 ) /
${this, metric=summary|number_running_hosts}
)
- ${this, metric=summary|number_running_vcpus}

Logically, the formula is Supply - Demand, where:

  • Supply = No of Physical Cores in Cluster x ((No of Hosts – 1) / No of Hosts)
  • Demand = No of running vCPU in cluster

I assume that 1 host in the cluster is reserved for HA. If you reserve 2, replace 1 with 2 in the formula above.
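As a sanity check outside vRealize Operations, the same Supply - Demand arithmetic can be sketched in Python. This is a minimal sketch, not a vRealize Operations feature: the function name `vcpu_left_after_ha`, the `ha_hosts` parameter and the cluster sizes are hypothetical. Like the super metric, it compares physical cores with vCPUs one to one.

```python
def vcpu_left_after_ha(physical_cores: float, running_hosts: int,
                       running_vcpus: float, ha_hosts: int = 1) -> float:
    """Supply - Demand, mirroring the Tier 1 vCPU super metric.

    Supply: physical cores scaled by the share of hosts that remain
    after reserving `ha_hosts` for HA. Demand: running vCPUs.
    """
    if running_hosts <= ha_hosts:
        raise ValueError("running_hosts must be greater than ha_hosts")
    supply = physical_cores * (running_hosts - ha_hosts) / running_hosts
    return supply - running_vcpus


# Hypothetical cluster: 8 hosts x 20 cores = 160 cores, 100 running vCPUs.
print(vcpu_left_after_ha(160, 8, 100))              # 40.0 with 1 HA host
print(vcpu_left_after_ha(160, 8, 100, ha_hosts=2))  # 20.0 with 2 HA hosts
```

Passing `ha_hosts=2` is the equivalent of replacing 1 with 2 in the super metric.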

I calculate the supply manually because vRealize Operations does not provide a usable metric for No of Hosts - HA. Such a metric actually exists, but it cannot be enabled.

If you find the formula complex, you can split it into 2 super metrics first: work out Supply, then work out Demand. Let me use RAM as an example.

The screenshot below shows the super metric formula for the total RAM supply, which is the total RAM in the cluster after taking HA into account. I divide the number by 1024, and then again by 1024, to convert from KB to GB.

Notice that I always preview it. It's important to build the habit of verifying that your formula is correct.

Tier 1 - Total physical RAM capacity in a cluster after HA
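The Supply side, including the KB to GB conversion, can be checked by hand with a short Python sketch like the one below. The function name and the 8 x 512 GB cluster are hypothetical; the only assumption carried over from the formula is that `mem|alloc|actual.capacity` is reported in KB.

```python
KB_PER_GB = 1024 * 1024  # divide by 1024 twice: KB -> MB -> GB


def ram_supply_gb(capacity_kb: float, running_hosts: int,
                  ha_hosts: int = 1) -> float:
    """Total cluster RAM in GB after reserving `ha_hosts` hosts for HA."""
    if running_hosts <= ha_hosts:
        raise ValueError("running_hosts must be greater than ha_hosts")
    capacity_gb = capacity_kb / KB_PER_GB
    return capacity_gb * (running_hosts - ha_hosts) / running_hosts


# Hypothetical cluster: 8 hosts x 512 GB = 4096 GB, expressed in KB.
total_kb = 8 * 512 * KB_PER_GB
print(ram_supply_gb(total_kb, 8))  # 3584.0 GB after 1 HA host
```

If the preview in vRealize Operations shows a number that is off by a factor of 1024, the unit conversion is the first thing to recheck.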

Once the Supply side is done, I work on the Demand side. The following screenshot shows the demand.

Tier 1 - Total VM vRAM configured in a cluster

Once I have verified that both are correct, it's a matter of combining them.

Tier 1 - Total vRAM left in a cluster after HA

You can copy paste the formula below:

(
${this, metric=mem|alloc|actual.capacity} /1024 /1024 *
(
( ${this, metric=summary|number_running_hosts} - 1 ) /
${this, metric=summary|number_running_hosts}
)
) -
(
Sum( ${adapterkind=VMWARE, resourcekind=VirtualMachine, attribute=mem|guest_provisioned, depth=2} ) /
1024 /1024
)
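The combination step follows the split approach described earlier: once Supply (in GB, already net of HA) and Demand are each verified, vRAM left is simply their difference. The sketch below assumes, as the formula does, that each VM's `mem|guest_provisioned` value is in KB; the function name and VM sizes are hypothetical.

```python
from typing import Iterable

KB_PER_GB = 1024 * 1024  # KB -> GB, same as "/1024 /1024" in the formula


def vram_left_gb(supply_gb: float, vm_provisioned_kb: Iterable[float]) -> float:
    """vRAM left = Supply (GB, net of HA) - configured vRAM of all VMs (GB)."""
    demand_gb = sum(vm_provisioned_kb) / KB_PER_GB  # like Sum(...) /1024 /1024
    return supply_gb - demand_gb


# Hypothetical: 3584 GB of supply and 100 VMs configured with 16 GB each.
vms_kb = [16 * KB_PER_GB] * 100
print(vram_left_gb(3584.0, vms_kb))  # 1984.0
```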

The screenshot below shows the super metric formula for the total number of VMs left in the cluster. I hardcode the maximum number of VMs that I allow per cluster.

No of VM left in the cluster
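Since the maximum is a number you choose and hardcode, the calculation reduces to that limit minus the number of VMs already in the cluster. A minimal Python sketch of the arithmetic follows; the limit of 200, the function name and the `vm_count` input are hypothetical placeholders, and which VM count metric to subtract is an assumption you should match to your own super metric.

```python
MAX_VMS_PER_CLUSTER = 200  # hypothetical hardcoded limit; set your own policy


def vms_left(vm_count: int, max_vms: int = MAX_VMS_PER_CLUSTER) -> int:
    """VMs that can still be placed; a negative value means the limit is exceeded."""
    return max_vms - vm_count


print(vms_left(150))  # 50
print(vms_left(210))  # -10: the cluster is already over the limit
```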

The screenshot below shows the super metric formula for the maximum latency across all the VMs in the cluster. I've chosen the Virtual Disk level, so it does not matter whether the underlying storage is VMFS, NFS or VSAN.

To create the Average latency super metric, you just need to replace the string Max with Avg in the formula.

super metric - vDisk

You can copy paste the formula below:

Max ( ${adapterkind=VMWARE, resourcekind=VirtualMachine, attribute=virtualDisk|totalLatency, depth=2 } )
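To see why both lines are worth plotting, here is a small Python sketch of what Max and Avg return across per-VM virtual disk latency. The function name, VM names and latency values are hypothetical, and the values are assumed to be in milliseconds.

```python
from statistics import mean


def latency_max_avg(vm_latency_ms: dict[str, float]) -> tuple[float, float]:
    """Return (max, avg) across VMs, like Max(...) and Avg(...) in the super metric."""
    values = list(vm_latency_ms.values())
    return max(values), mean(values)


# Hypothetical: two healthy VMs and one suffering VM.
worst, average = latency_max_avg({"vm-a": 2.0, "vm-b": 4.0, "vm-c": 12.0})
print(worst, average)  # 12.0 6.0
```

Each VM counts once in the average, so a single slow VM is diluted; the Max line is what exposes it.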

The screenshot below shows the super metric formula for the total disk capacity left in the datastore cluster. This is based on Thin Provisioning consumption.

You can copy paste the formula below:

sum( ${adapterkind=VMWARE, resourcekind=Datastore, attribute=capacity|available_space, depth=1} )

For Thick Provision, use the following super metric:

super metric - Disk - space left in datastore cluster - thick

You can copy paste the formula below:

Sum
(
${adapterkind=VMWARE, resourcekind=Datastore, attribute=capacity|total_capacity, depth=1}
) -
Sum
(
${adapterkind=VMWARE, resourcekind=Datastore, attribute=capacity|consumer_provisioned, depth=1}
)
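The difference between the Thin and Thick formulas is easier to see with numbers. The Python sketch below mirrors the two formulas over the datastores in a datastore cluster; the `Datastore` class, its field names and the sizes are hypothetical stand-ins for the `capacity|total_capacity`, `capacity|available_space` and `capacity|consumer_provisioned` metrics, and all values are assumed to be in GB.

```python
from dataclasses import dataclass


@dataclass
class Datastore:
    name: str
    total_capacity_gb: float   # stands in for capacity|total_capacity
    available_space_gb: float  # stands in for capacity|available_space
    provisioned_gb: float      # stands in for capacity|consumer_provisioned


def space_left_thin(datastores: list[Datastore]) -> float:
    """Thin: sum of the free space actually left on each datastore."""
    return sum(ds.available_space_gb for ds in datastores)


def space_left_thick(datastores: list[Datastore]) -> float:
    """Thick: total capacity minus everything provisioned to consumers."""
    total = sum(ds.total_capacity_gb for ds in datastores)
    provisioned = sum(ds.provisioned_gb for ds in datastores)
    return total - provisioned


# Hypothetical datastore cluster with two 2 TB datastores.
cluster = [
    Datastore("ds-01", 2048.0, 1200.0, 1500.0),
    Datastore("ds-02", 2048.0, 900.0, 1800.0),
]
print(space_left_thin(cluster))   # 2100.0
print(space_left_thick(cluster))  # 796.0
```

The Thick view is the more conservative one, because it counts provisioned but not yet written space as used.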

Last but not least, do not forget to include a buffer for snapshots. This can be around 20%, depending on your environment.
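One possible way to apply the snapshot buffer is to set aside a share of the total datastore capacity and subtract it from the space left. This is an assumption, not something the super metrics above do: the reservation base, the function name and the sample sizes are hypothetical, and 20% is just the example figure mentioned above.

```python
def space_left_after_snapshot_buffer(space_left_gb: float,
                                     total_capacity_gb: float,
                                     buffer_ratio: float = 0.20) -> float:
    """Space left once `buffer_ratio` of total capacity is reserved for snapshots."""
    if not 0.0 <= buffer_ratio < 1.0:
        raise ValueError("buffer_ratio must be in [0, 1)")
    return space_left_gb - total_capacity_gb * buffer_ratio


# Hypothetical: 2100 GB left on a 4096 GB datastore cluster, 20% buffer.
print(round(space_left_after_snapshot_buffer(2100.0, 4096.0), 1))  # 1280.8
```

A negative result means the snapshot buffer is already eaten into, even if the raw space-left chart still looks positive.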

I hope you find this article useful for Capacity Management in SDDC. In Part 8 (scheduled for later this month), I will cover the super metrics for Tier 2 & 3, and for Network.