
Capacity Management in SDDC – Part 9

If you have landed on this Part 9 directly, I’d recommend that you at least review the series from Part 5 first. If you want to review from the conceptual stage, start from Part 1. This is a series of blog posts on Capacity Management in SDDC.

Network (all tiers)

To recap, we need to create the following:

  1. A line chart showing the maximum network dropped packets in the physical data center.
  2. A line chart showing the maximum and average ESXi vmnic utilization in the physical data center.

I use physical data center, not virtual data center. Can you guess why?

It’s easier to manage the network per physical data center. Unless your network is stretched, problems do not span across data centers. Review this excellent article by Ivan, one of the networking industry’s authorities.

The problem is how to select the ESXi hosts belonging to the same physical data center. A physical data center can have multiple vCenter Servers. On the other hand, it is also possible for the vRealize Operations World object, or even a single vCenter, to span multiple physical data centers. So you need to manually determine the right object, so that you get all the ESXi hosts in that physical data center. For example, if you have 1 vRealize Operations instance managing 2 physical data centers, you definitely cannot use the World object, as it spans both data centers.

The screenshot below shows the super metric formula to get the maximum network dropped packets at the vCenter data center object. Notice I use depth=3, as the data center object is 3 levels above the ESXi host object.

Drop Packet DC level

I did a preview of the super metric. As you can see above, it’s a flat line of 0. That’s what you should expect: no dropped packets at all from any host in your data center.
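For reference, the formula in the screenshot would look along these lines. Treat this as a sketch only: the metric key net|droppedPct is an assumption on my part, so verify the exact dropped-packet metric key in your vRealize Operations version before using it.

```
Max(${adaptertype=VMWARE, objecttype=HostSystem, metric=net|droppedPct, depth=3})
```

The depth=3 part is what lets the formula, attached to the data center object, reach down to the ESXi hosts three levels below it.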

Dropped packets are much easier to track, as you expect 0 everywhere. Utilization is harder. If your ESXi hosts have a mix of 10 Gb and 1 Gb vmnics, generally speaking you would expect the 10 Gb data to dominate. This is where a consistent design & standard matter. Without them, you need to apply a different formula for each ESXi host configuration.

Let’s look at the Maximum first, then the Average. As I shared in this blog, you want to ensure that not a single vmnic is saturated. This means you need to track it at the vmnic level, not the ESXi host level. Tracking at the ESXi host level, as shown in the following screenshot, can hide the data at the vmnic level. Take an example: your ESXi host has 8 x 1 Gb NICs, and you are seeing a throughput of 4 Gbps. At the ESXi host level, it’s only 50% utilized. But that 4 Gbps is unlikely to be spread evenly. There is a potential that one vmnic is saturated while the others are hardly utilized.

ESXi vmnic utilization

As I shared in this blog, the super metric formula you need to copy-paste is:

Max ([
Max(${this, metric=net:vmnic0|received_average}),
Max(${this, metric=net:vmnic0|transmitted_average}),
Max(${this, metric=net:vmnic1|received_average}),
Max(${this, metric=net:vmnic1|transmitted_average}),
Max(${this, metric=net:vmnic2|received_average}),
Max(${this, metric=net:vmnic2|transmitted_average}),
Max(${this, metric=net:vmnic3|received_average}),
Max(${this, metric=net:vmnic3|transmitted_average})
]) * 8 / 1024

The above is based on 4 vmnics per ESXi host. If you have 2 x 10 Gb, then you just need vmnic0 and vmnic1. If you have 6 vmnics, then you have to add vmnic4 and vmnic5. The counters are reported in KBps, so multiplying by 8 and dividing by 1024 converts the value to Mbps.

The above gives you the value per ESXi host. You then need to apply it per physical data center. Please review this blog post.
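vRealize Operations lets you reference one super metric inside another. Assuming the host-level super metric above is named "Max vmnic Utilization" (a hypothetical name I am using for illustration), a data-center-level roll-up would be a sketch along these lines, with depth=3 again walking from the data center object down to the hosts:

```
Max(${adaptertype=VMWARE, objecttype=HostSystem, metric=Super Metric|Max vmnic Utilization, depth=3})
```

Check how your vRealize Operations version exposes super metrics as attributes before copying this, as the metric key shown here is an assumption.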

Ok, the above gets us the Maximum. We then apply the same approach for the Average. The great thing about taking the average at the individual vmnic level is that you do not have to worry about how many vmnics an ESXi host has. If you use the data at the ESXi host level, as shown in the screenshot below, you need to divide the number by the number of vmnics.

ESXi vmnic utilization
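Following the same pattern as the Maximum formula, a sketch of the Average version for a 4-vmnic host would be:

```
Avg([
Avg(${this, metric=net:vmnic0|received_average}),
Avg(${this, metric=net:vmnic0|transmitted_average}),
Avg(${this, metric=net:vmnic1|received_average}),
Avg(${this, metric=net:vmnic1|transmitted_average}),
Avg(${this, metric=net:vmnic2|received_average}),
Avg(${this, metric=net:vmnic2|transmitted_average}),
Avg(${this, metric=net:vmnic3|received_average}),
Avg(${this, metric=net:vmnic3|transmitted_average})
]) * 8 / 1024
```

As with the Maximum, adjust the list of vmnics to match your hosts, and remember to roll this up per physical data center as well.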

Once you have the Maximum and the Average, you want to ensure that the Maximum is not near your physical limit, and that the Average shows healthy utilization. A number near the physical limit means you have a capacity risk. A number with low utilization means you have over-provisioned the hardware.

BTW, there is one physical NIC that is not monitored in the above. Can you guess which one?

Yes, it’s the iLO NIC. It does not show up as a vmnic. The good thing is that there is generally very little traffic there, and certainly no data traffic.

This entry concludes our series on Capacity Management in SDDC.