
How vRealize Suite can accelerate your career

The benefits of deploying vRealize Suite are widely acknowledged. We decided to go the extra mile to better understand what our customers truly think of our products. We conducted focused interviews followed by an in-depth analysis. This gave us a pulse on the benefits our customers derive from these products and helped us understand what we can do better for them. One of our customers said, “It’s huge for my personal career, obviously – skill set.” These focused interviews surfaced many positive experiences with the vRealize offerings, some of them measurable and some more intangible.


Delivering IT at the speed of business – vRealize Operations


Quantifiable benefits 


VMware vRealize Operations delivers intelligent operations management across physical, virtual, and cloud infrastructures—from VMware vSphere and Hyper-V to Amazon Web Services.

Automate IT Processes – vRealize Automation

 


VMware vRealize Automation empowers IT to accelerate the delivery and ongoing management of personalized, business-relevant infrastructure, application and custom services while improving overall IT efficiency.

Complete Visibility into All of IT Cost – vRealize Business


VMware vRealize Business provides transparency and control over the cost and quality of IT services.

Benefits from connecting offerings

There is a multitude of benefits that unfold when more than one of our products is used in conjunction with others. For example, additional benefits surface when vRealize Operations is deployed in tandem with vRealize Log Insight. To give better insight into the reality of connecting these products, here’s what one of our customers who had three products deployed had to say: “The first steps were going to be, get [vR Biz] installed, integrate with vCenter Operations, and integrate with vRealize Automation to show the cost of a request at request time. [...] So when a user goes in there and says, I want this server, I need options, and it should tell them, well, it’s going to cost you this.” In their own words, here is how customers connect vRealize products together:

 

vRealize Operations + vRealize Log Insight


 

 vRealize Business + vRealize Operations 


So there you have it, folks, straight from the horse’s mouth! I hope this article convinces you to deploy vRealize Suite in your company and experience the difference. We value our customer engagement and would like to thank our customers for their feedback and support.

Capacity Management in SDDC – Part 9

If you landed on this Part 9 directly, I'd recommend that you at least review from Part 5 first. If you want to start from the conceptual stage, go to Part 1. This is a series of blog posts on Capacity Management in SDDC.

Network (all tiers)

To recap, we need to create the following:

  1. A line chart showing the maximum number of dropped network packets in the physical DC.
  2. A line chart showing the maximum and average ESXi vmnic utilization in the physical DC.

I use physical data center, not virtual data center. Can you guess why?

It's easier to manage the network per physical data center. Unless your network is stretched, problems do not span across data centers. Review this excellent article by Ivan, one of the authorities in the networking industry.

The challenge is how to select only the ESXi hosts in the same physical data center. It is possible for a physical data center to have multiple vCenter Servers. On the other hand, it is also possible for the vRealize Operations World object, or even a single vCenter, to span multiple physical data centers. So you need to manually determine the right object, so that you get all the ESXi hosts in that physical data center and only those. For example, if you have 1 vRealize Operations instance managing 2 physical data centers, you definitely cannot use the World object, as it spans both data centers.

The screenshot below shows the super metric formula to get the maximum number of dropped network packets at a vCenter data center object. Notice I use depth=3, as the data center object is 3 levels above the ESXi host object.

Drop Packet DC level

I did a preview of the super metric. As you can see above, it's a flat line of 0. That's what you should expect: no dropped packets at all from any host in your data center.

Dropped packets are much easier to track, as you expect 0 everywhere. Utilization is harder. If your ESXi hosts have a mix of 10 Gb and 1 Gb vmnics, generally speaking you would expect the 10 Gb vmnics to dominate the data. This is where consistent design and standards matter. Without them, you need to apply a different formula for each ESXi host configuration.

Let's look at the Maximum first, then the Average. As I shared in this blog, you want to ensure that not a single vmnic is saturated. This means you need to track utilization at the vmnic level, not the ESXi host level. Tracking at the ESXi host level, as shown in the following screenshot, can hide what is happening at the vmnic level. Take an example: your ESXi host has 8 x 1 Gb NICs, and you are seeing a throughput of 4 Gbps. At the ESXi host level, it's only 50% utilized. But that 4 Gbps is unlikely to be spread evenly. One vmnic may be saturated while the others are hardly utilized.

ESXi vmnic utilization
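To make the averaging effect concrete, here is a minimal Python sketch using hypothetical per-vmnic throughput numbers for the 8 x 1 Gb example above. The function names and sample values are illustrative only; they are not vRealize Operations metrics or APIs.

```python
# Hypothetical host: 8 x 1 Gb vmnics, 4 Gbps in total, spread unevenly (values in Mbps).
LINK_SPEED_MBPS = 1000
vmnic_throughput_mbps = [1000, 1000, 1000, 600, 200, 100, 50, 50]


def host_level_utilization(throughput_mbps, link_speed_mbps):
    """Utilization as seen at the ESXi host level: total throughput / total capacity."""
    return sum(throughput_mbps) / (link_speed_mbps * len(throughput_mbps))


def busiest_vmnic_utilization(throughput_mbps, link_speed_mbps):
    """Utilization of the single busiest vmnic."""
    return max(throughput_mbps) / link_speed_mbps


if __name__ == "__main__":
    print(f"Host level:    {host_level_utilization(vmnic_throughput_mbps, LINK_SPEED_MBPS):.0%}")
    print(f"Busiest vmnic: {busiest_vmnic_utilization(vmnic_throughput_mbps, LINK_SPEED_MBPS):.0%}")
```

The host-level view reports 50%, while three of the eight vmnics are already at 100%.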

As I shared in this blog, the super metric formula you need to copy-paste is:

Max ([
Max(${this, metric=net:vmnic0|received_average}),
Max(${this, metric=net:vmnic0|transmitted_average}),
Max(${this, metric=net:vmnic1|received_average}),
Max(${this, metric=net:vmnic1|transmitted_average}),
Max(${this, metric=net:vmnic2|received_average}),
Max(${this, metric=net:vmnic2|transmitted_average}),
Max(${this, metric=net:vmnic3|received_average}),
Max(${this, metric=net:vmnic3|transmitted_average})
]) * 8 / 1024

The above is based on 4 vmnics per ESXi host. If you have 2 x 10 Gb, then you just need vmnic0 and vmnic1. If you have 6 vmnics, then you have to add vmnic4 and vmnic5.
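Because the formula grows with the number of vmnics, one option is to generate it rather than edit it by hand. The Python helper below is a hypothetical convenience, not part of vRealize Operations; it reproduces the 4-vmnic formula above exactly and extends it to any vmnic count. It keeps the original * 8 / 1024 factor, which converts the throughput to Mbps assuming the vmnic metrics are reported in KBps.

```python
def vmnic_max_super_metric(vmnic_count: int) -> str:
    """Build the per-host 'max vmnic throughput' super metric text for vmnic0..vmnic(N-1)."""
    if vmnic_count < 1:
        raise ValueError("vmnic_count must be at least 1")
    terms = []
    for i in range(vmnic_count):
        for direction in ("received_average", "transmitted_average"):
            # Doubled braces emit literal { } around the vROps metric reference.
            terms.append(f"Max(${{this, metric=net:vmnic{i}|{direction}}})")
    # Same "* 8 / 1024" conversion as the original formula.
    return "Max ([\n" + ",\n".join(terms) + "\n]) * 8 / 1024"


if __name__ == "__main__":
    print(vmnic_max_super_metric(6))
```

Calling vmnic_max_super_metric(6) adds the vmnic4 and vmnic5 lines; paste the output into the super metric editor and preview it as usual.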

The above gives you the value per ESXi host. You then need to apply it per physical data center. Please review this blog post.

OK, the above gets us the maximum. We then apply the same approach for the average. The great thing about taking the average at the individual vmnic level is that you do not have to worry about how many vmnics an ESXi host has. If you use the data at the ESXi host level, as shown in the screenshot below, you need to divide the number by the number of vmnics.

ESXi vmnic utilization
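The per-physical-DC roll-up can be sketched in Python as follows. It assumes you already have one utilization value per vmnic (for example, the higher of received and transmitted) and a hand-maintained mapping of hosts to physical data centers; both the sample data and the mapping are hypothetical. Flattening all vmnics into one list is what makes the average independent of how many vmnics each host has.

```python
from statistics import mean

# Hypothetical per-vmnic utilization in Mbps, keyed by ESXi host.
vmnic_mbps_by_host = {
    "esx01": [900.0, 100.0],
    "esx02": [300.0, 300.0, 200.0, 200.0],
    "esx03": [50.0, 50.0],
}

# Hypothetical manual mapping of hosts to *physical* data centers.
physical_dc_of_host = {"esx01": "DC-A", "esx02": "DC-A", "esx03": "DC-B"}


def vmnic_stats_per_dc(vmnic_mbps_by_host, physical_dc_of_host):
    """Return {dc: (max_vmnic_mbps, avg_vmnic_mbps)} across every vmnic in each DC."""
    vmnics_per_dc = {}
    for host, vmnics in vmnic_mbps_by_host.items():
        vmnics_per_dc.setdefault(physical_dc_of_host[host], []).extend(vmnics)
    return {dc: (max(v), mean(v)) for dc, v in vmnics_per_dc.items()}


if __name__ == "__main__":
    for dc, (peak, avg) in vmnic_stats_per_dc(vmnic_mbps_by_host, physical_dc_of_host).items():
        print(f"{dc}: max {peak:.0f} Mbps, average {avg:.0f} Mbps")
```

Note that esx01 and esx02 have different vmnic counts, yet no per-host division is needed.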

Once you have the Maximum and Average, you want to ensure that the Maximum is not near your physical limit and that the Average shows healthy utilization. A Maximum near the physical limit means you are at risk of running out of capacity. A low Average means you over-provisioned the hardware.
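A simple way to express this check is sketched below. The 80% and 20% cut-offs are hypothetical placeholders chosen for illustration; the post does not prescribe specific values, so set them to match your own design and buffer.

```python
def assess_vmnic_utilization(max_mbps, avg_mbps, link_speed_mbps,
                             high_ratio=0.8, low_ratio=0.2):
    """Flag capacity risk (max near the physical limit) and over-provisioning (low average).

    high_ratio and low_ratio are hypothetical thresholds, not values from the post.
    """
    findings = []
    if max_mbps >= high_ratio * link_speed_mbps:
        findings.append("capacity risk: maximum is near the physical limit")
    if avg_mbps <= low_ratio * link_speed_mbps:
        findings.append("over-provisioned: average utilization is low")
    return findings or ["healthy"]


if __name__ == "__main__":
    # Hypothetical 10 Gb vmnics (10000 Mbps).
    print(assess_vmnic_utilization(max_mbps=9500, avg_mbps=4000, link_speed_mbps=10000))
    print(assess_vmnic_utilization(max_mbps=3000, avg_mbps=500, link_speed_mbps=10000))
```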

BTW, there is 1 physical NIC that is not monitored in the above. Can you guess which one?

Yes, it's the iLO NIC. It does not show up as a vmnic. The good thing is that there is generally very little traffic there, and certainly no data traffic.

This entry concludes our series on Capacity Management in SDDC.

3 Reasons Why You Should Integrate Log Insight with vRealize Operations

In my last post, I demonstrated how easy it is to integrate Log Insight (LI) with vRealize Operations (vR Ops). In this post, I will talk about the value of integrating Log Insight with vR Ops.


Guided Remediation with vRealize Operations

Written by Chima Njaka, vRealize Operations Product Line Manager

Like many in the SF Bay Area, I’ve been caught up in the epic NBA Finals championship series between the local Golden State Warriors and the Cleveland Cavaliers. As of this writing, 3 exciting games of the best-of-seven series have been completed, with the first two going to overtime and wins by both teams. Although the Warriors struck first with an early win, the Cavs have responded by winning the last two games, putting together a smothering defense that has held the Warriors to their lowest scoring percentage all season, while staying just enough in front with their own offense.

Guided Remediation by Andre and Steph.

Although it is just a game, there is something that we can learn from watching the dynamics of this championship. That is: when faced with obstacles to progress, you must make adjustments to remediate the situation.  We’ve seen how the Cavs adjusted to the Warriors after game 1.  How should the Warriors now adjust?  No doubt, anxious Warriors fans have all sorts of recommendations.  But it is ultimately the team’s coach who is responsible for wisely guiding them back to victory.

A Coach for Your Data Center: Guided Remediation

So, what if you had a coach to assist you with your data center operations? What would your expectations be? Certainly, you’d want it to draw on its experience, analytics capabilities, and insight to guide you in optimally executing your day-to-day plans and responding to any issues with clear actions to change things for the better.

When you think about it, this sounds exactly like VMware’s vRealize Operations Manager! Not only does it offer best-in-class visibility into your heterogeneous virtual environment, including immediate health issues, potential capacity shortfalls, and opportunities to optimize resources, but it also takes things to the next level: guiding you to directly remediate identified operational issues from the same interface. This takes advantage of the new Actions Framework introduced with vROps 6.0. In fact, as an expert “player” in your own environment, you can create your own Symptoms, Alerts, Recommendations, and associated Remediation Actions.

In a way, the Actions Framework and the Actions it provides for you are like your teammates on the court, ready to execute the coach’s (and your) plan. So, team up with vRealize Operations Manager and let its guided remediation capabilities help you tame your daily operational issues!

Oh yeah, and Go Warriors!

 

 

Capacity Management in SDDC - Part 7

In Part 5, I explained a new concept, where we use Contention as the basis of Capacity Management in SDDC. In Part 6, I provided the super metric equation for each chart. In this part, I will provide examples of the super metric formulas and dashboard screenshots.

Tier 1 (Highest)

To recap, we need to create line charts showing the following:

  1. The total number of vCPUs left in the cluster.
  2. The total amount of vRAM left in the cluster.
  3. The total number of VMs left in the cluster.
  4. The maximum and average storage latency experienced by any VM in the cluster.
  5. Disk capacity left in the datastore cluster.

The screenshot below shows the super metric formula to get the total number of vCPU left in the cluster.

Tier 1 - No of vCPU left in a cluster after HA

Copy-paste the formula below:

${this, metric=cpu|alloc|actual.capacity} *
(

( ${this, metric=summary|number_running_hosts} - 1 ) /
${this, metric=summary|number_running_hosts}
)
- ${this, metric=summary|number_running_vcpus}

In logic, the formula is Supply - Demand, where:

  • Supply = No of Physical Cores in Cluster x ((No of Hosts - 1) / No of Hosts)
  • Demand = No of running vCPU in cluster

I assume there is 1 HA host in the cluster. If you have 2, replace 1 with 2 in the formula above.

I have to calculate the supply manually as vRealize Operations does not have a metric for No of Hosts - HA. Actually, it does, but the metric cannot be enabled.
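To sanity-check the formula outside vRealize Operations, the same Supply - Demand arithmetic can be written in Python. This is a sketch assuming identical hosts; ha_hosts is a hypothetical parameter standing in for the "replace 1 with 2" edit, and the sample cluster sizes are made up.

```python
def vcpu_left_after_ha(physical_cores, running_hosts, running_vcpus, ha_hosts=1):
    """Mirror of the super metric: cores x ((hosts - HA hosts) / hosts) - running vCPUs."""
    if running_hosts <= ha_hosts:
        raise ValueError("cluster must have more running hosts than HA hosts")
    supply = physical_cores * (running_hosts - ha_hosts) / running_hosts
    demand = running_vcpus
    return supply - demand


if __name__ == "__main__":
    # Hypothetical cluster: 8 hosts x 16 cores = 128 cores, 90 running vCPUs.
    print(vcpu_left_after_ha(128, 8, 90))              # 1 HA host
    print(vcpu_left_after_ha(128, 8, 90, ha_hosts=2))  # 2 HA hosts
```

With one HA host, this hypothetical cluster has 22 vCPUs left; reserving a second HA host drops that to 6.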

If you find the formula complex, you can split it into 2 super metrics first. Work out Supply, then work out Demand. Let me use RAM as an example.

The screenshot below shows the super metric formula to get the total RAM supply. It is the total RAM in the cluster after we take HA into account. I have to divide the number by 1024, then again by 1024, to convert from KB to GB.

Notice I always preview it. It's important to build the habit of always verifying that your formula is correct.

Tier 1 - Total physical RAM capacity in a cluster after HA

Once the Supply side was done, I worked on the Demand side. The following screenshot shows the demand.

Tier 1 - Total VM vRAM configured in a cluster

Once I verified that both are correct, it's a matter of combining them together.

Tier 1 - Total vRAM left in a cluster after HA

You can copy paste the formula below:

(
${this, metric=mem|alloc|actual.capacity} /1024 /1024 *
(
(
${this, metric=summary|number_running_hosts} - 1 ) /
${this, metric=summary|number_running_hosts} )
) -
(

Sum (${adapterkind=VMWARE, resourcekind=VirtualMachine, attribute=mem|guest_provisioned, depth=2}) /
1024/1024)
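Here is the same vRAM calculation as a Python sketch, useful for checking the preview values by hand. It assumes both metrics are reported in KB, as the double division by 1024 implies, and uses hypothetical cluster and VM sizes.

```python
KB_PER_GB = 1024 * 1024


def vram_left_gb(cluster_mem_kb, running_hosts, vm_provisioned_kb, ha_hosts=1):
    """Supply (cluster RAM after HA, in GB) minus Demand (sum of VM provisioned RAM, in GB)."""
    supply_gb = cluster_mem_kb / KB_PER_GB * (running_hosts - ha_hosts) / running_hosts
    demand_gb = sum(vm_provisioned_kb) / KB_PER_GB
    return supply_gb - demand_gb


if __name__ == "__main__":
    # Hypothetical cluster: 4 hosts x 256 GB, running 100 VMs of 4 GB each.
    cluster_kb = 4 * 256 * KB_PER_GB
    vms_kb = [4 * KB_PER_GB] * 100
    print(vram_left_gb(cluster_kb, 4, vms_kb))
```

Here the supply after HA is 768 GB and the demand is 400 GB, leaving 368 GB.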

The screenshot below shows the super metric formula to get the total number of VMs left in the cluster. I have to hardcode the maximum number of VMs that I allow in a cluster.

No of VM left in the cluster

The screenshot below shows the super metric formula to get the Maximum latency of all the VMs in the cluster. I've chosen the Virtual Disk level, so it does not matter whether the storage is VMFS, NFS or VSAN.

To create the Average latency super metric, you just need to replace the string Max with Avg in the formula.

super metric - vDisk

You can copy paste the formula below:

Max ( ${adapterkind=VMWARE, resourcekind=VirtualMachine, attribute=virtualDisk|totalLatency, depth=2 } )
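For a quick mental model of what the Max and Avg variants return, the Python sketch below applies both to a hypothetical set of per-VM virtual disk latencies. The millisecond unit and the 10 ms SLA are assumptions for illustration only.

```python
from statistics import mean

# Hypothetical per-VM virtual disk total latency, in milliseconds.
vdisk_latency_ms = {"app01": 3.0, "db01": 12.0, "web01": 1.5, "web02": 1.5}
SLA_MS = 10.0  # hypothetical storage latency SLA


def cluster_latency(latency_by_vm):
    """Return (max, average) latency across all VMs, like the Max and Avg super metrics."""
    values = list(latency_by_vm.values())
    return max(values), mean(values)


if __name__ == "__main__":
    worst, average = cluster_latency(vdisk_latency_ms)
    print(f"max {worst} ms, avg {average} ms, max breaches SLA: {worst > SLA_MS}")
```

The average looks comfortable while one VM already exceeds the SLA, which is why both the Maximum and the Average are charted.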

The screenshot below shows the super metric formula to get the total disk capacity left in the datastore cluster. This is based on Thin Provisioning consumption.

You can copy paste the formula below:

sum( ${adapterkind=VMWARE, resourcekind=Datastore, attribute=capacity|available_space, depth=1} )

For Thick Provisioning, use the following super metric:

super metric - Disk - space left in datastore cluster - thick

You can copy paste the formula below:

Sum
(
${adapterkind=VMWARE, resourcekind=Datastore, attribute=capacity|total_capacity, depth=1}
) -
Sum
(
${adapterkind=VMWARE, resourcekind=Datastore, attribute=capacity|consumer_provisioned, depth=1}
)

Last but not least, do not forget to include a buffer for snapshots. This can be 20%, depending on your environment.
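The thin and thick formulas, plus the snapshot buffer, can be sketched in Python as below. One assumption to flag: the post does not say what the 20% applies to, so this sketch takes it as a share of total datastore capacity set aside from the space left. The datastore sizes are hypothetical.

```python
def space_left_thin_gb(available_gb):
    """Thin provisioning: sum of available space across datastores in the datastore cluster."""
    return sum(available_gb)


def space_left_thick_gb(total_capacity_gb, provisioned_gb):
    """Thick provisioning: total capacity minus total provisioned space."""
    return sum(total_capacity_gb) - sum(provisioned_gb)


def after_snapshot_buffer_gb(space_left_gb, total_capacity_gb, buffer_ratio=0.2):
    """Subtract a snapshot buffer, assumed here to be a share of total capacity."""
    return space_left_gb - buffer_ratio * sum(total_capacity_gb)


if __name__ == "__main__":
    # Hypothetical datastore cluster with two 2 TB datastores.
    capacity = [2048, 2048]
    print(space_left_thin_gb([900, 700]))               # thin: available space
    print(space_left_thick_gb(capacity, [1500, 1800]))  # thick: capacity - provisioned
    print(after_snapshot_buffer_gb(1600, capacity))     # thin, after 20% buffer
```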

I hope you find the article useful for Capacity Management in SDDC. In part 8 (scheduled later this month), I will cover the super metrics for Tier 2 & 3, and for Network.

Using vRealize Operations Manager to Monitor the Cluster Nodes

In the past, I have written about the Architecture of vRealize Operations Manager, which allows you to have Master and Replica nodes in a vROps cluster. This not only allows you to distribute the adapters or solutions across the collectors present on each node, but also gives you resiliency in case the Master Node in the cluster fails.
In this post, I will share a failure I saw in my lab, caused by a couple of services failing on the master node. This resulted in a fail-over, and the Replica switched over to become the Master. All this is possible because, just like in previous releases, each and every service on your vROps nodes is monitored by vROps itself.
Let me give you a brief description of my lab before I begin:
As you can see in the screenshot from my lab, I have a Master Node and a Replica Node. For the past few days, whenever I tried to access the vROps product UI through the Master Node IP address, I got a “Page Cannot Be Displayed” error. I immediately switched to the other IP address to see if I could access it. By design, the solution can be accessed through any node of the cluster, and I was able to do that without issues.
Today, I decided to look at the issue with my Master Node, and all I had to do was click on a RED box on the recommendations page. Let me share that through a screenshot:
You can see that I have a red object on the Heat Weather Map, and if I look below it, I can immediately see an Alert for the Master Node about services being down, with 2 recommendations. Let us see what the recommendations are by clicking on this Alert:
Here, you can see that the Node Processing and Collector services are down, and hence we are getting 2 recommendations to resolve this issue. One is to take the node offline and then bring it back online. The other is to contact VMware Support. We could also have jumped directly to a screen that monitors the entire vROps cluster by clicking:
Home -> Environment -> vRealize Operations Cluster 
 
Here we can expand the cluster and see all the nodes and services associated with each node. Let us see this in a screenshot:
Here you can look at all the services and their health individually. Instead of going through these services, we will follow the recommendation given by the tool to take the affected node offline and then back online. Let us go into Cluster Management and see the current state of the cluster:
Click on Home -> Administration -> Cluster Management 
Now you can see that the Replica Node has become the Master Node and vice versa. We can select the VROPS-M node here and take it offline.
As soon as I tried to take the node offline, I got an error saying that the operation had failed and that I should contact VMware Support. Since it's just a lab, I went ahead and restarted the VROPS-M node from vCenter Server, and after a few minutes I was able to log in through the Master Node IP address. Once I logged in to the product UI, I could see that the resources had been distributed between both nodes and data gathering had started working again. One thing to notice is that, after a failure, the Master and Replica switch roles.
Voila, the issue seems to be fixed, and all the nodes are in a working state, with objects and metrics shared between the nodes for faster collection. If you go back to the main recommendations page where it all started, you will notice that the box has turned green, and you are good to go, as the entire solution is up and running.
I hope this helps you configure your vROps clusters with confidence and with a complete understanding of how the cluster is monitored and how the fail-over process works.

ACI Specialty Benefits: vSOM Customer Spotlight

We are proud to shine the spotlight this week on a VMware customer success story. Here’s proof that even smaller organizations can benefit greatly by upgrading from a naked vSphere environment to vSphere with Operations Management. This is especially true for a company like ACI Specialty Benefits in San Diego, an employee and student benefits company that is growing exponentially and needed its IT infrastructure to proactively support the business.

The ACI Specialty Benefits Story

Facing hyper-growth, ACI Specialty Benefits needed to ensure the company’s infrastructure was poised to proactively support the business. ACI moved to a virtualized data center and deployed vSphere with Operations Management in order to effectively load test, ensure adequate resources to handle demand, and onboard new customers more quickly and effectively. The company can now run applications at high service levels and maximize hardware savings through 40% higher capacity utilization and 50% higher consolidation ratios.

With the added visibility throughout its IT infrastructure, ACI has seen a 25% decrease in time spent on diagnostics and problem resolution. As a result of the operational efficiencies, the IT team is freed up to take on additional strategic initiatives.

“The first time looking at that single pane of glass was very surprising for us to see all of our different VMware infrastructure – at the data center, at headquarters, everywhere. Being able to see where we stood was a game changer.”

— Ryan Fay, Chief Information Officer, ACI Specialty Benefits

To hear the folks at ACI Specialty Benefits talk about how they use vSphere with Operations Management in their own words, check out the video, or download the infographic here.

About ACI Specialty Benefits

Covering over seven million people, ACI Specialty Benefits is one of the nation’s top ten providers of Employee Assistance Programs. The company also offers premier Workplace Wellness, Concierge, and Student Assistance programs. With a 95% customer retention rate, ACI is known for anticipating needs, exceeding expectations and providing customers with an unparalleled professional partnership.

Try the vSphere Optimization Assessment, and see what ACI sees

The vSphere Optimization Assessment (VOA) is a simple and powerful health check for today's virtual environments and addresses a number of key challenges for IT organizations. This free assessment is now available to customers on vmware.com as a 30-day program designed to enable them to:

  • Download and install the trial software of vRealize Operations – the predictive analytics engine in vSphere with Operations Management (vSOM)
  • Access 4 VOA Reports from within the installed product
  • Get advice from trained sales professionals

Download VOA

 

Management Pack for Storage Devices (now with VSAN): Beta

We're pleased to announce signups for the beta program for the vRealize Operations Management Pack for Storage Devices (MPSD), as reported on the VMware Storage Blog. This iteration of the program includes features to manage the latest version of Virtual SAN (VSAN). We're looking for folks who'd like to test out the latest iteration of our MPSD. The 6.0 version can be viewed here.

If you'd like to join this beta, sign up here: http://eepurl.com/bcln3r

About the vRealize Operations Management Pack for Storage Devices (MPSD)

The vRealize Operations Management Pack for Storage Devices 6.0.1 provides visibility into your storage environment. Using common protocols, you can collect performance and health data from the storage devices. Predefined dashboards allow you to follow the path from a VM to the storage volume and identify any problem that may exist along that path.

Highlights
• End to End view of the data path through the SAN and NAS; from VM to Storage Volume
• Support for both NFS/iSCSI and FC/FCoE protocols
• Access to Storage devices leveraging standardized protocols; CIM, SMI-S, & VASA
• Ready to use dashboards for Health and Performance
• Analytics for common All Paths Down (APD) and Permanent Device Loss (PDL) storage conditions
• This release has Beta support for Virtual SAN. (Sign up here: http://eepurl.com/bcln3r)

Requirements for the MPSD Beta

In order to participate in this beta, you need: 1) a working knowledge of vRealize Operations, 2) an installation of Virtual SAN, or a willingness to install it, and 3) a non-trivial amount of storage to test this against.

About vRealize Operations Management

vRealize Operations Management is a cloud operations management system that delivers intelligent IT operations management from applications to storage, for vSphere, Hyper-V, Amazon and physical hardware, with predictive analytics and policy-based automation. It is built on a scale-out, resilient platform designed to deliver intelligent operational insights to simplify and automate management of applications and infrastructure across virtual, physical and cloud environments. For more information, see: https://www.vmware.com/products/vrealize-operations .

About Virtual SAN

VMware Virtual SAN is software-defined storage for VMware vSphere. By clustering server-attached hard disks and/or solid state drives (HDDs and/or SSDs), Virtual SAN creates a flash-optimized, highly resilient shared datastore designed for virtual environments.


IPAM Automation for Cloud

by Rich Bourdeau


Many companies use Infoblox IP Address Management (IPAM) to manage their IP addresses and DNS host records. Infoblox has recently updated its vRealize Automation plug-in, which allows IP addresses and DNS configurations to be automatically assigned as part of the automated provisioning of a new machine or application. This integration has simplified and accelerated the end-to-end provisioning and lifecycle management of both infrastructure and applications.

What You Will Learn

  • Discover how VMware is the foundation for the Software Defined Enterprise
  • Learn how the consolidated management of VMware automates deployment of secure, scalable, high-performing multi-tier applications
  • Examine how the Infoblox VMware adapter can integrate with VMware automation and workflows to augment IPAM and DNS services while providing greater visibility of networking resources in your cloud environment
  • Watch a demonstration of Infoblox and VMware vRealize Automation to deploy servers in a cloud environment

Date: Wednesday June 17, 2015
Time: 11am PDT (2pm EDT)

Register Now


If you are unable to attend, click here to register for access to the archived recording.

Need help deploying your private cloud infrastructure or developing your business justification? Contact us and our experts can help your team build the business case and the solution that will maximize your IT productivity.

For exclusive content and updates, follow us on Twitter @vmwarecloudauto and subscribe to our VMware IT Management blog.

 

 

Capacity Management in SDDC - Part 6

In Part 5, I explained a new concept, where we use Contention as the basis of Capacity Management in SDDC. In this Part 6, I will provide the super metric formula for each chart. We will cover Tier 1, followed by Tier 2 and 3.

Tier 1 (Highest)

To recap, we do not have over-subscription in Tier 1; we only have it in Tier 2 and 3. As a result, Tier 1 is simpler, as we are essentially following an Allocation model.

You should be performing capacity planning at Cluster level, not Data Center or Host level.

Compute: CPU

Supply: Total physical cores of all ESXi Hosts - HA buffer

  • We can choose physical cores or physical threads. Cores are the conservative choice, threads the aggressive one; the ideal number is around 1.5 times the physical cores. My recommendation: take the cores, not the threads, because this is Tier 1, your highest and best tier.
  • Threshold: 10% of your capacity, as it takes time to buy a new cluster (which also needs storage). You are also not aiming to run your ESXi hosts at 100% utilization.
  • We do not build the threshold (which is actually your buffer) into the super metric formula, as it is dynamic. Once it is hard-coded in the super metric, changing it does not change the history. It is dynamic because it depends on the business situation. If there is a large project going live in a few weeks, then your buffer needs to cater for it. This is why we need to stay close to the business. It’s also something you should know from your actual experience in your company; you have that gut feel and estimate.

Demand: Total vCPU for all the VMs.

  • If you are using virtual threads in your VMs, count them as if they are full vCPUs. For example, a VM with 2 vCPU and 2 threads per core should be counted as 4 vCPU.

Super Metric Formula: Supply - Demand
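As a rough illustration of the CPU rules above, the Python sketch below counts supply in physical cores after removing one HA host, counts virtual threads as full vCPUs, and checks the 10% threshold outside the Supply - Demand value, so the threshold can change without rewriting history. It assumes identical hosts; the class and function names and the sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VmConfig:
    vcpus: int
    threads_per_core: int = 1  # virtual threads are counted as full vCPUs


def cpu_supply(cores_per_host, hosts, ha_hosts=1):
    """Physical cores of all hosts minus the HA buffer (identical hosts assumed)."""
    return cores_per_host * (hosts - ha_hosts)


def cpu_demand(vms):
    """Total vCPUs, with a 2 vCPU / 2 threads-per-core VM counted as 4."""
    return sum(vm.vcpus * vm.threads_per_core for vm in vms)


def below_threshold(left, supply, threshold=0.10):
    """Threshold applied outside the super metric, so it can change over time."""
    return left < threshold * supply


if __name__ == "__main__":
    vms = [VmConfig(4)] * 20 + [VmConfig(2, threads_per_core=2)] * 5
    supply = cpu_supply(cores_per_host=16, hosts=8)
    left = supply - cpu_demand(vms)
    print(supply, left, below_threshold(left, supply))
```

In this hypothetical cluster, 12 vCPUs remain against a 10% threshold of 11.2, so no purchase is triggered yet.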

Compute: RAM

Supply: Total physical RAM of all ESXi Hosts – HA buffer

  • No need to include ESXi vmkernel RAM, as it’s negligible. If you are using VSAN & NSX, you can add some buffer. You do not need to include virtual appliances, as they take the form of VMs and hence are included in the Demand.
  • Threshold: set the same number, which is 10% in this example.

Demand: Total vRAM for all the VMs

Super Metric Formula: Supply - Demand

Compute: VM

Super Metric Formula: Max no of allowed VM in 1 cluster - No of VM in the cluster

  • I apply my Availability policy at cluster level since I think it makes more sense. Applying it at ESXi host level is less applicable due to HA. Yes, the chance of a host going down is higher than the chance of an entire cluster going down. However, HA will restart the VMs, and VM owners may not notice. On the other hand, if a cluster goes down, it's a major issue.
  • The limitation of this super metric is that it assumes a single cluster size. This is a fair assumption, as you should keep things consistent. If for some reason you have, say, 3 cluster sizes (e.g. 8, 10, 12 hosts), then you need 3 super metrics.

Compute: Summary

Look at the above 3 charts as 1 group. Take the one with the lowest number.

In an emergency, as a temporary solution, you can still deploy VMs while waiting for your new cluster to arrive, because you have the HA buffer. ESXi hosts are known for their high uptime.
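The three charts use different units (vCPUs, GB of vRAM, VMs), so picking the lowest one needs a common unit. One possible approach, which is an assumption and not something the post specifies, is to convert vCPU and vRAM headroom into VM counts using an average VM size. A minimal Python sketch with hypothetical numbers:

```python
def limiting_dimension(vcpu_left, vram_left_gb, vm_slots_left,
                       avg_vcpus_per_vm, avg_vram_gb_per_vm):
    """Return (dimension, VMs that still fit) for the tightest of the three compute charts.

    avg_vcpus_per_vm and avg_vram_gb_per_vm are hypothetical sizing inputs.
    """
    vms_that_fit = {
        "vCPU": int(vcpu_left // avg_vcpus_per_vm),
        "vRAM": int(vram_left_gb // avg_vram_gb_per_vm),
        "VM count": int(vm_slots_left),
    }
    dimension = min(vms_that_fit, key=vms_that_fit.get)
    return dimension, vms_that_fit[dimension]


if __name__ == "__main__":
    print(limiting_dimension(vcpu_left=22, vram_left_gb=368, vm_slots_left=30,
                             avg_vcpus_per_vm=2, avg_vram_gb_per_vm=8))
```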

Storage

We have to measure both IOPS and Space. Take the lower of these 2 dimensions, because adding one capacity gives you the other. This also keeps your storage in simple building blocks.

  • For IOPS, it is simpler. Just take the maximum and average latency. If the maximum is nearing your SLA, you need to buy more capacity. You can take the Maximum at Cluster level, or Datastore Cluster level.
  • For Space, it is more complex. Below is the formula for Space.

Supply: Total datastore space capacity in the cluster

You should be using Datastore Clusters. Other than the benefits you get from using them, they also make capacity management easier. If you are using them, you need not manually exclude local datastores. You also need not manually group the shared datastores, which can be complex if you have multiple clusters.

With VSAN, you only have 1 datastore per cluster and need not exclude local datastores manually. This means it's much simpler in VSAN.

Include buffer for snapshot. This can be 20%, depending on your environment. This is why I'm not a fan of many small datastores, as you have pockets of unusable capacity. This does not have to be hardcoded in your super metric, but you have to be mentally aware of it. If you need a visual reminder, chapter 8 of my book has a heat map sample to track it.

Storage space should be tied to your actual, physical capacity. If you are doing thin provisioning at the storage layer, then you need to measure it at that level. I prefer to use thin on VMware, and thick on the physical array.

Demand: Total Storage consumed by all VMs in the shared datastore

The total vDisk depends on whether you are doing thin provisioning or not.

If you are not, then it is simple. Just total all the storage consumed by all VMs.

If you are, you will have 2 numbers: one for Configured and one for Utilized. The number you need is somewhere in between. You need to make a business call on where to take it, as it depends on your environment. If disk growth is relatively modest, then you can take a number closer to Utilized. If not, take one closer to Configured.
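The business call between Utilized and Configured can be expressed as a single weight. The sketch below is one hypothetical way to do it: growth_weight is an illustrative knob (0 means take Utilized, 1 means take Configured), not a vRealize Operations setting, and the figures are made up.

```python
def thin_demand_gb(configured_gb, utilized_gb, growth_weight):
    """Pick a demand figure between Utilized (weight 0.0) and Configured (weight 1.0)."""
    if not 0.0 <= growth_weight <= 1.0:
        raise ValueError("growth_weight must be between 0.0 and 1.0")
    return utilized_gb + growth_weight * (configured_gb - utilized_gb)


if __name__ == "__main__":
    # Hypothetical: 10 TB configured, 6 TB utilized.
    print(thin_demand_gb(10240, 6144, 0.25))  # modest growth: closer to Utilized
    print(thin_demand_gb(10240, 6144, 0.75))  # fast growth: closer to Configured
```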

Super Metric Formula: Supply - Demand

The above will give you the information you need for Tier 1. Tier 2 and 3 will be different, as there is over-subscription. This means we cannot ignore contention.

Tier 2 and 3 (lowest)

Compute: Summary

Super Metric Formula: Maximum (VM CPU Contention) in the cluster

Super Metric Formula: Average (VM CPU Contention) in the cluster

Super Metric Formula: Maximum (VM RAM Contention) in the cluster

Super Metric Formula: Average (VM RAM Contention) in the cluster

For the total number of VM left in the cluster, see Tier 1. It's the same formula, just a different policy.

Storage

See Tier 1. It's the same set of super metrics, just a different policy.

Network (all tiers)

Super Metric Formula: Max (VM Network Drop Packets) at the physical DC level

You should expect 0 dropped packets in the entire data center.

Super Metric Formula: Max (ESXi Host vmnic utilization) in the physical DC.

This number has to be below your physical capacity. Ideally, it leaves a buffer so it can handle spikes from network-intensive events.

Super Metric Formula: Average (ESXi Host vmnic utilization) in the physical DC.

Conclusion

Indeed, a few line charts are all you need to do capacity management in SDDC. I am aware it is not a fully automated solution. However, my customers found it logical and easy to understand. It follows an 80/20 principle, where you are given the 20% room to make the judgement call as the expert.

We will cover the actual super metric examples in Part 7, scheduled for publication this month.