In the previous post, I covered capacity management at the VM level. In this post, I will cover capacity management at the infrastructure level.
At the infrastructure level, you look at the big picture. Hence, it is important that you know your architecture well. One way to easily remember what you have is to keep it simple. Yes, you can have different host specifications (CPU speed, amount of RAM, and so on) in a cluster, but that would be hard to remember if you have a large farm with many clusters.
You also need to know what you actually have at the physical layer. If you don’t know how many CPUs or how much RAM the ESXi host has, then it’s impossible to figure out how much capacity is left. I will use storage as an example to illustrate why this is important. Do you know how many IOPS your storage has?
Most shared storage is mounted by both ESXi and non-ESXi servers. Even if the entire array is dedicated to ESXi, there is still the physical backup server mounting it, and the array might be doing array-based replication or taking snapshots.
Some storage arrays support dynamic tiering (high-IOPS, low-latency storage fronting the low-IOPS, high-latency spindles). In this configuration, the underlying physical IOPS varies from minute to minute. This makes it difficult for ESXi and vRealize Operations to determine the actual physical limit of the array, so you need to take extra care to account accurately for the resources available. A change in the array configuration can also affect your capacity planning. Changing the tier preference of a given LUN can probably be done live, which means it can happen without you being informed.
Capacity planning at the compute level
Once you know the actual capacity, you are in a position to figure out the usable portion. The next figure shows the relationship. The raw capacity is what you have physically. The Infrastructure as a Service (IaaS) workload is all the workload that is not caused by a VM. For example, the hypervisor itself consumes CPU and RAM. When you put an ESXi host into maintenance mode, it triggers a mass vMotion of all the VMs running on it. That vMotion takes up CPU and RAM on both the source and destination ESXi hosts, as well as the network between them. So the capacity left for VMs, the usable capacity, is: Raw Capacity - IaaS workload - Non-vSphere workload.
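To make the subtraction concrete, here is a minimal Python sketch of the usable-capacity calculation. The `Capacity` class, the `usable_capacity` function, and every number in it are made up for illustration; they are not values from vSphere or vRealize Operations, and you would substitute figures measured in your own environment.

```python
from dataclasses import dataclass


@dataclass
class Capacity:
    """Hypothetical container for compute capacity (illustrative only)."""
    cpu_ghz: float
    ram_gb: float

    def __sub__(self, other: "Capacity") -> "Capacity":
        # Subtract each resource independently.
        return Capacity(self.cpu_ghz - other.cpu_ghz, self.ram_gb - other.ram_gb)


def usable_capacity(raw: Capacity, iaas: Capacity, non_vsphere: Capacity) -> Capacity:
    """Usable = Raw Capacity - IaaS workload - Non-vSphere workload."""
    usable = raw - iaas - non_vsphere
    if usable.cpu_ghz < 0 or usable.ram_gb < 0:
        raise ValueError("Overhead exceeds raw capacity; check your inputs.")
    return usable


# Made-up example figures for a single cluster.
raw = Capacity(cpu_ghz=400.0, ram_gb=4096.0)      # what you physically have
iaas = Capacity(cpu_ghz=40.0, ram_gb=256.0)       # assumed hypervisor/vMotion overhead
non_vsphere = Capacity(cpu_ghz=0.0, ram_gb=0.0)   # assumed: hosts dedicated to vSphere

usable = usable_capacity(raw, iaas, non_vsphere)
print(usable)
```

The same shape applies to storage or network: swap the fields for IOPS or bandwidth, and the non-vSphere term becomes whatever the other consumers of that resource are using.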