Editor's Note: This post is regularly updated to reflect new capacity management best practices.
What’s the top job of a platform operator? To make sure Pivotal Cloud Foundry is in an updated, healthy state at all times.
Capacity management is crucial to this end. Without enough capacity, your application developers won’t be able to scale their application instances or push new code. If PCF doesn’t have enough head-room to move containers around during the normal course of operation, potential outages can occur.
So it’s important to look after capacity. But how? What’s the best way to avoid a capacity crunch? In this article, we will focus in on the key memory and disk capacity metrics that Diego emits and what these really mean. (Diego is the container scheduler for Pivotal’s app platform.) We’ll also discuss how you can leverage these metrics to proactively monitor the capacity of your platform.
We have seen a few misconceptions around how reported virtual machine (VM) memory/disk telemetry relates to the capacity of Diego itself. Let’s clear those up!
General Capacity Guidelines
For Pivotal Cloud Foundry (PCF), we make 2 higher-level capacity monitoring recommendations.
Have enough capacity to stay highly available. Pivotal recommends that operators maintain enough platform capacity to withstand the failure of an entire availability zone (AZ). If you’ve followed our recommendation to deploy across three AZs, make sure your ratio of remaining memory, disk, and container capacity does not fall below 35%. By keeping 35% capacity headroom, you’ll have enough room for applications to be redistributed if a failure occurs.
Have enough capacity to prevent out-of-memory errors. Pivotal recommends that operators monitor the remaining capacity available in the system. With sufficient amounts of remaining memory, you’ll avoid the out-of-memory errors (e.g. FAILED InsufficientResources) when developers cf push their applications.
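The 35% guideline above can be turned into a simple check. The sketch below is illustrative only: the function names and the sample deployment are assumptions, not part of PCF. The arithmetic behind the threshold is that losing one of three equally sized AZs removes one third of total capacity, so everything already allocated must fit in the remaining two thirds, which requires at least one third (about 33%) of capacity to be free beforehand; 35% adds a small margin.

```python
def remaining_ratio(total_mib: int, remaining_mib: int) -> float:
    """Fraction of advertised capacity that is still unallocated."""
    if total_mib <= 0:
        raise ValueError("total capacity must be positive")
    return remaining_mib / total_mib


def has_az_headroom(total_mib: int, remaining_mib: int, threshold: float = 0.35) -> bool:
    """True when remaining capacity is at or above the headroom threshold."""
    return remaining_ratio(total_mib, remaining_mib) >= threshold


# Hypothetical deployment: 9 cells of 16 GiB each, with 5 GiB free per cell.
total = 9 * 16 * 1024
remaining = 9 * 5 * 1024

print(f"remaining ratio: {remaining_ratio(total, remaining):.3f}")
print(f"enough headroom for an AZ failure: {has_az_headroom(total, remaining)}")
```

The same check applies to disk and container counts; only the inputs change.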
First, Some Background on Diego Auctions
Before we dig into the most important capacity metrics, it’s worth a quick refresher on the inner workings of Diego.
Diego balances app processes over the VMs in a PCF installation using the Diego Auction. When new processes need to be allocated to VMs, the Diego Auction determines the placement based on the state of the system. The Diego Auction process repeats whenever new jobs need to be allocated to VMs.
Each auction distributes the current batch of work, which consists of tasks and application instances. These processes can include new jobs, jobs left unallocated in the previous auction, and jobs left orphaned by failed VMs. Diego does not redistribute jobs that are already running on VMs. Only one auction can take place at a time, which prevents placement collisions.
Diego, on a per cell basis, emits metrics representing the maximum possible memory/disk that can be advertised for auction on that cell, and the current amount of memory/disk remaining that can still be allocated.
Critical Capacity Metrics: Memory and Disk Allocations
The two most important capacity metrics in Diego are the allocations for memory and disk. Diego uses this allocation information to make the application and task placement decisions discussed above. Here’s a recap of these concepts, and the actual metrics:
Total Memory Capacity. Indicates the maximum possible memory that the cell will advertise as available for auction. In aggregate, this conveys the total possible memory that can be allocated to app instances and tasks in this deployment. (origin = rep; metric name = CapacityTotalMemory; value in MiB; emitted periodically per cell)
Remaining Memory Capacity. Indicates the remaining amount of memory available for a given cell to allocate to its containers. The value is the result of the total advertised memory minus the amount of memory already allocated for app instances. (origin = rep; metric name = rep.CapacityRemainingMemory; value in MiB; emitted periodically per cell)
Total Disk Capacity. Indicates the maximum possible disk that the cell will advertise as available for allocation. In aggregate, this conveys the total possible disk that can be allocated to app instances and tasks in this deployment. (origin = rep; metric name = rep.CapacityTotalDisk; value in MiB; emitted periodically per cell)
Remaining Disk Capacity. Indicates the remaining amount of disk available for a given cell to allocate to its containers. The value is the result of the total advertised disk minus the amount of disk already allocated for app instances. (origin = rep; metric name = rep.CapacityRemainingDisk; value in MiB; emitted periodically per cell)
As a first line of operational monitoring, pay attention to the aggregated (across cells) Remaining Memory and Disk values. These indicate when you need to scale Diego Cells.
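As a rough illustration of aggregating across cells, the sketch below sums per-cell samples into platform-wide totals. The sample values and the helper names are hypothetical; only the metric names come from the list above. How you actually collect the samples (for example, from your monitoring system) is outside the scope of this sketch.

```python
# Hypothetical per-cell samples; keys mirror the rep metric names, values are in MiB.
cells = [
    {"CapacityTotalMemory": 16384, "CapacityRemainingMemory": 4096,
     "CapacityTotalDisk": 65536, "CapacityRemainingDisk": 30720},
    {"CapacityTotalMemory": 16384, "CapacityRemainingMemory": 4096,
     "CapacityTotalDisk": 65536, "CapacityRemainingDisk": 20480},
    {"CapacityTotalMemory": 16384, "CapacityRemainingMemory": 14336,
     "CapacityTotalDisk": 65536, "CapacityRemainingDisk": 51200},
]


def aggregate(samples, metric):
    """Sum one per-cell metric across all reporting cells."""
    return sum(sample[metric] for sample in samples)


def remaining_summary(samples, resource):
    """Platform-wide total and remaining capacity for 'Memory' or 'Disk'."""
    return {
        "total_mib": aggregate(samples, f"CapacityTotal{resource}"),
        "remaining_mib": aggregate(samples, f"CapacityRemaining{resource}"),
    }


memory = remaining_summary(cells, "Memory")
disk = remaining_summary(cells, "Disk")
print("memory:", memory)
print("disk:", disk)
```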
Recovery and Re-Allocation in Action
To understand why it is less important to monitor per-cell values, it is helpful to understand how Diego manages this concern for you.
Say we have 3 Diego Cells, each with 16GB of total memory capacity. Cells 0 and 1 each have 12GB memory allocated, leaving 4GB memory remaining on each. Cell 2 has 2GB memory allocated and 14GB memory remaining. Cell 0 crashes. Diego will react by moving that 12GB allocation across the remaining healthy cells (1 and 2) as Cell 0 is recovered.
Although the per-cell allocation and usage numbers have changed during the cell crash resolution, the aggregated capacity remaining has stayed the same. With Diego, an imbalance at the cell level is not concerning, because the system automatically adjusts to handle the failure. And if you have more than one app instance running, which of course you should, end users won’t notice this shifting of cells!
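The three-cell scenario can be worked through numerically. This is a simplified model, not the real Diego Auction: it assumes the crashed cell's 12GB consists of twelve 1GB instances and places each one on the healthy cell with the most free memory. The cell names and the helper functions are made up for the example.

```python
TOTAL_GB = 16  # total memory capacity of each cell in the example


def remaining(allocated):
    """Remaining memory per cell, given allocated memory per cell (in GB)."""
    return {cell: TOTAL_GB - gb for cell, gb in allocated.items()}


def crash_and_recover(allocated, crashed, instance_gb=1):
    """Re-place the crashed cell's work, then bring the cell back empty."""
    after = {cell: gb for cell, gb in allocated.items() if cell != crashed}
    for _ in range(allocated[crashed] // instance_gb):
        # Simplified placement: the healthy cell with the least allocated memory.
        target = min(after, key=after.get)
        if after[target] + instance_gb > TOTAL_GB:
            raise RuntimeError("no healthy cell can fit the instance")
        after[target] += instance_gb
    after[crashed] = 0  # the recovered cell returns with nothing allocated
    return after


before = {"cell-0": 12, "cell-1": 12, "cell-2": 2}
after = crash_and_recover(before, "cell-0")

print("before:", remaining(before), "aggregate:", sum(remaining(before).values()))
print("after: ", remaining(after), "aggregate:", sum(remaining(after).values()))
```

In this model the per-cell figures shift considerably, while the aggregate remaining memory is 22GB both before the crash and after Cell 0 has been recovered.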
For each new app instance that it has to place, the Diego Auction must be able to find at least one individual Diego Cell with enough space to accommodate it. This is why Pivotal recommends that platform operators ultimately think about capacity in terms of “free chunks of remaining memory”.
Think of Memory in Terms of “Free Chunks”
Pivotal recommends a bit of number-crunching to figure out the available memory on the cells. Calculate and monitor “free chunks” of memory remaining on the Diego Cells. So how do you go about this?
You can calculate this recommended “Free Chunks” measure from the Diego metric CapacityRemainingMemory (origin = rep). The task is easier with PCF Healthwatch. You can monitor and alert on the emitted metric healthwatch.Diego.AvailableFreeChunks (origin = healthwatch). This method does the calculation for you.
Use PCF Healthwatch to keep tabs on Pivotal Cloud Foundry.
To get the most operational value from the “Free Chunks” memory measure, first understand your deployment’s average app size. You should then monitor/alert on this metric to ensure that at least some cells have enough capacity to accept pushes of your standard app size.
For example, if a developer pushes a 4GB app, Diego would have trouble placing that app if there is no one Cell with sufficient capacity of 4GB or greater. Therefore you may wish to alert, or trigger a scaling automation, when your number of Free 4GB Chunks falls below a historically safe threshold level.
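One plausible way to compute a free-chunks figure yourself from per-cell CapacityRemainingMemory values is sketched below. The formula (whole chunks per cell, summed across cells) and the function name are assumptions made for illustration; this article does not spell out the exact calculation that PCF Healthwatch performs.

```python
def free_chunks(remaining_mib_per_cell, chunk_mib=4096):
    """Count the whole chunks of chunk_mib that fit in each cell's remaining memory."""
    return sum(cell_remaining // chunk_mib for cell_remaining in remaining_mib_per_cell)


# Hypothetical CapacityRemainingMemory values (MiB), one entry per cell.
uneven = [3072, 3072, 16384]   # 22 GiB free in total, mostly on one cell
fragmented = [3072] * 7        # 21 GiB free in total, spread thinly

print("uneven:", free_chunks(uneven))
print("fragmented:", free_chunks(fragmented))
```

The second case is the one to watch for: the aggregate remaining memory looks healthy, yet no single cell can accept a 4GB app instance. Passing per-cell CapacityRemainingDisk values and a disk chunk size gives the equivalent figure for disk.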
Beginning with PCF v2.3, Pivotal recommends also paying attention to "Free Chunks" of disk in addition to the "Free Chunks" of memory measure discussed above. If leveraging PCF Healthwatch, you can monitor and alert on the emitted metric healthwatch.Diego.AvailableFreeChunksDisk, which does this calculation for you.
Diego’s Pessimistic View of Capacity Allocation (And Why That’s Good)
One common point of confusion on Diego Cell capacity? The difference between the amount of memory that has been allocated, the memory Diego considers “used”, and the slice of memory that is actually being used by an app at a given point in time.
When a developer runs cf start on an app with a 1GB memory allocation, the entire 1GB allocation, as defined by the memory quota, counts as used immediately. The amount of memory actually used by the app instance at start-up doesn’t matter.
The same is true of how the Diego Cells allocate memory and disk. Each cell has a certain amount of capacity to allocate. It allocates that capacity pessimistically, assuming that the assigned work will need all of it at some point in its lifecycle. Of course, workloads often don’t use all of their assigned memory/disk. The actual memory/disk usage attributable to the containers may be lower than what Diego considers consumed and therefore not available for auction.
This is why, in the case of Diego Cells, it is misleading to look at the standard BOSH reported memory/disk usage statistics shown in Ops Manager for a given Diego Cell VM. Sure, applications may not be using all of the memory currently allocated to them. But Diego considers this memory potentially needed and therefore used. That’s why examining the Remaining Memory Capacity data emitted by Diego itself is far more meaningful.
Informational Capacity Metrics: Allocated vs. Actual
Understanding the difference between “allocated” and “actual” usage is helpful. This helps you reconcile any discrepancies noted between bosh vms --vitals metrics and what Diego reports.
Sometimes in the course of troubleshooting memory or disk errors, an operator may want to dig in deeper on individual cell capacity. In addition to the prior noted key metrics, which are emitted per cell, Diego also makes a second-order informational layer of metric data available. This can help operators better reconcile capacity on a per-cell basis. Once again, here’s a recap of these concepts, and the actual metrics. (NOTE: We categorize these metrics as “informational” – they are useful telemetry to track, but you shouldn’t alert on them.)
Allocated Memory Capacity. Indicates how much memory a given cell has allocated to its container workloads. This metric is most useful informationally. For a given cell, rep.CapacityAllocatedMemory plus rep.CapacityRemainingMemory should equal rep.CapacityTotalMemory. (origin = rep; metric name = CapacityAllocatedMemory; value in MiB; emitted periodically per cell)
Usage of Memory Capacity. Indicates how much memory the container workloads on a given cell are currently using. This metric is most useful informationally. (origin = rep; metric name = ContainerUsageMemory; value in MiB; emitted periodically per cell)
Allocated Disk Capacity. Indicates how much disk a given cell has allocated to its container workloads. This metric is most useful informationally. For a given cell, rep.CapacityAllocatedDisk plus rep.CapacityRemainingDisk should equal rep.CapacityTotalDisk. (origin = rep; metric name = CapacityAllocatedDisk; value in MiB; emitted periodically per cell)
Usage of Disk Capacity. Indicates how much disk the container workloads on a given cell are currently using. This metric is most useful informationally. (origin = rep; metric name = ContainerUsageDisk; value in MiB; emitted periodically per cell)
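When reconciling a single cell, the relationships described above can be checked directly. The sketch below uses one hypothetical memory sample; the function name and the values are invented for illustration, and the same logic applies to the disk metrics.

```python
def reconcile_memory(cell):
    """Check allocated + remaining == total, and report allocated-but-unused memory."""
    consistent = (
        cell["CapacityAllocatedMemory"] + cell["CapacityRemainingMemory"]
        == cell["CapacityTotalMemory"]
    )
    # Memory Diego treats as consumed that containers are not currently using.
    allocated_but_idle = cell["CapacityAllocatedMemory"] - cell["ContainerUsageMemory"]
    return consistent, allocated_but_idle


# Hypothetical sample for one cell, values in MiB.
sample = {
    "CapacityTotalMemory": 16384,
    "CapacityAllocatedMemory": 12288,
    "CapacityRemainingMemory": 4096,
    "ContainerUsageMemory": 5120,
}

consistent, allocated_but_idle = reconcile_memory(sample)
print("allocated + remaining == total:", consistent)
print("allocated but idle (MiB):", allocated_but_idle)
```

In this sample the containers use only 5120 MiB, so VM-level statistics would suggest plenty of free memory, while Diego can place only another 4096 MiB of work on the cell.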
Summary
So there you have it. Armed with this background on how Diego reasons about, and reports on, the available Cell capacity, you’re well-equipped to enable proactive monitoring to prevent the “I’m unable to push my app” shoulder-tap!
Observe what historical baselines are a safe level of capacity for your platform use-cases. Set an alert, or better yet a scaling automation trigger, when your platform falls below these thresholds.
Thanks to Eric Malm and the Pivotal Diego team for contributions to this article.