So you’ve decided that you want to develop and deploy your applications cheaper and faster. I get it…there are numerous benefits to PaaS that allow your developers to quickly write and change modern applications, deploy them “as code”, all while maintaining the flexibility to operate infrastructure either on-prem or in the cloud. Pivotal Cloud Foundry (PCF) is a popular enterprise distribution of the open-source Cloud Foundry project that has gained rapid adoption.
As we often hear from both development and operations teams, PCF can be a bit of a “black box”. The platform’s inherent resiliency reduces the potential for downtime and performance issues. But issues do happen, and quickly troubleshooting them and analyzing their origins is difficult, sometimes impossible. Full-stack monitoring is critical to ensuring the reliability of your PCF apps. To understand why this matters, let’s first take a deeper dive into the architecture of PCF.
PCF Architecture:
The basic workflow of Pivotal Cloud Foundry gives an overview of the structure and components at different layers:
- The Cloud Controller passes requests to stage and run applications to several components on the Diego Brain.
- The Diego Brain components translate staging and running requests into Tasks and Long Running Processes (LRPs), then submit these to the Bulletin Board System (BBS) through an API over HTTP.
- The BBS submits the Tasks and LRPs to the Auctioneer, part of the Diego Brain.
- The Auctioneer distributes these Tasks and LRPs to Cells through an Auction. The Diego Brain communicates with Diego Cells using the SSL/TLS protocol.
- Once the Auctioneer assigns a Task or LRP to a Cell, an in-process Executor creates a Garden container in the Cell. The Task or LRP runs in the container.
- The BBS tracks desired LRPs, running LRP instances, and in-flight Tasks. It also periodically analyzes this information and corrects discrepancies to ensure consistency between ActualLRP and DesiredLRP counts.
- The Metron Agent, part of the Cell, forwards application logs, errors, and metrics to the Cloud Foundry Loggregator. For more information, see the Application Logging in Cloud Foundry topic.
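The consistency check the BBS performs between DesiredLRP and ActualLRP counts can be illustrated with a minimal Python sketch. This is not Diego's actual implementation: the function name `converge`, the dictionary shapes, and the process names are assumptions made for illustration only.

```python
from collections import Counter


def converge(desired, actual):
    """Compare desired instance counts with running instances.

    desired: {process_guid: desired_instance_count}   (hypothetical shape)
    actual:  iterable of process_guid, one entry per running instance
    Returns (to_start, to_stop), each a {process_guid: count} dict.
    """
    running = Counter(actual)
    to_start, to_stop = {}, {}
    for guid in set(desired) | set(running):
        diff = desired.get(guid, 0) - running.get(guid, 0)
        if diff > 0:
            to_start[guid] = diff    # too few instances are running
        elif diff < 0:
            to_stop[guid] = -diff    # too many, or no longer desired
    return to_start, to_stop


# Illustrative data: "web" is short one instance, "worker" has one extra,
# and "old-app" is running but no longer desired.
desired = {"web": 3, "worker": 1}
actual = ["web", "web", "worker", "worker", "old-app"]
to_start, to_stop = converge(desired, actual)
```

In this sketch the discrepancy is simply reported as counts to start or stop; a real scheduler would also have to account for instances that are claimed but not yet running.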
Select PCF Components:
- Org and Space:
- PCF uses a role-based access control (RBAC) system to grant Pivotal Application Service users permissions appropriate to their role within an org or a space. An org is a development account that an individual or multiple collaborators can own and use. Collaborators in an org share a resource quota plan, applications, services availability, and custom domains. Every application and service is scoped to a space. An org can contain multiple spaces. A space provides users with access to a shared location for application development, deployment, and maintenance.
- Availability Zone:
- An operator-assigned, functionally independent segment of network infrastructure. In cases of partial infrastructure failure, Pivotal Cloud Foundry (PCF) Elastic Runtime distributes and balances all instances of running applications across remaining AZs.
- GoRouter:
- Handles incoming traffic and routes it to the appropriate component (Cloud Controller or Hosted Application)
- Cloud Controller (CAPI):
- Provides REST API endpoints for clients to access the system
- Maintains a database for all data within PCF
- Auctions applications across Diego cells (resource balancing)
- Diego:
- Hosts applications in containers on Cells
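The Availability Zone behavior described in the list above, where application instances are balanced across the remaining AZs after a partial failure, can be sketched roughly as follows. This is a simplified even-spread illustration, not the placement logic PCF actually uses; the function name `spread_instances` and the AZ names are hypothetical.

```python
def spread_instances(instance_count, zones):
    """Spread instances as evenly as possible over healthy AZs.

    zones: {az_name: is_healthy}   (hypothetical shape)
    Returns {az_name: instance_count} for healthy AZs only.
    """
    healthy = sorted(name for name, ok in zones.items() if ok)
    if not healthy:
        raise ValueError("no healthy availability zones")
    base, extra = divmod(instance_count, len(healthy))
    # The first `extra` zones each take one additional instance.
    return {
        name: base + (1 if i < extra else 0)
        for i, name in enumerate(healthy)
    }


# Five instances, with az2 marked as failed.
placement = spread_instances(5, {"az1": True, "az2": False, "az3": True})
```

A real placement decision would also weigh available memory and disk on each Cell, which this sketch leaves out.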
Key monitoring concepts for Pivotal Cloud Foundry
Because of the interdependencies between PCF components, understanding and correcting problems, even those that are managed automatically by PCF, requires visibility into the relationships between components, along with their behavior and state.
For example, strategic use of Availability Zones contributes to the fault tolerance and high availability of an Elastic Runtime deployment. Healthy recovery requires historical analysis of the resources in a failed zone, and the ability to isolate related components as the problem evolved.
Persistent data also presents a unique challenge, especially given the ephemeral nature of PCF cells. A frequent source of issues in Pivotal Cloud Foundry environments is Long Running Processes (LRPs) in PCF databases, so LRPs must be monitored closely. LRP metrics report the total number of LRP instances that are running, claimed, or unclaimed on a cell. These metrics can then be used to monitor or troubleshoot database issues.
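As a rough illustration of how per-cell LRP counts might be checked, the Python sketch below flags cells whose unclaimed LRP count exceeds a threshold. The `CellLrpSample` type, its field names, and the default threshold are assumptions for this example and do not reflect the actual metric names exposed by PCF or the management pack.

```python
from dataclasses import dataclass


@dataclass
class CellLrpSample:
    """One hypothetical reading of LRP counts for a single cell."""
    cell_id: str
    running: int
    claimed: int
    unclaimed: int


def flag_cells(samples, max_unclaimed=0):
    """Return IDs of cells with more unclaimed LRPs than allowed."""
    return [s.cell_id for s in samples if s.unclaimed > max_unclaimed]


# Illustrative readings only.
samples = [
    CellLrpSample("cell-0", running=12, claimed=0, unclaimed=0),
    CellLrpSample("cell-1", running=9, claimed=1, unclaimed=3),
]
flagged = flag_cells(samples)
```

A sensible threshold depends on the environment; a brief spike in unclaimed LRPs during a deployment may be normal, while a sustained count usually deserves a closer look.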
Enter VMware vRealize Operations
This is a familiar tool that your operations teams are already using to monitor their data centers. vRealize Operations delivers continuous performance optimization based on intent, efficient capacity management, proactive planning and intelligent remediation. It can easily be extended through management packs to unify operations across the SDDC and multiple clouds, from apps to infrastructure.
Blue Medora’s management pack for PCF extends vRealize Operations to deliver full-stack visibility into Pivotal Cloud Foundry, simplifying performance optimization and troubleshooting.
PCF Relationship Mapping Supported by Blue Medora
Blue Medora supports key relationships within PCF. With the addition of infrastructure integrations or management packs, it also provides relational visibility into the underlying infrastructure, including data center entities such as compute, storage, and networking. Blue Medora also provides cloud infrastructure integrations for resources on AWS, Azure, GCP, IBM Cloud and Alibaba.
How does Blue Medora’s Pivotal Cloud Foundry management pack gather data?
The Blue Medora Nozzle is a Cloud Foundry component that collects metrics from the Loggregator Firehose and exposes them via a RESTful API. Before the Nozzle can be configured, it must be deployed.
- The Nozzle listens to the Firehose and caches data
- It presents a REST API that Blue Medora products use to access the cached data
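The listen, cache, and serve pattern described above can be sketched in a few lines of Python. This is not the Blue Medora Nozzle itself: the `MetricCache` class, the envelope fields, the `/metrics` path, and the port are all assumptions for illustration, and the Firehose is simulated with hand-written dictionaries.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class MetricCache:
    """Keeps only the most recent value seen for each metric."""

    def __init__(self):
        self._lock = threading.Lock()
        self._latest = {}

    def ingest(self, envelope):
        # Hypothetical envelope shape: origin, name, value, timestamp.
        key = f"{envelope['origin']}.{envelope['name']}"
        with self._lock:
            self._latest[key] = {
                "value": envelope["value"],
                "timestamp": envelope["timestamp"],
            }

    def snapshot(self):
        with self._lock:
            return dict(self._latest)


cache = MetricCache()


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves the cached metrics as JSON on a hypothetical /metrics path."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = json.dumps(cache.snapshot()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Blocks forever; call it from a separate thread or process."""
    HTTPServer(("127.0.0.1", port), MetricsHandler).serve_forever()


# Simulated Firehose traffic: the second envelope overwrites the first.
cache.ingest({"origin": "rep", "name": "CapacityRemainingMemory",
              "value": 4096, "timestamp": 1})
cache.ingest({"origin": "rep", "name": "CapacityRemainingMemory",
              "value": 2048, "timestamp": 2})
```

Caching only the latest value per metric keeps memory use bounded, at the cost of losing history; a consumer polling the REST endpoint is then responsible for storing any trend data it needs.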
There are multiple ways to deploy the Nozzle:
1. PCF Tile – PCF only, provides a nice UI for Nozzle configuration
2. Cloud Foundry App – Deploy just like a regular app on Cloud Foundry
3. BOSH Deployment – Run BOSH commands to deploy (high chance of customer error)
4. Deploy as Go App – Deploy and run the source code in a Go environment (requires a decent amount of Go programming language knowledge)
Dashboards and Reporting
Blue Medora’s PCF management pack provides six dashboards, 13 reports, access to 500+ metrics, and 144 alerts based on Pivotal recommendations. The dashboards provide administrators and users a view into their PCF environment. The Overview dashboard is the most comprehensive look:
This dashboard shows the health of PCF objects based on metrics such as the number of services, number of applications, CPU usage, memory usage, disk usage, and more.
An Applications dashboard also visualizes application instance dependencies, and provides KPIs from the App, Cells and VMs that relate to it.
The Blue Medora management pack provides 13 different reports.
We also provide 144 different alerts based on Pivotal recommendations and industry-standard best practices.
Custom dashboards can be created as well. Say, for example, you would like to see object topologies for Foundations and Availability Zones.
The PCF management pack for vROps from Blue Medora provides detailed insight into enterprise-grade PCF deployments. For more information or a free trial of the product, please contact [email protected] or go to https://s3.amazonaws.com/docs.bluemedora.com/vROps/PCF/Default.htm.