Out-of-the-Box Application Observability with Spring Boot & Pivotal Cloud Foundry

Pivotal Cloud Foundry (PCF), and specifically Pivotal Application Service (PAS), accelerates the path to production for your code. Once you’re in prod, of course, the job isn’t over; in many ways, it’s just starting. A critical day 2 task is monitoring your app to ensure it remains healthy and performant. To monitor a system properly, we need tools that provide meaningful visibility into the metrics that tell us the state of our app.

There are multiple facets to metrics, and they can be roughly divided into two categories: platform metrics and application metrics.

  • Platform metrics refer to data about the infrastructure on which the application is deployed: VMs, containers, and so on. Common metrics include usage of memory, CPU, and disk. PCF includes PCF Healthwatch, which can monitor platform metrics and generate insight and alerting on the overall health of the platform.
  • Application metrics, the focus of this post, refer to telemetry data generated as application code executes.

An entire category of products exists to address the problem of processing and analyzing application metrics: Application Performance Management (APM). There are many vendors in this space. Many of them partner with Pivotal to provide seamless integration of their products into PCF. Such products are indispensable when requirements dictate deep and comprehensive application metric generation, insight, and alerting.

What if you can’t use these APM tools? Perhaps you don’t have budget allocated, or you’re working through internal processes to allow these tools to be deployed to production. Or maybe you just want to start simple and become more sophisticated over time as needs demand it.

Whatever the case, the pace of your business demands that you get on with delighting customers with new features and new digital experiences, even if you don’t have the time, budget, or inclination to use a full-featured APM tool. Thankfully, Spring and PCF give you some capabilities “out of the box” that are useful for application metrics collection and visualization.

Let’s examine two categories of application metrics: application runtime metrics and custom application metrics.

  • Application runtime metrics, as the name implies, refer to metrics about the runtime itself. In the case of Java, we’re referring to JVM-related data such as thread counts, heap memory, and garbage collection behavior.
  • Custom application metrics refer to application-generated metrics (either manually or through a library) that generally require some knowledge of application internals. Examples here would be database connection pool information, response error counts, and execution timers.

Exposing Application Metrics in Spring Boot

Application Runtime Metrics with Spring Boot Actuator

The first thing you need to do to monitor your app is externally expose application metrics in some fashion. You can generally take one of two approaches. The first approach, and the model that PCF currently adopts, is to proactively push metrics to endpoint(s). The second approach is to publish application endpoint(s) from which metrics can be pulled.

For Spring Boot applications, Spring Boot Actuator offers help publishing application metrics. By including a Spring Boot Actuator “Starter” dependency, your application automatically generates metrics that are visible via HTTP or JMX. We can see an example of this by hitting the (purely diagnostic) /actuator/metrics endpoint that Actuator publishes. Here’s an example of the output for an application using embedded Tomcat and JDBC:

"jvm.memory.committed","jvm.buffer.memory.used","jvm.gc.memory.allocated","jdbc.connections.min","tomcat.sessions.created","tomcat.sessions.expired","tomcat.global.request.max","jvm.memory.used","tomcat.global.error","jvm.gc.max.data.size","logback.events","system.cpu.count","jvm.memory.max","jdbc.connections.active","jvm.buffer.total.capacity","jvm.buffer.count","process.files.max","jvm.threads.daemon","process.start.time","tomcat.global.sent","tomcat.sessions.active.max","tomcat.threads.config.max","jvm.gc.live.data.size","process.files.open","process.cpu.usage","tomcat.servlet.request","jvm.gc.pause","process.uptime","tomcat.global.received","http.server.requests","system.load.average.1m","tomcat.cache.hit","tomcat.servlet.error","tomcat.servlet.request.max","jdbc.connections.max","tomcat.cache.access","tomcat.sessions.active.current","tomcat.threads.busy","system.cpu.usage","jvm.threads.live","jvm.classes.loaded","jvm.classes.unloaded","jvm.threads.peak","tomcat.threads.current","tomcat.global.request","jvm.gc.memory.promoted","tomcat.sessions.rejected","tomcat.sessions.alive.max"

Actuator offers a number of push-based integrations with external monitoring systems as shown in its documentation. For pull-based Prometheus integration, you can add the micrometer-registry-prometheus dependency and Spring Boot Actuator will autoconfigure an /actuator/prometheus endpoint that can be scraped by a Prometheus server.

Custom Application Metrics with Micrometer

The example metrics shown above fall into the previously mentioned application runtime metrics category. So what about custom application metrics? Again, Spring Boot Actuator provides an easy mechanism for this through the use of the Micrometer façade API. With this API, you can create metrics (such as counters, gauges, and timers) that provide tailored insight into the inner workings of the application. For example, a single line of code can increment a counter-based metric:

Metrics.counter("my.app.metric.count").increment(1.0);

Micrometer is popular for two reasons: It’s vendor-neutral, and it creates no dependency between your application code and the actual metric publishing mechanism. So, should you decide to move from InfluxDB (a push model) to Prometheus (pull model), your application code doesn’t have to change. Micrometer is also not Spring-specific. The “Springiness”, if you will, stems from Spring Boot Actuator automatically creating an application-accessible Micrometer registry and including any developer-generated metrics alongside the standard metrics that it publishes. Additionally, several key parts of the Spring ecosystem are now instrumented with Micrometer and those metrics become available in your monitoring system automatically.

Of course, all the components mentioned so far are Spring-friendly capabilities and have nothing to do with PCF. For push-based metric flows into an external metrics store, these tools may be all you need. The drawback to such an approach is that your applications need to be aware, at some level, of the destination to which they are pushing their metrics. This means potential changes to all your application deployments should that destination change. For pull-based metric flows, Actuator alone will not be enough unless your number of application instances is fixed and your network endpoints never change. In an ideal world, your applications should simply generate metrics, blissfully unaware of what happens to them afterward. That’s where you need additional capabilities, such as those included with Pivotal Cloud Foundry.

Application Metrics Delivery at Scale Using PCF

Scalability is a key reason enterprises like yours deploy their most important apps on PCF. With PCF, your apps can operate at whatever scale you require. When performance really counts, say on Black Friday, Cyber Monday, or the day of a big product launch, PCF delivers.

For example, the platform manages the creation and scheduling of containers. It can automatically scale application instances up or down based on custom rules, and it updates load balancers and routing rules as application instance counts change. PCF automatically performs these and a host of other tasks that can otherwise consume a large operations team. This power and flexibility can also pose challenges for traditional metric collection approaches.

Now that we have a channel to access application metrics, our next step is to figure out how to retrieve metrics from all possible application instances. (Remember, the number can change due to scaling.) We also have to decide where to send those metrics. It’s important that the destination be as resilient and scalable as the deployed applications themselves.

The Loggregator is a PCF subsystem that aggregates and exposes logs and metrics from both platform components and applications.

The Loggregator is composed of several distributed and scalable components. (Want details on this? Check out the docs.) The important point is that most logs and metrics from PCF components funnel through the Loggregator (a notable exception being platform component logs). The Loggregator, in turn, provides access to a real-time stream of those logs and metrics via the Firehose. Consumers access logs and metrics from the Firehose by attaching Nozzles.

A schematic of how logs and metrics flow from the platform through Loggregator into other IT systems.

At this point, we have two questions to answer: How do we get our application metrics into the Loggregator, and how should we consume them once they are there?

Forwarding Application Metrics

We recently introduced a new way to get application metrics exposed by Spring Boot Actuator into the Loggregator: Metric Registrar. This replaces the older Metrics Forwarder tile.

Metric Registrar: Export Custom App Metrics in a Loggregator-friendly Format

PCF 2.4 and later offers Metric Registrar, a component that can be enabled and configured through the PAS tile in Ops Manager.

Once enabled, Metric Registrar can obtain application metrics using either a metrics endpoint registration or a structured log registration. It forwards the metrics it receives to the Loggregator. Both registration types can be set up with the Metric Registrar CLI plugin.

How does this work exactly? Glad you asked!

Metrics Endpoint Registration

A metrics endpoint registration tells Metric Registrar that an application publishes an HTTP(S) endpoint in Prometheus exposition format that can be scraped for application metrics. The scraping interval is defined in the PAS Tile within Ops Manager.

A schematic of how metrics flow into Loggregator via an Endpoint Registration

Conveniently, as mentioned above, applications using Spring Boot Actuator can expose an endpoint in Prometheus exposition format simply by including an additional dependency.
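To make “Prometheus exposition format” a little more concrete, here is a minimal, JDK-only sketch of the text that a scraper might read from such an endpoint. It is not how Actuator or Micrometer produce their output. The class and method names are hypothetical, and the naming rule (dots become underscores, counters gain a _total suffix) is an assumption about typical Prometheus naming conventions rather than a guarantee for every setup.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper that renders one sample in Prometheus text exposition format.
class ExpositionSketch {

    // Assumption: dotted names are converted to underscores, and counters
    // receive a "_total" suffix if they do not already have one.
    static String prometheusName(String dottedName, String type) {
        String name = dottedName.replace('.', '_');
        if (type.equals("counter") && !name.endsWith("_total")) {
            name += "_total";
        }
        return name;
    }

    // Label values escape backslash, double quote, and newline.
    static String escapeLabelValue(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    // Produces the HELP line, the TYPE line, and a single sample line.
    static String render(String dottedName, String type, String help,
                         Map<String, String> labels, double value) {
        String name = prometheusName(dottedName, type);
        StringBuilder out = new StringBuilder();
        out.append("# HELP ").append(name).append(' ').append(help).append('\n');
        out.append("# TYPE ").append(name).append(' ').append(type).append('\n');
        out.append(name);
        if (!labels.isEmpty()) {
            StringJoiner joined = new StringJoiner(",", "{", "}");
            labels.forEach((k, v) -> joined.add(k + "=\"" + escapeLabelValue(v) + "\""));
            out.append(joined.toString());
        }
        out.append(' ').append(value).append('\n');
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> labels = new LinkedHashMap<>();
        labels.put("region", "us-east"); // hypothetical label
        System.out.print(render("my.app.metric.count", "counter",
                "Hypothetical counter of processed items", labels, 3.0));
    }
}
```

Each metric becomes a few plain-text lines, which is why a scraper such as Metric Registrar only needs an HTTP(S) path to collect them.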

Structured Log Registration

A metrics endpoint registration requires that an application publish a public endpoint for metrics. If this is not possible or desirable, a structured log registration can be used.

A schematic of how metrics flow into Loggregator via a Structured Log Registration

A structured log registration tells Metric Registrar that an application will output metrics in its application log in either JSON or DogStatsD format. Metric Registrar will watch Loggregator for log entries coming from the application, convert the appropriate ones into metrics and forward those back to Loggregator.
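As a rough illustration of what “metrics in the application log” could look like, the sketch below prints one gauge and one counter in each of the two formats. The class name, the metric names, and the JSON field names (type, name, value, delta) are assumptions made for this example. The |g and |c suffixes follow the general DogStatsD convention for gauges and counters. Confirm the exact shapes against the Metric Registrar documentation for your PCF version before relying on them.

```java
// Hypothetical helper that formats metrics as structured log lines.
class StructuredMetricLog {

    // JSON gauge line. Assumption: field names "type", "name", and "value".
    // The metric name is assumed to need no JSON escaping.
    static String jsonGauge(String name, double value) {
        return "{\"type\":\"gauge\",\"name\":\"" + name + "\",\"value\":" + value + "}";
    }

    // JSON counter line. Assumption: counters report an integer "delta".
    static String jsonCounter(String name, long delta) {
        return "{\"type\":\"counter\",\"name\":\"" + name + "\",\"delta\":" + delta + "}";
    }

    // DogStatsD gauge line: <name>:<value>|g
    static String dogStatsdGauge(String name, double value) {
        return name + ":" + value + "|g";
    }

    // DogStatsD counter line: <name>:<delta>|c
    static String dogStatsdCounter(String name, long delta) {
        return name + ":" + delta + "|c";
    }

    public static void main(String[] args) {
        // Writing to standard output places the lines in the application log stream.
        System.out.println(jsonGauge("queue_depth", 42.5));
        System.out.println(jsonCounter("orders_processed", 3));
        System.out.println(dogStatsdGauge("queue_depth", 42.5));
        System.out.println(dogStatsdCounter("orders_processed", 3));
    }
}
```

An application would normally pick one format and register it once, rather than mixing both as this demonstration does.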

Consuming Application Metrics: Beyond Real-Time Streaming

The setup we just discussed is applicable when you need metrics as a real-time stream. Once you’ve got application metrics flowing into the Loggregator, all you have to do is attach a custom nozzle to the Firehose.

It’s worth noting that you will not see application metrics when using the “cf logs” command. This is by design, as the command is only intended to show logs. Using the “cf tail” or “cf nozzle” commands will show metrics coming from Loggregator.

What if our requirements cannot be met by a real-time stream? We have a few options to consider.

PCF Metrics: Events, Logs, and Metrics Organized on a Timeline

PCF Metrics is a service tile that can also be installed via Ops Manager. It connects to the Loggregator Firehose and renders metrics for a rolling 14-day window. These include container metrics (CPU, disk, memory), application events (start, stop, update, crash, etc.), and application logs. The data can be visualized through a web-based user interface. The module includes a trace explorer that can show the execution flow of individual requests if the Spring application is properly configured. There is also a monitoring component that can generate alerts when user-defined thresholds on platform or custom metrics are exceeded.

PCF Metrics shows events, metrics, and logs in a useful timeline view.

Since usage of PCF Metrics is included in a PCF subscription, it’s a logical first step to try it. It is worth noting that PCF Metrics is not a true dimensional metric store and may not cover all use cases, especially when more advanced Micrometer metrics are involved. It’s thus important to thoroughly evaluate whether it can meet your specific requirements for application metrics visualization and alerting.

Log Cache: An In-Memory Cache of Telemetry Data

PCF Log Cache is a component that maintains an in-memory cache of logs and metrics coming from the Loggregator. The stored data can be accessed via JSON-based REST API or Prometheus Query Language (PromQL) compatible endpoints. A CLI plugin is also available to allow easy access to the cached data.

Log Cache is, as its name implies, only a cache. Depending on how it is configured, it will store around 15 minutes’ worth of data at any given time. So metrics will need to be periodically pulled into another datastore if retention beyond the default caching period is desired. For example, one might create an adapter that dynamically converts Log Cache’s JSON output to Prometheus exposition format to be scraped by an external Prometheus server.
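A periodic pull of this kind could start from a request like the one sketched below, which only builds an HTTP request against a PromQL-compatible endpoint and does not send it. Several details are assumptions made for illustration: the base URL, the /api/v1/query path, the source_id label used to select an application, the metric name, and the bearer token (which would have to be obtained separately). Check the Log Cache documentation for the values that apply to your installation.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that prepares a PromQL instant query for Log Cache.
class LogCacheQuerySketch {

    // Assumption: the PromQL-compatible endpoint is served at /api/v1/query
    // and takes the expression in a "query" parameter.
    static URI instantQueryUri(String baseUrl, String promql) {
        String encoded = URLEncoder.encode(promql, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/api/v1/query?query=" + encoded);
    }

    // Builds (but does not send) an authenticated GET request.
    static HttpRequest instantQuery(String baseUrl, String promql, String authorization) {
        return HttpRequest.newBuilder(instantQueryUri(baseUrl, promql))
                .header("Authorization", authorization)
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical metric name and application GUID.
        String promql = "my_app_metric_count_total{source_id=\"hypothetical-app-guid\"}";
        HttpRequest request = instantQuery(
                "https://log-cache.example.com", promql, "bearer example-token");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Sending the request with java.net.http.HttpClient on a schedule, and writing the results into a longer-lived store, would be the remaining work for such an adapter.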

Syslog Drains

The CF Drain CLI creates syslog drains and binds them to applications. By default, this will automatically send any logs generated by the application to a specified syslog collector URL.

A schematic of how metrics flow to a syslog collector via a syslog drain.

By following the instructions here, the syslog drain can be configured to send application metrics to the syslog collector.

Syslog Nozzle

The OSS firehose-to-syslog project maintains a nozzle that attaches to the Firehose and forwards all events to a syslog server. The nozzle is a separate application process that must run somewhere. Fortunately it can be deployed as a standard application on PCF if desired.

A schematic of how metrics flow to a syslog collector via a nozzle.

Use These Observability Tools, and Others, As You Grow

Keeping your applications stable and scalable is a core promise of Pivotal Cloud Foundry. You can only realize these benefits with a powerful set of application observability capabilities. If you’re just getting started on your cloud native journey, many of the components discussed here will be invaluable for you as you learn and grow.

Over time, your journey to become great at software will take many twists and turns. Once you reach a certain maturity, a full-featured APM solution will become essential to your success. (In fact, these tools hook into several of the components in this post.)

In the meantime, you can bootstrap application observability using the tools and workflows we’ve discussed here.