By Bahubali Shetti, Director of Public Cloud Solutions for VMware Cloud Services at VMware

Observability is one aspect of managing Kubernetes clusters. It involves gathering multiple data sets from the cluster and analyzing them to resolve issues. Observability covers three main data sets:

  • Metrics – Cluster metrics collected through cAdvisor, metrics server, and/or Prometheus, along with application data.
  • Logs – Whether they are cluster logs or application logs such as Syslog, these data sets are important for analysis.
  • Tracing – Generally obtained with tools like Zipkin and Jaeger, traces provide detailed flow information about the application.

Logs are of particular interest because they are abundant, and a significant amount of information can be extracted from both cluster and application logs. One challenge in analyzing logs is aggregating them in a single location, which makes it easier to cross-reference and correlate logs from multiple nodes, pods, and containers, and even across multiple clusters.
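To make the idea of aggregation concrete, here is a minimal Python sketch that merges time-sorted log streams from several sources into one time-ordered stream. The record layout, the `cluster/node/pod` source labels, and the sample log lines are all assumptions made up for illustration; a real collector handles far more formats and edge cases.

```python
import heapq
from datetime import datetime
from typing import Iterable, Iterator, List, NamedTuple


class LogRecord(NamedTuple):
    timestamp: datetime
    source: str   # hypothetical "cluster/node/pod" label
    message: str


def parse(source: str, lines: Iterable[str]) -> List[LogRecord]:
    """Parse lines of the assumed form '<ISO-8601 timestamp> <message>'."""
    records = []
    for line in lines:
        stamp, message = line.split(" ", 1)
        records.append(LogRecord(datetime.fromisoformat(stamp), source, message))
    return records


def aggregate(*streams: Iterable[LogRecord]) -> Iterator[LogRecord]:
    """Merge streams that are each already sorted by time into one ordered stream."""
    return heapq.merge(*streams, key=lambda record: record.timestamp)


# Illustrative sample data from two pods in two different clusters.
pod_a = parse("cluster1/node1/pod-a", [
    "2019-03-01T10:00:00 request received",
    "2019-03-01T10:00:05 request failed",
])
pod_b = parse("cluster2/node3/pod-b", [
    "2019-03-01T10:00:02 db connection lost",
])

merged = list(aggregate(pod_a, pod_b))
for record in merged:
    print(record.timestamp.isoformat(), record.source, record.message)
```

With the records interleaved by time, the database error in one pod lines up between the request and its failure in the other, which is the kind of correlation a single aggregation point makes possible.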

There are two main solutions:

  • Custom built – A popular solution is to use a single Prometheus aggregation instance that gathers data from all the individual Prometheus cluster instances.
  • Commercial SaaS service – Splunk, Logz.io, AWS Elasticsearch, etc.


Custom-built solutions are a great option because of the control they provide, but in most cases it is easier to use a SaaS-based solution.

I explore how AWS Elasticsearch can be used as a SaaS-based log aggregation solution using two different yet similar data collectors, Fluentd and Fluent Bit, each covered in its own blog.

In each of these two blogs I describe how to configure AWS Elasticsearch, and how to configure Fluentd and Fluent Bit on VMware Cloud PKS to forward logs to it.
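As a rough illustration of what forwarding logs to Elasticsearch involves, the sketch below builds a request body in the newline-delimited JSON format accepted by the Elasticsearch `_bulk` endpoint: an action line followed by the document itself, for each record. The index name `kubernetes-logs` and the record fields are hypothetical, and this is not a reproduction of how Fluentd or Fluent Bit are implemented or configured.

```python
import json
from typing import Dict, Iterable


def build_bulk_payload(index: str, records: Iterable[Dict]) -> str:
    """Build an NDJSON body for the Elasticsearch _bulk endpoint."""
    lines = []
    for record in records:
        # Action line: index the document that follows into `index`.
        lines.append(json.dumps({"index": {"_index": index}}))
        # Source line: the log record itself.
        lines.append(json.dumps(record))
    # The bulk format requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical log records; the field names are illustrative only.
records = [
    {"@timestamp": "2019-03-01T10:00:00Z", "pod": "pod-a", "log": "request received"},
    {"@timestamp": "2019-03-01T10:00:02Z", "pod": "pod-b", "log": "db connection lost"},
]

payload = build_bulk_payload("kubernetes-logs", records)
print(payload)
```

Sending many records in one request keeps the number of round trips to the Elasticsearch endpoint low. The collector's job is then mostly to tail the logs, attach metadata, and batch them.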


Please feel free to reach out to me @Shetti