By Bahubali Shetti, Director of Public Cloud Solutions for VMware Cloud Services at VMware

Observability is one aspect of managing Kubernetes clusters. It involves collecting data from multiple sources in the cluster and analyzing that data to resolve issues. Observability covers three main data sets:

  • Metrics – Cluster metrics, collected through cAdvisor, Metrics Server, and/or Prometheus, along with application metrics.
  • Logs – Cluster logs and application logs (Syslog, for example). Both are important for analysis.
  • Tracing – Generally obtained with tools like Zipkin and Jaeger, traces provide detailed flow information about the application.

Logs are of particular interest because they are abundant, and a significant amount of information can be extracted from both cluster and application logs. One challenge in analyzing logs is aggregating them in a single location, which makes it easier to cross-reference and correlate logs from multiple nodes, pods, and containers, and even across multiple clusters.
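To make the benefit of a single location concrete, here is a minimal sketch in Python. It is not how Fluentd, Fluent Bit, or AWS Elasticsearch work internally. The `LogRecord` type, the `aggregate` and `correlate` helpers, and the sample log lines are all hypothetical and exist only for illustration. The sketch assumes each source already emits its records in time order.

```python
import heapq
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LogRecord:
    """Hypothetical log record; real collectors attach much richer metadata."""
    timestamp: datetime
    cluster: str
    pod: str
    message: str


def aggregate(*streams):
    """Merge per-source streams (each already sorted by time) into one timeline."""
    return list(heapq.merge(*streams, key=lambda record: record.timestamp))


def correlate(records, keyword):
    """Return every record, from any source, whose message contains the keyword."""
    return [record for record in records if keyword in record.message]


# Hypothetical sample data from two clusters.
cluster_a = [
    LogRecord(datetime(2019, 1, 1, 12, 0, 0), "cluster-a", "web-1", "request id=42 received"),
    LogRecord(datetime(2019, 1, 1, 12, 0, 2), "cluster-a", "web-1", "request id=42 timed out"),
]
cluster_b = [
    LogRecord(datetime(2019, 1, 1, 12, 0, 1), "cluster-b", "db-1", "request id=42 query started"),
]

timeline = aggregate(cluster_a, cluster_b)
for record in correlate(timeline, "id=42"):
    print(record.timestamp.isoformat(), record.cluster, record.pod, record.message)
```

Once the records sit in one timeline, following a single request across pods and clusters becomes a simple filter rather than a search through several separate log stores.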

There are two main solutions:

  • Custom built – A popular solution is to use a single Prometheus aggregation instance that gathers data from all the individual Prometheus cluster instances.
  • Commercial SaaS service – Splunk, Logz.io, AWS Elasticsearch, etc.

 

Custom-built solutions are a great option because of the control they provide, but in most cases a SaaS-based solution is easier to use.

I explore how AWS Elasticsearch can be used as a SaaS-based log aggregation solution in two blogs, one for each of two different yet similar data collectors: Fluentd and Fluent Bit.

In each of these two blogs, I describe how to configure AWS Elasticsearch and how to configure Fluentd or Fluent Bit on VMware Cloud PKS to forward logs to it.

 

Please feel free to reach out to me @Shetti