by Bahubali Shetti, Director of Public Cloud Solutions for VMware Cloud Services at VMware

This blog was originally posted here, on August 7, 2018.

(This is a follow-up to the blog: Monitoring VMware Kubernetes Engine and Application Metrics with Wavefront)

Kubernetes (K8S) is becoming the de facto management tool for running applications homogeneously across resources (bare metal, public cloud, or private cloud). The single most widely deployed operational component in Kubernetes is monitoring.

Almost everyone uses Prometheus in a cluster to aggregate the stats from the cluster. Grafana is then used to graph these stats.

Most articles showcase Prometheus and Grafana with a focus on cluster (node, pod, etc.) stats. Rarely do any of these discuss application-level stats.

While Prometheus has exporters, e.g. for mysql (see setup) and nginx, there are alternative mechanisms to export application stats.

In this blog, I will explore the use of Telegraf as a sidecar to extract stats from different application components such as Flask, Django, and mysql.

  • Flask – a Python-based web framework used to build websites and API servers.
  • Django – a Python-based web framework, similar to Flask, but generally used to ease the creation of complex, database-driven websites.
  • Mysql – an open-source relational database management system.

Telegraf has a wide range of plugins, more than Prometheus’ set of exporters. Telegraf can send these stats to multiple locations (e.g. Wavefront, Prometheus, etc.). In this configuration I will showcase Wavefront.

Wavefront can aggregate stats from all Kubernetes clusters, whereas Prometheus generally displays stats only for the specific cluster it's deployed in.

In subsequent blogs, I will explore the use of Telegraf with Prometheus for application stats in a specific cluster.


Application & cluster stats in Wavefront

Before walking through the detailed Telegraf setup with Wavefront, it’s useful to see the end product. Since Telegraf is collecting stats from the Flask, Django, and mysql containers and sending them to Wavefront, the following graphs show the output in Wavefront. In addition, Wavefront also shows the cluster stats (node/pod/namespace stats).

The creation of the sidecars and the configurations used are detailed in the next few sections.

Application stats:

api-server stats (flask based)

Stats detailed above are generally added by the developer in Python for specific API calls in Flask. The two stats on display are for a particular API call (get all signed up users):

  • “timer” per API call: several metrics such as Timer_stddev, Timer_mean, Timer_upper, etc. are displayed per call.
  • Total number of times this call is made in any given period

The application outputs these stats via statsd (port 8125), which is collected by a Telegraf sidecar collector in the same pod as the api-server. Again, I will detail this later in the blog.
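To make the two stats above concrete, here is a minimal sketch of how a Python handler could emit a timer and a counter in the StatsD line format over UDP. It is not the Fitcycle code: the metric names, the `get_all_users` function, and the assumption that the collector listens on UDP port 8125 on the pod's localhost (the port used in the deployment later in this post) are all illustrative.

```python
import socket
import time

# Assumption: the Telegraf sidecar shares the pod's network namespace,
# so the application can reach it on localhost.
STATSD_ADDR = ("127.0.0.1", 8125)


def timer_line(name: str, elapsed_ms: float) -> str:
    """Format a StatsD timer sample: '<name>:<milliseconds>|ms'."""
    return f"{name}:{elapsed_ms:.1f}|ms"


def counter_line(name: str, count: int = 1) -> str:
    """Format a StatsD counter increment: '<name>:<count>|c'."""
    return f"{name}:{count}|c"


def send(line: str, addr=STATSD_ADDR) -> None:
    """Fire-and-forget one StatsD line over UDP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), addr)


def get_all_users():
    """Hypothetical handler body standing in for the real API call."""
    start = time.perf_counter()
    users = ["alice", "bob"]  # placeholder for the database query
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    send(timer_line("api.get_users.timer", elapsed_ms))  # how long the call took
    send(counter_line("api.get_users.count"))            # one more call was made
    return users
```

Because UDP is connectionless, the handler does not block or fail if the collector is briefly unavailable; the sample is simply dropped.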

mysql stats

mysql stats are obtained via a pull from mysql directly. Approximately 200+ stats can be pulled.

These stats are output via Telegraf configured as a mysql collector.
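The pull model can be pictured as follows. This is only a sketch: it assumes the collector reads name/value rows such as those returned by MySQL's `SHOW GLOBAL STATUS` statement, and both the sample rows and the helper name `status_rows_to_metrics` are made up for illustration rather than taken from Telegraf's source.

```python
def status_rows_to_metrics(rows):
    """Turn (variable_name, value) rows into numeric metrics.

    Values arrive as strings; anything that does not parse as a number
    (for example 'ON', 'OFF', or an empty string) is skipped.
    """
    metrics = {}
    for name, value in rows:
        try:
            metrics[name.lower()] = float(value)
        except ValueError:
            continue  # non-numeric status variable, not a metric
    return metrics


# Illustrative rows only; the values are invented, not real output.
sample_rows = [
    ("Threads_connected", "3"),
    ("Questions", "1542"),
    ("Ssl_cipher", ""),
]

metrics = status_rows_to_metrics(sample_rows)
```

In a real deployment the rows would come from a database connection polled once per collection interval, which is why no changes to the mysql container itself are needed.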

web server stats (Django based)

django based application stats

Stats detailed above are generally added by the developer in python for specific views in Django. In this case, a forms page is being measured. The two stats on display are for a particular “view” (form page):

  • “timer” calculating the “time” it takes to insert data into a database from the form page. It includes several metrics such as Timer_stddev, Timer_mean, Timer_upper, etc.
  • Total number of times the form is filled out.

The application outputs these stats via statsd (port 8125), which is collected by a Telegraf sidecar collector in the same pod as the web-server. Again, I will detail this later in the blog.
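The Timer_mean, Timer_stddev, and Timer_upper style metrics are per-interval summaries of the individual timing samples the application sends. The sketch below shows the kind of aggregation a statsd collector performs over one collection interval. The function name `summarize_timings` is hypothetical, and the use of the population standard deviation is an assumption; the exact formula a given collector applies may differ.

```python
import statistics


def summarize_timings(samples_ms):
    """Summarize the timer samples received during one collection interval."""
    if not samples_ms:
        raise ValueError("no samples received in this interval")
    return {
        "count": len(samples_ms),                 # how many times the view ran
        "lower": min(samples_ms),                 # fastest request
        "upper": max(samples_ms),                 # slowest request
        "mean": statistics.mean(samples_ms),      # average duration
        "stddev": statistics.pstdev(samples_ms),  # spread (population form, assumed)
    }


# Three hypothetical form submissions taking 10, 20, and 30 ms.
summary = summarize_timings([10.0, 20.0, 30.0])
```

Note that the count field doubles as the "total number of times" stat when the application only sends timers, which is one reason timers are often the only instrumentation a view needs.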

Cluster Stats:

In addition to application stats, the entire set of cluster stats is also displayed. This is achieved using heapster, with output to Wavefront.

The following “cluster” stats are generally shown:

  • Namespaces level stats
  • Node level stats
  • Pod level stats
  • Pod container stats

The following set of charts shows the standard Kubernetes dashboard in Wavefront.

Cluster stats

Sample Application (called Fitcycle)

In order to walk through the configuration, it’s important to understand the application. I built an application with statsd output (stdout) for flask and Django and deployed it in kubernetes.

The sample app is called fitcycle and is located here.

You can run this in any K8S platform (GKE, EKS, etc). I specifically ran it in VMware Kubernetes Engine (VKE). Once deployed, the following services are available:

  • Main webpage and form page for fitcycle is served by a Django server (supported by web-server PODs)
  • API is served by a Flask based server (api-server PODs) – it has multiple replicas
  • mysql server is served by the mysql POD
  • nginx ingress controller – preloaded by VMware Kubernetes Engine (not shown in the diagram below). The Nginx ingress controller uses a URL-based routing rule to load balance between the api-server and web-server

fitcycle application

The application outputs the following metrics:

  • api-server (flask), and the web-server (Django) output statsd to port 8125 in each pod (internally)
  • mysql metrics can be accessed by logging in and polling the right tables.

How do we collect and expose the stats?


Creating a statsd collector using telegraf

Telegraf has a wide variety of inputs/outputs. In deploying telegraf to collect the application stats for fitcycle, I created a statsd container with the following configuration:

(I’ll write another blog about using telegraf with Prometheus)

Detailed repo for building the container is located here.

The container uses the alpine version of Telegraf but replaces the standard telegraf.conf file with the following:

telegraf.conf

As noted in bold above, two plugins are configured for Telegraf:

  • input section — for statsd
  • output section — for wavefront (this can be replaced with prometheus)

There are several ENV variables in BOLD above that are important to note:

  • $POD_NAME — used to note the name of the pod if you want to particularly distinguish the pod (I will pass this in when using the container in Kubernetes as a sidecar)
  • $NODE_HOSTNAME — used to note the node where the pod is running (I will get this via a global spec variable from kubernetes when creating the sidecar container)
  • $INTERVAL — to note the collection interval time
  • $WAVEFRONT_PROXY — this is the kubernetes service name, DNS or IP of the wavefront proxy
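To show how these four variables fit into a configuration, here is a small Python sketch that fills a telegraf.conf-style template from an environment mapping. This is not the author's telegraf.conf: the template layout (the `[agent]`, `[global_tags]`, `[[inputs.statsd]]`, and `[[outputs.wavefront]]` sections, the `node` tag name, and proxy port 2878) is an assumption based on commonly used Telegraf options, and the function `render_telegraf_conf` is hypothetical. Telegraf performs this substitution itself at startup; the sketch only previews the result.

```python
from string import Template

# Assumed layout, for illustration only; check it against the real file.
TELEGRAF_CONF_TEMPLATE = Template("""\
[agent]
  interval = "$INTERVAL"
  hostname = "$POD_NAME"

[global_tags]
  node = "$NODE_HOSTNAME"

[[inputs.statsd]]
  service_address = ":8125"

[[outputs.wavefront]]
  host = "$WAVEFRONT_PROXY"
  port = 2878
""")


def render_telegraf_conf(env):
    """Return the config text with every $VARIABLE replaced from env.

    Template.substitute raises KeyError if a variable is missing, which
    surfaces a forgotten setting before the container is deployed.
    """
    return TELEGRAF_CONF_TEMPLATE.substitute(env)


# Hypothetical values of the kind the sidecar would receive.
rendered = render_telegraf_conf({
    "POD_NAME": "api-server-0",
    "NODE_HOSTNAME": "node-1",
    "INTERVAL": "60s",
    "WAVEFRONT_PROXY": "wavefront-proxy",
})
```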

This telegraf.conf is used in the Dockerfile to create the container.

Dockerfile

Now simply run:

And save the container to your favorite repo.

My version of the Telegraf-based statsd container is available via Google registry.

Kubernetes configuration using Telegraf-statsd container

Now that the statsd collector container is built and saved, I added it to several Kubernetes deployment YAML files (for the api-server pod and the web-server pod).

I’ll walk through the api-server (Flask server) Kubernetes deployment file, showing how to configure the statsd collector as a sidecar. The Django and mysql configurations are similar, and details are found in my git repo.

Here is the deployment yaml for the api-server

Note the sections in bold. Key items to note in the configuration are:

  • Use the pre-built statsd collector container
  • The NODE_HOSTNAME variable uses a value from Kubernetes: spec.nodeName returns the name of the node the pod is deployed on
  • Collection INTERVAL is set to 60s for Wavefront
  • WAVEFRONT_PROXY is set to the service name of the Wavefront proxy running in the Kubernetes cluster. Installation Notes Here.
  • Port 8125 is enabled, which listens for the output from the api-server
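The key items above can be sketched as a container entry for the pod spec. The example builds it as a Python dictionary and prints it as JSON, which `kubectl` accepts alongside YAML. It is not the deployment file from the repo: the container name, the image reference, the `wavefront-proxy` service name, the UDP protocol, and the use of `metadata.name` for POD_NAME are assumptions; only `spec.nodeName`, the 60s interval, and port 8125 come from the text.

```python
import json


def telegraf_sidecar(image, wavefront_proxy, interval="60s"):
    """Build a sidecar container entry for a pod spec's 'containers' list."""
    return {
        "name": "telegraf-statsd",  # hypothetical container name
        "image": image,
        # statsd listener the application container writes to (UDP assumed)
        "ports": [{"name": "statsd", "containerPort": 8125, "protocol": "UDP"}],
        "env": [
            # Downward API: filled in by Kubernetes when the pod is scheduled
            {"name": "POD_NAME",
             "valueFrom": {"fieldRef": {"fieldPath": "metadata.name"}}},
            {"name": "NODE_HOSTNAME",
             "valueFrom": {"fieldRef": {"fieldPath": "spec.nodeName"}}},
            {"name": "INTERVAL", "value": interval},
            {"name": "WAVEFRONT_PROXY", "value": wavefront_proxy},
        ],
    }


# Placeholder image and service name; substitute your own.
sidecar = telegraf_sidecar("example.registry/telegraf-statsd:latest",
                           "wavefront-proxy")
print(json.dumps(sidecar, indent=2))
```

Because both containers live in the same pod, the application reaches the sidecar on localhost and no Kubernetes Service is needed for port 8125.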

In order to run:

Follow the instructions in the GitHub repo for the Django and mysql configurations.


Sample Application (Fitcycle) with Telegraf sidecars

Now that the sidecars are deployed, we also need to:

  • Deploy the Wavefront proxy (see instructions in the github repo)
  • Deploy the Wavefront heapster deployment

The application with sidecars now looks as follows:

App (Fitcycle) with Telegraf sidecars

The output in Wavefront is shown at the beginning of this blog.