By Bahubali Shetti, Director of Public Cloud Solutions for VMware Cloud Services at VMware
As noted in one of my earlier blogs, one of the key issues with managing Kubernetes is observability. Observability is the ability to gain insight into multiple data points and data sets from the Kubernetes cluster and to analyze this data to resolve issues.
To review, observability for the cluster and application covers three areas:
- Monitoring metrics: Pulling metrics from the cluster through cAdvisor, metrics server, and/or Prometheus, along with application data, which can be aggregated across clusters in Wavefront by VMware.
- Logging data: Whether it’s cluster logs or application log information like syslog, these data sets are important for analysis.
- Tracing data: Generally obtained with tools like Zipkin and Jaeger, tracing data provides detailed flow information about the application.
In this blog we will explore how to send log data from the Kubernetes cluster using a standard fluentbit daemonset to an instance of AWS Elasticsearch.
There are two possible configurations for AWS Elasticsearch:
- Public open configuration of AWS Elasticsearch
- Secured configuration of AWS Elasticsearch
The basic installation of fluentbit on Kubernetes (VKE) with the public configuration is available in my git repo.
In this blog we will discuss how to configure fluentbit with a secure version of AWS Elasticsearch.
The following is a quick overview of the main components used in this blog: Kubernetes logging, Elasticsearch, and Fluentbit.
Log output, whether it’s system level, application based, or cluster based, is pushed into the Kubernetes cluster and managed by Kubernetes.
As noted in the Kubernetes documentation:
- Application based logging: “Everything a containerized application writes to stdout and stderr is handled and redirected somewhere by a container engine. For example, the Docker container engine redirects those two streams to a logging driver, which is configured in Kubernetes to write to a file in json format.”
- System logs: “There are two types of system components: those that run in a container and those that do not run in a container. For example:
- The Kubernetes scheduler and kube-proxy run in a container.
- The kubelet and container runtime, for example Docker, do not run in containers.
On machines with systemd, the kubelet and container runtime write to journald. If systemd is not present, they write to .log files in the /var/log directory. System components inside containers always write to the /var/log directory, bypassing the default logging mechanism.”
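To make the "file in json format" from the first quote more concrete, here is a minimal Python sketch of reading one line as written by Docker's json-file logging driver, which stores each line with `log`, `stream`, and `time` keys. The sample line and the function name `parse_container_log_line` are illustrative assumptions, not something taken from the Kubernetes documentation.

```python
import json


def parse_container_log_line(line):
    """Parse one line written by Docker's json-file logging driver.

    Each line is a JSON object holding the raw message ("log"),
    the originating stream ("stream"), and a timestamp ("time").
    """
    record = json.loads(line)
    return {
        # The driver keeps the trailing newline of the original message.
        "message": record["log"].rstrip("\n"),
        "stream": record["stream"],
        "time": record["time"],
    }


# Hypothetical sample line, made up for illustration only.
sample = (
    '{"log":"GET /api/v1/users 200\\n",'
    '"stream":"stdout",'
    '"time":"2018-09-10T17:22:05.123456789Z"}'
)

parsed = parse_container_log_line(sample)
print(parsed["stream"], parsed["message"])
```

A node-level agent does essentially this for every line of every container log file on the node before forwarding the records.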
As noted in the previous section, Kubernetes logs to stdout, stderr, or the /var/log directory on the cluster. Pulling these logs out of the cluster, or aggregating the stdout logs, requires a node-level logging agent on each node (see “logging-agent-pod” in the diagram in the Kubernetes Logging section).
This is generally a dedicated deployment for logging, usually deployed as a daemonset (in order to collect from all nodes). This dedicated agent pushes the logs to a “logging backend” (output location).
Fluentbit is such a node-level logging agent and is generally deployed as a daemonset. For more information on Fluentbit, please visit www.fluentbit.io.
Fluentbit will pull log information from multiple locations on the Kubernetes cluster and push it to one of many outputs.
In this blog we explore AWS Elasticsearch as one of those outputs.
Elasticsearch is a search engine based on Lucene. It aggregates data from multiple locations, parses it, and indexes it, thus enabling the data to be searched. The input can be anything from anywhere. Log aggregation is one of the multiple use cases for Elasticsearch. There is an open source version, and a commercial one from elastic.co.
AWS provides users with the ability to stand up an Elasticsearch “cluster” on EC2. AWS helps install, manage, scale, and monitor this cluster, taking out the intricacies of operating Elasticsearch.
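As a rough illustration of how log records can be handed to Elasticsearch for indexing, the sketch below builds a request body for the Elasticsearch bulk API, which accepts newline-delimited JSON: one action line followed by one document line per record. The index name `fluentbit-demo` and the sample records are assumptions made up for this example, and this is not Fluentbit's actual implementation. Older Elasticsearch versions may also expect a document type alongside the index name.

```python
import json


def build_bulk_body(records, index_name):
    """Build a newline-delimited JSON body for the Elasticsearch bulk API.

    Every record becomes two lines: an "index" action naming the target
    index, followed by the document itself.
    """
    lines = []
    for record in records:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(record))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical records, as a logging agent might produce them.
records = [
    {"log": "GET /api/v1/users 200", "stream": "stdout"},
    {"log": "connection refused", "stream": "stderr"},
]

body = build_bulk_body(records, "fluentbit-demo")
print(body)
```

Such a body would be sent with an HTTP POST to the `_bulk` endpoint of the cluster.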
Before working through the configuration, the blog assumes the following:
- Application logs are output to stdout from the containers. A great reference is found here in the Kubernetes documentation.
- Privileged access to install fluentbit daemonsets into “kube-system” namespace.
Privileged access may require different configurations on different platforms:
- KOPs – open source Kubernetes installer and manager – if you are the one installing, you will have admin access
- GKE – turn off the standard fluentd daemonset preinstalled in the GKE cluster. Follow the instructions here.
- VKE – VMware Kubernetes Engine – ensure you are running privileged clusters
This blog will use VKE, which is a conformant Kubernetes service.
Application and Kubernetes logs in Elasticsearch
Before we dive into the configuration, it’s important to understand what the output looks like.
I have configured my standard fitcycle application (as used in other blogs) with stdout.
I’ve also configured fluentbit on VKE and added a proxy so that fluentbit can reach my ES cluster.
I have configured AWS Elasticsearch as a public deployment (vs VPC), but with Cognito configured for security.
As you can see above, AWS Elasticsearch provides me with a rich interface to review and analyze the logs for both application and system.
Configuring and deploying fluentbit for AWS Elasticsearch
The Fluentbit configuration used here comes from the standard Fluentbit GitHub repository.
I have modified it and pushed the modifications to https://github.com/bshetti/fluentbit-setup-vke
Setting up and configuring AWS Elasticsearch
The first step is properly configuring AWS Elasticsearch.
Configure AWS Elasticsearch as public access but with Cognito Authentication
This removes the need to choose a VPC for the Elasticsearch cluster. You can use the VPC configuration instead; I chose not to, for simplicity.
Configure authentication with Cognito
Once this is set up, follow the steps from AWS to set up your ES policy, IAM roles, user pools, and users.
Set up a user with a policy and obtain keys
Once Elasticsearch is set up with Cognito, your cluster is secure. In order for fluentbit to be able to access Elasticsearch, you need to create a user that has Elasticsearch access privileges and obtain the Access Key ID and Secret Access Key for that user.
The policy to assign to the user is AmazonESCognitoAccess. (This is set up by Cognito.)
Now that you have successfully set up Elasticsearch on AWS, we will deploy fluentbit with an Elasticsearch proxy.
Fluentbit does not support AWS authentication, and even with Cognito turned on, access to the Elasticsearch indices is restricted to requests that use AWS authentication (i.e. key pairs). Key pairs are not yet supported in fluentbit (at the time of writing this blog).
Therefore, we must front end fluentbit with an Elasticsearch proxy that has the AWS authentication built in.
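For context, AWS authentication here means signing each HTTP request with AWS Signature Version 4, which is the work a proxy such as aws-es-proxy does on fluentbit's behalf. The Python sketch below shows only the signing-key derivation step of that scheme, using the standard library. It is not a complete request signer, and the secret key, date, and region values are placeholders made up for this example.

```python
import hashlib
import hmac


def _hmac_sha256(key, message):
    """Return the raw HMAC-SHA256 digest of message under key."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_access_key, date_stamp, region, service="es"):
    """Derive a Signature Version 4 signing key.

    The key is a chain of HMACs over the date (YYYYMMDD), the region,
    the service name ("es" for Amazon Elasticsearch Service), and the
    fixed string "aws4_request".
    """
    k_date = _hmac_sha256(("AWS4" + secret_access_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


# Placeholder values for illustration only; never hard-code real credentials.
signing_key = derive_signing_key("EXAMPLE-SECRET-KEY", "20180910", "us-west-2")
print(len(signing_key), "byte signing key derived")
```

Because the signing key depends on the Secret Access Key, the proxy needs the credentials of the user created in the previous section.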
I’ve developed a Kubernetes deployment for the open source aws-es-proxy found at abutaha/aws-es-proxy.
My aws-es-proxy kubernetes deployment files are located at bshetti/fluentbit-setup-vke.
1. First step is to configure the Kubernetes cluster for fluentbit
Fluent Bit must be deployed as a DaemonSet so that it is available on every node of your Kubernetes cluster. To get started, run the following commands to create the namespace, service account, and role setup:
$ kubectl create namespace logging
$ kubectl create -f fluent-bit-service-account.yaml
$ kubectl create -f fluent-bit-role.yaml
$ kubectl create -f fluent-bit-role-binding.yaml
2. Next step is to modify and deploy the fluentbit configuration mapping.
Modify /output/es-proxy/fluent-bit-configmap.yaml with the following change:
tls Off <---- must be configured to Off (On is default)
Next, create the configmap (it will be deployed in the logging namespace):
$ kubectl create -f ./output/elasticsearch/fluent-bit-configmap.yaml
3. Run the es-proxy
Change the following parameters in the ./output/es-proxy/es-proxy-deployment.yaml file to your own values (from setting up the user in AWS with ES access):
- name: AWS_SECRET_ACCESS_KEY
- name: ES_ENDPOINT
$ kubectl create -f ./output/es-proxy/es-proxy-deployment.yaml
$ kubectl create -f ./output/es-proxy/es-proxy-service.yaml
The es-proxy service will now be running on port 9200 in the logging namespace.
4. Now run the fluentbit daemonset
Simply deploy the following file:
$ kubectl create -f ./output/elasticsearch/fluent-bit-ds-with-proxy.yaml
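Once the daemonset is running, one way to check that logs are arriving is to list the Elasticsearch indices through the proxy using the `_cat/indices` API. The Python sketch below assumes the proxy is reachable from where the script runs, for example at `http://localhost:9200` through a port-forward. That address and the function names are assumptions for illustration, not part of the deployment files.

```python
import urllib.request


def cat_indices_url(base_url):
    """Build the URL of the Elasticsearch _cat/indices API (verbose form)."""
    return base_url.rstrip("/") + "/_cat/indices?v"


def fetch_indices(base_url, timeout=5):
    """Return the plain-text index listing served through the proxy.

    The proxy signs the request, so no AWS credentials are needed here.
    """
    with urllib.request.urlopen(cat_indices_url(base_url), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Assumed address of the proxy; adjust it to your own setup.
PROXY_URL = "http://localhost:9200"
print(cat_indices_url(PROXY_URL))
# To query a live proxy, call: print(fetch_indices(PROXY_URL))
```

If fluentbit is shipping logs, the listing should include an index created by the fluentbit output configuration.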