Editor’s note: On February 26th, 2019, VMware renamed VMware PKS to VMware Enterprise PKS. To learn more about the change, read here.
The Docker logging driver offers only a limited set of capabilities, capturing whatever containers write to STDOUT and STDERR. Kubernetes leaves it to users to manage the transport of logs off the cluster. On public clouds, this typically means using hosted logging systems like Google Stackdriver or Amazon CloudWatch. However, operators who want to adopt a multi-cloud strategy and standardize this transport are forced to develop their own logging solution.
Logging is one of the three pillars of observability (metrics and traces being the other two), and it is essential to many operational functions related to security, auditing, and monitoring. This post describes how VMware PKS uses a set of custom resources to provide a unified logging experience and how you can use those resources to monitor Kubernetes events.
The Cloud Native Computing Foundation (CNCF) has adopted Fluentd and Fluent Bit as incubating projects to help address this common problem. To further simplify and enhance the experience for operators, VMware PKS standardizes on the TCP-based syslog transport described in RFC 5424, packaging Fluent Bit with a syslog output plugin along with controllers for interacting with Fluent Bit and the Kubernetes API.
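For context, a message in RFC 5424 format carries a priority, version, timestamp, hostname, app name, and message body. The line below is purely illustrative; the hostname, app name, and message are placeholders, and the exact fields the syslog output populates may differ:
<14>1 2019-01-25T09:14:56.000Z example-host my-app - - - Started container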
With VMware PKS, you can set a “sink” for a cluster by using either kubectl or the VMware PKS CLI. To set up a sink for a namespace, you use kubectl.
Getting to Know Sink Resources by Using kubectl
When you set up a sink for a namespace by applying the following YAML file with kubectl, logs are collected and scoped to that namespace, along with important Kubernetes API events.
apiVersion: apps.pivotal.io/v1beta1
kind: Sink
metadata:
  name: sink-name
  namespace: my-namespace
spec:
  type: syslog
  host: example.com
  port: 52063
  enable-tls: true
When the YAML file is applied using kubectl, the Kubernetes API creates a new sink resource and updates the sink and event controllers to securely send logs and events for the appropriate namespace to the specified host and port. Here’s a diagram that illustrates what happens:
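As a quick sketch of that workflow (assuming the manifest above is saved as sink.yaml; the exact resource name under kubectl get depends on how the custom resource is registered):
# Create or update the sink in the target namespace
kubectl apply -f sink.yaml
# Confirm the sink resource exists
kubectl get sinks -n my-namespace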
Using the VMware PKS CLI
We have simplified the experience of capturing logs for an entire cluster as well. You can use the VMware PKS CLI to ensure all logs and events for a cluster are captured for security or audit purposes by creating a sink:
pks create-sink my-cluster syslog-tls://example.com:52063
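The syslog-tls:// scheme here plays the same role as the enable-tls: true field in the namespace-scoped YAML shown earlier, so log and event traffic to example.com:52063 travels over TLS.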
Understanding Kubernetes API Events for Monitoring
Kubernetes API events can help you monitor cluster activity without direct knowledge of the workloads in the cluster. The following provides a rough guide to detecting and making sense of Kubernetes API events.
- ImagePullBackOff – Image pull backoffs occur when Kubernetes cannot reach a registry to retrieve a container image or the image does not exist in the registry. The registry may not be reachable on the network (e.g., Docker Hub is blocked by a firewall), the registry could be experiencing an outage, or the specified image could have been deleted or never uploaded. This issue is indicated in a sink by the string “Error: ErrImagePull.” Here’s an example:
Jan 25 10:18:58 gke-bf-test-default-pool-aa8027bc-rnf6 k8s.event/default/test-669d4d66b9-zd9h4/: Error: ErrImagePull
- CrashLoopBackOff – Crash loop backoff implies that the container is not functioning as intended. It results in the string “Back-off restarting failed container” appearing in the sink. You should review any recent logs for that workload to better understand the cause of the crash.
Jan 25 09:26:44 vm-bfdfedef-4a6a-4c36-49fc-8b290ad42623 k8s.event/monitoring/cost-analyzer-prometheus-se: Back-off restarting failed container
- ContainerCreated – The successful scheduling of a container will result in the following series of k8s.events ending with the final string “Started container.”
Jan 25 09:14:55 35.239.18.250 k8s.event/rocky-raccoon/logspewer-6b58b6689d/: Created pod: logspewer-6b58b6689d-sr96t
Jan 25 09:14:55 35.239.18.250 k8s.event/rocky-raccoon/logspewer-6b58b6689d-sr9: Successfully assigned rocky-raccoon/logspewer-6b58b6689d-sr96t to vm-efe48928-be8e-4db5-772c-426ee7aa52f2
Jan 25 09:14:55 vm-efe48928-be8e-4db5-772c-426ee7aa52f2 k8s.event/rocky-raccoon/logspewer-6b58b6689d-mkd: Killing container with id docker://logspewer:Need to kill Pod
Jan 25 09:14:56 vm-efe48928-be8e-4db5-772c-426ee7aa52f2 k8s.event/rocky-raccoon/logspewer-6b58b6689d-sr9: Container image "oratos/logspewer:v0.1" already present on machine
Jan 25 09:14:56 vm-efe48928-be8e-4db5-772c-426ee7aa52f2 k8s.event/rocky-raccoon/logspewer-6b58b6689d-sr9: Created container
Jan 25 09:14:56 vm-efe48928-be8e-4db5-772c-426ee7aa52f2 k8s.event/rocky-raccoon/logspewer-6b58b6689d-sr9: Started container
- FailedScheduling – This occurs when a pod cannot be scheduled, for example because of insufficient node resources. The event includes the string "Insufficient cpu."
Jan 25 10:51:48 gke-bf-test-default-pool-aa8027bc-rnf6 k8s.event/default/test2-5c87bf4b65-7fdtd/: 0/1 nodes are available: 1 Insufficient cpu.
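If you want to cross-check what a sink is forwarding, you can also inspect the same events directly with kubectl. A minimal sketch, using an illustrative namespace and reason:
# List recent events in the namespace, oldest first
kubectl get events -n my-namespace --sort-by=.lastTimestamp
# Narrow the list to a specific reason, such as pods stuck in a restart backoff
kubectl get events -n my-namespace --field-selector reason=BackOff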
Getting Started
With VMware PKS version 1.3, all of this is available today. Read the documentation to see all the commands and arguments. All that’s needed is your own syslog server, or you can try VMware Log Intelligence with a 30-day free trial.