By now you’ve most certainly heard about the recently disclosed runC container vulnerability (CVE-2019-5736), which affected any container platform that leverages runC, including Kubernetes. If you’ve read the CVE documentation or the many blog posts about it, you might still be confused about exactly how the vulnerability works and which configurations do or don’t protect you. Let’s walk through it with some examples to better understand the flaw and the different approaches to limiting exposure, both to this vulnerability and to others like it.

The TL;DR on the CVE

As I mentioned, there have been a good number of blog posts on this CVE, including a great one by our own Dan Baskette, so I’ll just summarize it quickly: there was a flaw in the runC binary (which starts and also attaches to your containers) that allowed a container to modify or replace the binary itself with malicious code of its choosing. Since runC lives on the host, this malicious code “escapes” the container and has free rein on the host. Yikes. The good news is that a specific condition (besides a vulnerable version of runC) must exist for the exploit to work—the use of a privileged container. In this context, we mean any container where uid 0 inside the container is mapped to the host’s uid 0.

Are you down with Uids?

If you’re unfamiliar with the nitty-gritty of Linux-land, every user on a Linux operating system is assigned a User Identifier (uid), with the first (and most powerful) one, root, assigned uid 0. Security best practices advise using root as little as possible and creating other users (and corresponding uids) with only the access they need.
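If you want to see this for yourself, a couple of quick commands illustrate it (the `alice` account here is hypothetical, standing in for any non-root user):

```bash
# root is always uid 0 on a Linux system.
id -u root    # prints: 0

# Regular users get non-zero uids (exact values vary by distro/config).
id -u alice   # prints something like: 1000
```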

Seems pretty straightforward, right? Well, it gets more complicated once runC containers (Docker, Kubernetes, etc.) come into the mix. By default, the Docker/runC daemons that run the containers run as root on the host, and the containers themselves run as root as well. Why isn’t this a huge problem? Through a number of security features—AppArmor/SELinux profiles, seccomp filters, dropped Linux capabilities, and namespace isolation—a container process running as root is…well, contained… (mostly). That said, the combination of an exploit like this one and insufficient security configuration can allow an attacker to gain full control of a host.
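As an aside, one straightforward way to avoid the privileged-container condition entirely is to not run your containers as root in the first place. Here’s a minimal sketch of a Kubernetes pod spec that enforces this; the pod name, image, and uid are illustrative, not part of the demo:

```bash
# Sketch: force the container to run as a non-root uid, so uid 0 in the
# container never maps to uid 0 on the host. All names/values are examples.
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: nonroot-pod
spec:
  securityContext:
    runAsUser: 1000      # any non-zero uid
    runAsNonRoot: true   # refuse to start the pod if it would run as root
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
EOF
```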

A Demo is Worth a Thousand Words

Let’s see what this looks like in the real world. We’re going to start a pod that references a malicious container, and use that container to access the underlying host remotely.

Here’s our pod definition:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: runc-vuln-pod
  labels:
    app: runc-vuln
spec:
  containers:
  - name: runc-vuln-container
    image: vmtyler/runc_vuln
```

All this does is fire up a single-container pod that waits for runC to attach to it (in this example, via exec) and then replaces the runC binary with a shell script that creates a reverse shell. Then we just need another machine running netcat to catch the reverse shell, and we’ll have full root access to the host.
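On the attacking side, catching the reverse shell takes nothing more than a netcat listener. A minimal sketch, assuming the payload is configured to call back on port 4444 (the port is an example—it just has to match whatever the payload dials):

```bash
# Listen for the reverse shell on the attacker's machine.
# -l: listen, -v: verbose, -p: local port (traditional netcat syntax).
nc -lvp 4444
```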

Using an environment running PKS 1.3.1 (Kubernetes 1.12.4) that is vulnerable to this exploit, you can see how it works.
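If you want to confirm what your own environment is running first, something like the following will tell you (the output will obviously vary):

```bash
# Kubernetes client/server versions for the current cluster context.
kubectl version --short

# Clusters managed by this PKS installation, with their versions.
pks clusters
```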

When the container starts, the exploit code is able to modify the runC binary, but it still needs runC to be invoked again to activate the new code. By using `kubectl exec`, the attacker does exactly that. Once that happens, the malicious process connects back to the attacker's machine almost instantaneously, giving full control.
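The exec itself is nothing exotic; any command that makes runC attach to the running container will trigger it. For example (the pod name matches the spec above; the shell path assumes the image ships `/bin/sh`):

```bash
# Any exec causes runC to attach to the container -- exactly the
# trigger the planted exploit code is waiting for.
kubectl exec -it runc-vuln-pod -- /bin/sh
```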

Fast Fixing

While we hope to avoid them as much as possible, exploits like this are simply part of software projects, both open and closed source. What matters is being able to patch your environment as quickly as possible to protect yourself from them.

One of the key advantages of Pivotal Container Service (PKS) is the Operations Manager and BOSH tooling, which can automate the entire process no matter the size of the environment. In this case, I deleted the exploited cluster in PKS and deployed a new one on 1.3.1. After that, I upgraded my environment to 1.3.3, which includes the patched version of runC. One way to upgrade is to leverage a Concourse pipeline that automates upgrades; for this demo, I’m going to kick off the upgrade manually.
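For reference, the delete-and-recreate step from the PKS CLI looks roughly like this (the cluster name, hostname, and plan are placeholders for whatever your environment uses):

```bash
# Tear down the compromised cluster...
pks delete-cluster exploited-cluster

# ...then stand up a fresh one against the current tile version.
pks create-cluster demo-cluster \
  --external-hostname demo-cluster.example.com \
  --plan small
```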

Upgrading with PKS is as simple as importing the new version and selecting ‘apply changes’. As you see in this screenshot, I have the ‘upgrade all clusters’ errand checked, which will, as it says, upgrade all of my clusters once the PKS upgrade finishes.

One click on ‘apply changes’ and a coffee run later, my cluster is upgraded. This is the power of the PKS automated platform operations architecture. Not only does it self-heal failed nodes, it also makes upgrades an easy push-button operation.

Trust but Verify

So now we have a cluster running the latest version, including the runC patch, but let’s try the same exploit again to make sure it’s been fixed. I’m going to use the exact same image, podspec, and exec command as before.

As you can see, every time we run the exec command, the runC binary executes, but the container hasn’t been able to replace it with something malicious. That means there’s never a connection back to our netcat session. We’re safe again, at least against this particular exploit.
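If a silent netcat session isn’t enough assurance for you, you can also check the runC binary on a worker node directly. A sketch, assuming you have BOSH access to the PKS-managed deployment (the deployment and instance names will differ in your environment):

```bash
# SSH to a worker node via BOSH (substitute your deployment's name).
bosh -d service-instance_UUID ssh worker/0

# On the node, confirm the runC version includes the CVE-2019-5736 fix.
# Note: the binary may not be on PATH; on BOSH-managed nodes it typically
# lives under /var/vcap/packages.
runc --version
```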

There’s always going to be some new exploit or flaw uncovered, so keeping up with patching is critical to staying secure. The main reasons software doesn’t get updated are that it’s a lot of work and/or requires significant downtime. PKS makes upgrades both easy and automated, including gracefully draining nodes so that properly configured applications experience zero downtime during them. So upgrade early, and upgrade often.