Enhancing Kubernetes Security with OPA

The security ecosystem for Kubernetes can be confusing. A Sysdig article from July 2019 outlined 33 security tools for Kubernetes. That number has only grown. The tools that help secure your Kubernetes cluster today can be sorted into three broad categories.

Tools like Clair and SonarQube scan code inside your container image for vulnerabilities. They report their findings to help make your code more secure. Platforms like StackRox, Dynatrace, and Sysdig focus on securing your pipelines to ensure that the code you verified is what gets deployed to your environment. Finally, low-level mechanisms like SELinux, AppArmor, and POSIX capabilities are leveraged by Kubernetes and your container runtime to prevent bad actors from getting a foothold inside your cluster.

Even with all of these tools working together, there’s a hole in this security model that we’ll discuss in this blog post: security issues still arise if the wrong sort of workload is deployed into a sensitive area. For example, you wouldn’t want to allow additional load balancers to be deployed in your Kubernetes cluster, because they could route traffic in unintended or unsafe ways. You also wouldn’t want unverified development code deployed into your production environment. To prevent these deployment-related security issues, you can create policies using a Kubernetes component called an Admission Controller.

Kubernetes Admission Controllers intercept API requests that create or change objects in a cluster and analyze them before the objects are actually persisted. There are two kinds of Admission Controllers.

  • Mutating Admission Controllers take an incoming API request and make a prescribed change to it before deploying it in your Kubernetes cluster. These can be useful if you want to make universal changes to parts of a request. Common actions like setting default values for quotas, default Storage Classes, or even setting a Pod to always pull a new copy of the image are handled by Mutating Admission Controllers. A list of Mutating Admission Controllers that are enabled by default can be found in the Kubernetes documentation.
  • Validating Admission Controllers don’t make changes to the API request, but they can reject a request if it violates a policy enforced by the Admission Controller.

If either type of Admission Controller rejects an API request, the objects are never actually deployed, and the request reports back that it failed at that point in the workflow. Today we’re investigating a Validating Admission Controller that’s quickly gaining popularity in the Kubernetes community, based on Open Policy Agent (OPA).
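
To make the rejection path concrete, here is a minimal Python sketch of the kind of AdmissionReview response a validating webhook hands back to the API server. The function name build_admission_response and the way multiple messages are joined are assumptions made for illustration. Only the apiVersion, kind, uid, allowed, and status.message fields come from the Kubernetes admission API.

```python
import json


def build_admission_response(uid, violations):
    """Build an AdmissionReview response for a validating webhook.

    `uid` must echo the uid of the incoming request. `violations` is a
    collection of human-readable reasons (hypothetical helper input).
    """
    allowed = len(violations) == 0
    response = {"uid": uid, "allowed": allowed}
    if not allowed:
        # This message is what the user sees when the request is rejected.
        response["status"] = {"message": "; ".join(sorted(violations))}
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }


# Example: one violation produces a response with allowed set to False.
rejected = build_admission_response("example-uid", ["example violation"])
print(json.dumps(rejected, indent=2))
```

An empty list of violations yields a response with allowed set to true and no status block, which is how a validating controller signals that it has no objection.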

Using Open Policy Agent

OPA (pronounced “oh-pa”) is an incubating project of the Cloud Native Computing Foundation (CNCF). From its documentation website, OPA is

“An open source, general-purpose policy engine that unifies policy enforcement across the stack. OPA provides a high-level declarative language that lets you specify policy as code and simple APIs to offload policy decision-making from your software. You can use OPA to enforce policies in microservices, Kubernetes, CI/CD pipelines, API gateways, and more.”

OPA is deployed in Kubernetes as a Validating Admission Controller. There’s a great tutorial on the OPA website to help you get it up and running in your cluster. Although the tutorial uses Minikube as the development platform, any functional Kubernetes cluster can be used. The tutorial also uses self-signed TLS certificates for communication between OPA and Kubernetes. If you want to use your own certificates, just supply them as the files referenced instead of generating the self-signed ones using OpenSSL.

Validating Requests with Rego

When a request comes into the API server, OPA validates it against a rule set written in Rego, a declarative query language designed for structured documents such as JSON. Rego was inspired by Datalog, a query language that has been used in the InfoSec and other communities for decades. The OPA tutorial page walks you through setting up OPA and configuring it to allow specific Ingress domains for specific namespaces. The policy created in the tutorial ensures traffic bound for one domain can’t be hijacked by creating another Ingress for the same domain but pointing it to a different service. Without a tool like OPA, that could be a possible attack vector.

Policies written using Rego are how you’ll interact with OPA in your Kubernetes cluster. In the following sections, we’ll examine an OPA policy in depth. The code for this example comes from the OPA website.

Investigating the Intersection of Rego and OPA

In the tutorial, policies are stored in Kubernetes as ConfigMaps and loaded into OPA from there. Here is the complete policy:

package kubernetes.admission

import data.kubernetes.namespaces

operations = {"CREATE", "UPDATE"}

deny[msg] {
    input.request.kind.kind == "Ingress"
    operations[input.request.operation]
    host := input.request.object.spec.rules[_].host
    not fqdn_matches_any(host, valid_ingress_hosts)
    msg := sprintf("invalid ingress host %q", [host])
}

valid_ingress_hosts = {host |
    whitelist := namespaces[input.request.namespace].metadata.annotations["ingress-whitelist"]
    hosts := split(whitelist, ",")
    host := hosts[_]
}

fqdn_matches_any(str, patterns) {
    fqdn_matches(str, patterns[_])
}

fqdn_matches(str, pattern) {
    pattern_parts := split(pattern, ".")
    pattern_parts[0] == "*"
    str_parts := split(str, ".")
    n_pattern_parts := count(pattern_parts)
    n_str_parts := count(str_parts)
    suffix := trim(pattern, "*.")
    endswith(str, suffix)
}

fqdn_matches(str, pattern) {
    not contains(pattern, "*")
    str == pattern
}

  • The first line, package kubernetes.admission, defines a hierarchical name for the policies in the rest of the file. In the tutorial, OPA is configured to look for admission policies under kubernetes.admission.
  • The import statement, import data.kubernetes.namespaces, gives the policy access to the namespaces currently defined in Kubernetes. In the tutorial, this data is replicated into OPA from the Kubernetes API by the kube-mgmt sidecar, which keeps it up to date.
  • operations = {"CREATE", "UPDATE"} defines the set of operations that trigger the policy. In this case, the policy is run when an API object is created or updated.

After this, OPA policies written with Rego can become a little counterintuitive until you’re accustomed to how they function.

Dissecting an OPA Policy Written with Rego

The most common pattern in Rego is to define a set of conditions to test. If the conditions are all met, the request is denied and the reason is passed back through the Kubernetes API server to the user who requested the action. These conditions are defined in the deny rule.

deny[msg] {
    input.request.kind.kind == "Ingress"
    operations[input.request.operation]
    host := input.request.object.spec.rules[_].host
    not fqdn_matches_any(host, valid_ingress_hosts)
    msg := sprintf("invalid ingress host %q", [host])
}

Let’s look at these conditions.

  • input.request.kind.kind == "Ingress" tells OPA to act only on API requests for Ingress objects.
  • operations[input.request.operation] confirms that the request’s operation is in the operations set. It is true if the operation is either CREATE or UPDATE. Combined with the previous test, this means the policy acts only on Ingress objects when they’re created or updated.
  • host := input.request.object.spec.rules[_].host defines a variable named host that holds the host value from each entry in the API request’s .spec.rules list. The _ character is a special anonymous variable. Instead of having to explicitly name an index variable, you can use _ to iterate quickly through a list of values. Here, _ iterates through all of the rules in .spec.rules, and each host value is tested against the remaining policy conditions.
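
These first three conditions can be modelled in plain Python. This is only a sketch of the semantics, not how OPA evaluates Rego internally. The function name candidate_hosts and the trimmed-down sample request are hypothetical.

```python
# Mirrors the Rego set: operations = {"CREATE", "UPDATE"}
OPERATIONS = {"CREATE", "UPDATE"}


def candidate_hosts(admission_review):
    """Return every host the remaining deny conditions would be tested against."""
    request = admission_review["request"]

    # input.request.kind.kind == "Ingress"
    if request["kind"]["kind"] != "Ingress":
        return []

    # operations[input.request.operation]
    if request["operation"] not in OPERATIONS:
        return []

    # host := input.request.object.spec.rules[_].host
    # The anonymous variable _ behaves like a loop over every rule.
    # Rules without a host are simply skipped, as they would be in Rego.
    rules = request["object"]["spec"].get("rules", [])
    return [rule["host"] for rule in rules if "host" in rule]


# A heavily trimmed, made-up AdmissionReview request for illustration.
sample = {
    "request": {
        "kind": {"kind": "Ingress"},
        "operation": "CREATE",
        "namespace": "qa",
        "object": {
            "spec": {
                "rules": [
                    {"host": "app.qa.example.com"},
                    {"host": "signin.acmecorp.com"},
                ]
            }
        },
    }
}

for host in candidate_hosts(sample):
    print(host)
```

In Rego there is no explicit loop. The rule body is simply evaluated once for every value that _ can take, which is why a single deny rule can report several invalid hosts from one request.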

Clarifying the Final Test

The final test is where Rego can get a little confusing. The default approach, and the one used in this tutorial, is to write a policy that denies an API request. That means every condition inside the deny rule we’ve been walking through has to evaluate as true. If all of the conditions are met, the request is denied by OPA.

On its own, fqdn_matches_any is true if a domain in the API request matches the ingress-whitelist annotation for its namespace. But we want to deny requests whose domains aren’t covered by that annotation. To accomplish this, the final test uses the not operator. Even though fqdn_matches_any is true when a domain should be allowed, the not operator tells the deny rule to look for the inverse of this result.

The code not fqdn_matches_any(host, valid_ingress_hosts) calls the fqdn_matches_any function defined in the policy and passes it valid_ingress_hosts, which is also defined in the policy.

The value for valid_ingress_hosts is defined as follows:

valid_ingress_hosts = {host |
    whitelist := namespaces[input.request.namespace].metadata.annotations["ingress-whitelist"]
    hosts := split(whitelist, ",")
    host := hosts[_]
}

The curly brackets define valid_ingress_hosts as a set comprehension: a set containing every value of host for which the statements after the | character hold. The whitelist value is looked up in the annotations of the namespace targeted by the incoming API request. If the namespace has an annotation named ingress-whitelist, its value is split on commas and each resulting hostname pattern becomes a member of the set.

In the tutorial, namespaces with an ingress-whitelist annotation are created to test the policy against.

apiVersion: v1
kind: Namespace
metadata:
  annotations:
    ingress-whitelist: "*.qa.example.com,*.internal.example.com"
  name: qa

In this namespace, valid_ingress_hosts would evaluate to the following set:

{"*.qa.example.com", "*.internal.example.com"}
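
Python happens to have set comprehensions too, so the same calculation can be sketched directly. The function name valid_ingress_hosts is borrowed from the policy, but the namespaces dictionary below is a hand-written stand-in for the data OPA would hold, so treat it as an assumption.

```python
def valid_ingress_hosts(namespaces, namespace_name):
    """Model of the Rego set comprehension for one namespace."""
    annotations = (
        namespaces.get(namespace_name, {})
        .get("metadata", {})
        .get("annotations", {})
    )
    whitelist = annotations.get("ingress-whitelist")
    if whitelist is None:
        # In Rego the comprehension body is never satisfied, so the set is empty.
        return set()
    # hosts := split(whitelist, ","); host := hosts[_]
    return {host for host in whitelist.split(",")}


# Stand-in for data.kubernetes.namespaces, keyed by namespace name.
namespaces = {
    "qa": {
        "metadata": {
            "name": "qa",
            "annotations": {
                "ingress-whitelist": "*.qa.example.com,*.internal.example.com"
            },
        }
    }
}

print(sorted(valid_ingress_hosts(namespaces, "qa")))
```

Note that the split is on the comma alone, so any space placed after a comma in the annotation would become part of the pattern.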

This set is passed into fqdn_matches_any.

fqdn_matches_any(str, patterns) {
    fqdn_matches(str, patterns[_])
}

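This function checks the host against each pattern using fqdn_matches, which is defined twice. When a Rego function has more than one definition, it is true if any one of them is satisfied.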

fqdn_matches(str, pattern) {
    pattern_parts := split(pattern, ".")
    pattern_parts[0] == "*"
    str_parts := split(str, ".")
    n_pattern_parts := count(pattern_parts)
    n_str_parts := count(str_parts)
    suffix := trim(pattern, "*.")
    endswith(str, suffix)
}

fqdn_matches(str, pattern) {
    not contains(pattern, "*")
    str == pattern
}

Together, these two definitions compare the host against a pattern like *.qa.example.com:

  • The first applies when the pattern starts with a wildcard. It trims the *. from the front of the pattern and returns true if the host ends with the remaining suffix.
  • The second applies when the pattern contains no wildcard. It returns true only if the host string matches the pattern exactly.
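
The two definitions can be modelled as a single Python function that ORs them together. This is a sketch of the behaviour described above rather than OPA’s implementation. It assumes that Python’s str.strip with a character set behaves like Rego’s trim, which removes any leading or trailing characters found in the cutset.

```python
def fqdn_matches(host, pattern):
    """Model of the two Rego definitions of fqdn_matches."""
    # First definition: wildcard pattern such as "*.qa.example.com".
    # trim(pattern, "*.") removes leading/trailing "*" and "." characters.
    wildcard_match = (
        pattern.split(".")[0] == "*"
        and host.endswith(pattern.strip("*."))
    )
    # Second definition: no wildcard, so the host must match exactly.
    exact_match = "*" not in pattern and host == pattern
    # Multiple definitions of one Rego function act as a logical OR.
    return wildcard_match or exact_match


def fqdn_matches_any(host, patterns):
    """True if the host matches at least one pattern in the set."""
    return any(fqdn_matches(host, pattern) for pattern in patterns)


patterns = {"*.qa.example.com", "*.internal.example.com"}
print(fqdn_matches_any("app.qa.example.com", patterns))
print(fqdn_matches_any("signin.acmecorp.com", patterns))
```

One consequence of the suffix test is worth knowing: because the leading dot is trimmed along with the asterisk, a host such as badqa.example.com also ends with qa.example.com and would match *.qa.example.com in this model. If that matters in your environment, a stricter comparison may be a better fit.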

Wrapping Up

Let’s summarize this OPA deny policy in plain language:

  • The policy is run when an incoming API request into Kubernetes creates or updates an Ingress object.
  • The host value for the incoming Ingress object is compared against valid ingress hosts maintained as an annotation on the namespace being acted on.
  • If the host value for the incoming request does not match the valid ingress domains for its namespace, the request is denied and a message of the form “invalid ingress host” followed by the offending host name is passed back to the user attempting to create the Ingress object.

This programmatic logic can quickly test any Kubernetes API request, evaluating it against any data available to the OPA pod through the Kubernetes API or even external data sources. In addition to the functions used above, Rego has built-in functions, such as http.send, that send HTTP or HTTPS requests so that a policy can evaluate the response data.
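
For readers who think more easily in an imperative language, the plain-language summary of the deny policy can be put together as one short Python model. It is a sketch under assumptions: the function name deny, the sample request, and the namespaces dictionary are illustrative, and the message formatting only approximates what sprintf with %q produces for simple host names.

```python
OPERATIONS = {"CREATE", "UPDATE"}


def fqdn_matches(host, pattern):
    """Wildcard suffix match or exact match, as in the Rego policy."""
    wildcard = pattern.split(".")[0] == "*" and host.endswith(pattern.strip("*."))
    exact = "*" not in pattern and host == pattern
    return wildcard or exact


def deny(review, namespaces):
    """Return the set of denial messages for one admission request."""
    request = review["request"]
    if request["kind"]["kind"] != "Ingress":
        return set()
    if request["operation"] not in OPERATIONS:
        return set()

    # Patterns come from the ingress-whitelist annotation on the namespace.
    annotations = (
        namespaces.get(request["namespace"], {})
        .get("metadata", {})
        .get("annotations", {})
    )
    whitelist = annotations.get("ingress-whitelist")
    patterns = set(whitelist.split(",")) if whitelist is not None else set()

    messages = set()
    for rule in request["object"]["spec"].get("rules", []):
        host = rule.get("host")
        # "not fqdn_matches_any(...)": deny when no pattern matches the host.
        if host is not None and not any(fqdn_matches(host, p) for p in patterns):
            messages.add(f'invalid ingress host "{host}"')
    return messages


namespaces = {
    "qa": {"metadata": {"annotations": {
        "ingress-whitelist": "*.qa.example.com,*.internal.example.com"}}}
}
review = {"request": {
    "kind": {"kind": "Ingress"},
    "operation": "CREATE",
    "namespace": "qa",
    "object": {"spec": {"rules": [{"host": "signin.acmecorp.com"}]}},
}}

print(deny(review, namespaces))
```

An empty result means the policy has no objection. Any message in the set is a reason to reject the request, which matches the way the deny rule in the policy collects its msg values.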

OPA policies and Rego are quickly gaining traction in the Kubernetes community because of this robust functionality and the ability to programmatically define policies governing the creation of any API object.