A perennial problem facing web application developers and operators is to ensure trust and security throughout the application stack. This security should extend all the way from the application end user making a request via their browser, through to the application's communication with its microservice peers, and even to its communication with other services such as its database, message queue, or email service.
One of the most common ways to ensure both confidentiality and integrity between clients and servers is mutually authenticated TLS (mTLS). With this method, both the client and the server can authenticate and authorize each other via the identification details they exchange in the TLS handshake. Once established, they have an efficiently encrypted channel over which to exchange information.
Getting those TLS certificates can be a pain, though. Even if they are issued from an internal certificate authority (CA), instead of a global one, the process often takes weeks. You have to request the certificate for your application. IT has to approve the request and then issue the certificate. Then, IT installs the certificate in your application. While that process might have been acceptable for manual deployments of your application to VMs, it's too slow to keep up with modern, cloud-native applications. After all, Pivotal Cloud Foundry customers are deploying every day! Clearly, a faster way was needed.
Starting in PCF 1.12, the Pivotal Application Service tile issues a unique certificate for each running app instance. This certificate encodes the identity of the application instance on the platform in several different ways. Further, the certificate is valid for only 24 hours. Shortly before it expires, the platform automatically regenerates it and replaces it in the app instance filesystem. So any other service that trusts PCF’s certificate authority is set up to authenticate the application instances running on the platform, and then to authorize them based on the application metadata. The pervasive availability of this strong security fundamental both allows the platform to become more secure by default and makes it easy for your applications to do the same.
What You Get Out of the Box
We'll now take a look at some of the data encoded in one of these application identity certificates. PAS provides the path to the certificate file in the app instance's CF_INSTANCE_CERT environment variable. Processing that file with OpenSSL shows the certificate contents. Here are the snippets we're interested in:
$ cf ssh myapp -c 'openssl x509 -text -noout -in $CF_INSTANCE_CERT'
...
Validity
Not Before: Mar 27 04:04:29 2018 GMT
Not After : Mar 28 04:04:29 2018 GMT
Subject: OU=app:6b814521-5f08-4b1a-8c4e-fbe7c5f3a169, CN=8a886b31-ccf7-480d-54d8-cc28
...
X509v3 Subject Alternative Name:
DNS:8a886b31-ccf7-480d-54d8-cc28, IP Address:10.255.33.6
The certificate metadata contains several different identifiers for this app instance as a distinct unit of work running in PAS:
- The application ID, the unique identifier that Cloud Foundry uses for the app itself throughout successive pushes, stops, and starts, is present in an organizational unit in the Subject name. The platform uses this identifier as the primary signifier of the application's logical identity. For example, when a developer requests that Cloud Controller bind a service to that application, Cloud Controller sends this identifier to that service's broker.
- The instance ID, the unique identifier for the individual app instance itself, is the common name in the Subject, and is also present as a DNS Subject Alternative Name (SAN). Inside the app instance container, this identifier is also its hostname. Performance-monitoring agents often use this identifier when reporting information about application instances.
- The IP address for the app instance container on the container-to-container network introduced in PCF 1.12 is also present as an IP SAN. For the lifetime of that instance, that IP address serves as its identity on that container-to-container network.
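To make the three identifiers concrete, here is a minimal Python sketch that pulls them out of a decoded certificate. It assumes the certificate has already been decoded into the dictionary shape that Python's `ssl.SSLSocket.getpeercert()` returns; the function name `extract_identity` and the `sample_cert` literal are ours for illustration, with values copied from the OpenSSL output above.

```python
import ssl


def extract_identity(cert: dict) -> dict:
    """Pull the app ID, instance ID, and SANs out of a decoded certificate.

    `cert` is assumed to follow the dict layout of ssl.SSLSocket.getpeercert().
    """
    # The subject is a tuple of RDNs, each of which is a tuple of (name, value) pairs.
    pairs = [pair for rdn in cert.get("subject", ()) for pair in rdn]
    app_ids = [
        value[len("app:"):]
        for name, value in pairs
        if name == "organizationalUnitName" and value.startswith("app:")
    ]
    common_names = [value for name, value in pairs if name == "commonName"]
    if not app_ids or not common_names:
        raise ValueError("certificate does not look like an instance identity certificate")

    sans = cert.get("subjectAltName", ())
    return {
        "app_id": app_ids[0],
        "instance_id": common_names[0],
        "dns_names": [value for kind, value in sans if kind == "DNS"],
        "ip_addresses": [value for kind, value in sans if kind == "IP Address"],
    }


# Illustrative input: the fields shown in the OpenSSL output above.
sample_cert = {
    "subject": (
        (("organizationalUnitName", "app:6b814521-5f08-4b1a-8c4e-fbe7c5f3a169"),),
        (("commonName", "8a886b31-ccf7-480d-54d8-cc28"),),
    ),
    "subjectAltName": (
        ("DNS", "8a886b31-ccf7-480d-54d8-cc28"),
        ("IP Address", "10.255.33.6"),
    ),
    "notBefore": "Mar 27 04:04:29 2018 GMT",
    "notAfter": "Mar 28 04:04:29 2018 GMT",
}

identity = extract_identity(sample_cert)

# The validity window, in seconds, computed from the two timestamps.
lifetime_seconds = ssl.cert_time_to_seconds(sample_cert["notAfter"]) - ssl.cert_time_to_seconds(
    sample_cert["notBefore"]
)
```

A service that trusts the platform CA could run a check like this on the peer certificate after the TLS handshake succeeds, and then base its authorization decision on `identity["app_id"]`.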
As determined by its Validity fields, the certificate is also valid only for the first 24 hours after the app instance starts. About an hour before this certificate expires, the Diego cell issues a new certificate and private key and provides them to the app instance container in place, enabling the app to detect changes either by watching the filesystem locations or by polling the locations infrequently.
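If your app holds the credentials in memory, it needs some way to notice the replacement. The following is a minimal sketch of the polling approach in Python, not a prescribed implementation: the `RotationWatcher` class is a hypothetical helper of ours, and the demo uses a temporary file in place of the real certificate path.

```python
import os
import tempfile


class RotationWatcher:
    """Detects that a file was replaced by comparing its metadata between polls."""

    def __init__(self, path: str):
        self.path = path
        self._last_seen = self._stamp()

    def _stamp(self):
        st = os.stat(self.path)
        # Track inode and size as well as mtime, so a replacement by rename is noticed too.
        return (st.st_mtime_ns, st.st_ino, st.st_size)

    def changed(self) -> bool:
        """Return True exactly once for each observed replacement."""
        current = self._stamp()
        if current != self._last_seen:
            self._last_seen = current
            return True
        return False


# Demo with a stand-in file; a real app would watch the path in CF_INSTANCE_CERT.
with tempfile.TemporaryDirectory() as tmp:
    cert_path = os.path.join(tmp, "instance.crt")
    with open(cert_path, "w") as f:
        f.write("old certificate")

    watcher = RotationWatcher(cert_path)
    before_rotation = watcher.changed()

    with open(cert_path, "w") as f:
        f.write("new certificate!")
    os.utime(cert_path, ns=(10**18, 10**18))  # force a distinct modification time

    after_rotation = watcher.changed()
    after_reload = watcher.changed()
```

In a real application you would call `changed()` on a timer (every few minutes is plenty, given the roughly one-hour overlap) and, when it returns `True`, reload the certificate and key, for example with `ssl.SSLContext.load_cert_chain`. We assume here that the private key path is available alongside the certificate path, in a `CF_INSTANCE_KEY` environment variable.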
Let’s examine how this feature interacts with other familiar PCF workflows.
Use Case: Secure Service-Instance Credential Delivery
Starting in PAS 2.0, service brokers can deliver service-instance credentials to applications through the CredHub component, instead of passing them back to Cloud Controller in the service-binding response. This is an advantage, as it helps your applications comply with regulations or internal audits.
When the service broker stores the binding credentials in CredHub, the broker knows the application ID of the binding application. As a result, it authorizes that application alone to read those credentials later. The CredHub in PAS also trusts the CA that issues the instance certificates, so it knows how to authenticate and authorize apps based on their credentials. When the instance starts up, it requests those service binding credentials from CredHub over HTTPS, using its instance certificate and key as client credentials in the TLS handshake. CredHub then verifies that the application ID in the certificate is authorized to read those service credentials. If this is the case, CredHub proceeds to return the service binding payload so the application instance can then communicate with the sensitive or controlled service.
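The authorization rule at the heart of this exchange is small enough to sketch. The Python below is a toy in-memory model under our own assumptions, not CredHub's implementation or API: the `CredentialStore` class, the credential name, and the second app ID are all hypothetical, and the `client_app_id` argument stands for the application ID taken from an already verified client certificate.

```python
class CredentialStore:
    """Toy model: each stored binding is readable only by the app ID it was stored for."""

    def __init__(self):
        self._credentials = {}
        self._readers = {}

    def store_binding(self, name: str, credentials: dict, app_id: str) -> None:
        # The broker knows the binding app's ID, so it grants read access to that app alone.
        self._credentials[name] = credentials
        self._readers[name] = {app_id}

    def read(self, name: str, client_app_id: str) -> dict:
        # client_app_id must come from a client certificate that passed TLS verification.
        if client_app_id not in self._readers.get(name, set()):
            raise PermissionError(f"app {client_app_id} may not read {name}")
        return self._credentials[name]


store = CredentialStore()
bound_app = "6b814521-5f08-4b1a-8c4e-fbe7c5f3a169"  # app ID from the certificate shown earlier
other_app = "00000000-0000-0000-0000-000000000000"  # hypothetical unrelated app

store.store_binding("/example/binding", {"username": "u", "password": "p"}, bound_app)

granted = store.read("/example/binding", bound_app)

try:
    store.read("/example/binding", other_app)
    denied = False
except PermissionError:
    denied = True
```

The point of the sketch is the ordering: authentication happens in the TLS handshake, and only then does the store compare the certificate's application ID against the readers recorded when the binding was created.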
This communication pattern drastically reduces the extent to which CF system components handle these service credentials. What’s more, it ensures that credentials are always encrypted in transit and at rest. In fact, this pattern even keeps them out of the environment variables of the application process.
Use Case: Improved Routing Security and Resilience with Envoy
Starting in PAS 2.1, operators can also choose to deploy with greater security and resilience in the routing tier. (This benefit comes at the expense of some additional memory allocation and CPU usage in app containers and the HTTP routers in PAS. But you may still find this trade-off attractive.) This new option ensures both that the routers always connect to the app instance they intend to, and that they encrypt the traffic with TLS all the way to the app container itself.
Pivotal Cloud Foundry 2.1 is now GA! Highlights include support for Windows Server 2016 containers, TLS to the container using @EnvoyProxy & native service discovery. Full rundown: https://t.co/sAo9P8XLmF
— Pivotal (@pivotal) March 28, 2018
With this improved consistency between the routers and the app instances, the routers can also retain potential app instances as target hosts longer. This leads to increased resilience in the routing control plane when network partitions or other faults prevent the regular transmission of route registrations.
Let’s compare how routing worked before and after the introduction of the instance credentials.
Before Instance Credentials
Consider how PAS routes a request for the example.com domain to an app instance backing it:
- The app process listens on port 8080 inside its container.
- The Diego cell forwards traffic from port 61080 on the host to container port 8080.
- The Diego cell registers its 10.0.0.5 IP and the 61080 host port with the router as a backend for the example.com domain.
- The router receives an HTTP request for example.com.
- The router connects to the 10.0.0.5:61080 address and forwards the request.
- The Diego cell forwards the request packets to the app in its container, which handles the request.
This routing process requires the router's registrations to be up to date, though. If the system fails to update them, the router can misroute a request to an app instance that no longer exists, or, even worse, to a completely different app instance. To defend against this possibility, the routers expect the cells to broadcast the route registrations for their apps frequently. The routers then intentionally discard registrations that haven't been updated in the last 120 seconds. Cloud Foundry rightly prioritizes security over availability.
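The pruning behavior described above can be sketched as a small table with a time-to-live. This Python model is ours, for illustration only: the `RouteTable` class and the second backend address are hypothetical, and timestamps are passed in explicitly so the behavior is easy to follow.

```python
STALE_AFTER_SECONDS = 120  # the pruning threshold described in the text


class RouteTable:
    """Maps a domain to its backends and forgets registrations that go stale."""

    def __init__(self, stale_after: float = STALE_AFTER_SECONDS):
        self.stale_after = stale_after
        self._routes = {}  # domain -> {backend address: time of last registration}

    def register(self, domain: str, backend: str, now: float) -> None:
        # Cells re-register frequently; each call refreshes the timestamp.
        self._routes.setdefault(domain, {})[backend] = now

    def backends(self, domain: str, now: float) -> list:
        entries = self._routes.get(domain, {})
        fresh = {b: t for b, t in entries.items() if now - t <= self.stale_after}
        if domain in self._routes:
            self._routes[domain] = fresh  # discard the stale registrations for good
        return sorted(fresh)


table = RouteTable()
table.register("example.com", "10.0.0.5:61080", now=0)
table.register("example.com", "10.0.0.6:61080", now=100)  # hypothetical second instance

at_110 = table.backends("example.com", now=110)  # both registrations are still fresh
at_130 = table.backends("example.com", now=130)  # the first one is now 130 seconds old
at_300 = table.backends("example.com", now=300)  # nothing has re-registered
```

Once registrations stop arriving, the table empties within two minutes and the domain has no backends at all, even if the app instances are healthy. That is the availability cost of pruning aggressively.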
After Instance Credentials
Let's now see how introducing the instance credentials allows us to improve the consistency and resilience of the routing tier. The key addition is that PAS now transparently inserts a sidecar proxy, Envoy, in the request pathway. That Envoy proxy uses the instance credentials to terminate TLS and then forwards the traffic to the app instance inside the container.
- The app process listens on port 8080 inside its container.
- Envoy listens on port 8443 inside the container, terminates TLS with the instance credentials that contain the instance ID a7c, and forwards that traffic to port 8080.
- The Diego cell forwards traffic from port 61443 on the host to container port 8443.
- The Diego cell registers its 10.0.0.5 IP and the 61443 host port with the router as a TLS backend for the example.com domain, along with the instance ID a7c.
- The router receives an HTTP request for example.com.
- The router connects via TLS to the 10.0.0.5:61443 address, verifies the a7c instance ID, and only then forwards the request.
- The Diego cell forwards the request payload to Envoy, which in turn forwards it to the app itself for processing.
Now if the router connects to the wrong app instance because of a route registration that is out of date, its TLS handshake fails, and it backs out and tries a different instance. As a result, the routers also no longer need to drop out-of-date TLS registrations so aggressively. The routers can maintain app availability during extended failures of the route-registration system.
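Here is a rough Python sketch of that verify-then-fail-over logic. It is a simplified model rather than the router's actual code: `route_request` and `fake_handshake` are hypothetical names, the handshake is a stand-in callable that returns the instance ID the backend presents, and every address and instance ID other than 10.0.0.5:61443 and a7c is invented for the demo.

```python
def route_request(candidates, handshake):
    """Return the first backend whose presented instance ID matches its registration.

    candidates: list of (address, expected_instance_id) pairs from the route table.
    handshake:  callable(address) -> instance ID presented by that backend;
                raises ConnectionError if the backend cannot be reached.
    """
    for address, expected_instance_id in candidates:
        try:
            presented = handshake(address)
        except ConnectionError:
            continue  # unreachable backend: try the next candidate
        if presented == expected_instance_id:
            return address
        # Identity mismatch: the registration is stale, so back out and try another.
    return None


# What is really running at each address right now (invented for the demo).
actual_instances = {
    "10.0.0.5:61443": "b9d",  # a different instance now owns this address
    "10.0.0.6:61443": "e2f",
}


def fake_handshake(address):
    if address not in actual_instances:
        raise ConnectionError(f"no listener at {address}")
    return actual_instances[address]


registrations = [
    ("10.0.0.5:61443", "a7c"),  # stale: a7c is gone
    ("10.0.0.7:61443", "c4d"),  # stale: nothing listens here any more
    ("10.0.0.6:61443", "e2f"),  # still accurate
]

chosen = route_request(registrations, fake_handshake)
only_stale = route_request(registrations[:2], fake_handshake)
```

Because a stale registration can no longer cause a misrouted request, keeping it around costs only a failed handshake and a retry.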
What about when the instance certificates expire? In that case, because the Diego cell already knows it has issued new credentials, it uses Envoy's dynamic configuration capabilities to update the credentials there as well. On subsequent connections, Envoy then uses the new set of credentials for TLS termination without skipping a beat.
Use Case: Secure, Direct Communication between Microservices
In the service-instance credential example above, the application participated in the secure interaction as a client. In the route resilience example, it participated as a server. What about the complex world of microservices? In a modern, cloud-native environment such as PCF, your microservice application instances act as clients and servers of each other all the time. Happily, you can use the same techniques to secure their communication!
To illustrate, we've built an example two-tier web service that you can try out on your own. It consists of a pair of example applications that use their instance-identity credentials to communicate securely, directly over the container-to-container network. When the front-end app handles a request, it calls out to an instance of a separate, back-end app. That back-end app authorizes only a specific list of applications to access it, though, based on the application ID present in the client certificate.
In the example workflow below, we deploy a single back-end app. Note that we have two different front-end copies (green and blue), with the back-end app configured to allow only the green app. Requests to the green front-end app succeed, while those to the blue one fail because the back-end rejects them:
If your PCF installation has enabled the DNS-based service discovery system (now available as an advanced feature in PAS 2.1), you can use this feature to help your front-end application discover the back-end application instances, eliminating the need for any public route to the back end!
If your PCF installation does not yet enable container networking at all, a different example pair of apps illustrates how to apply the same TLS-based authorization rules between apps when they communicate through the HTTP router itself.
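The allow-list rule that the back-end app applies in either setup can be sketched in a few lines of Python. This is our own illustration, not the code of the example apps: the function names and the green and blue application IDs are hypothetical, and the peer certificate is assumed to be in the dictionary shape returned by `ssl.SSLSocket.getpeercert()`.

```python
import ssl


def make_server_context() -> ssl.SSLContext:
    """Build a server-side TLS context that insists on a client certificate."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.verify_mode = ssl.CERT_REQUIRED  # reject clients without a verifiable certificate
    # A real deployment would also load the instance certificate and key with
    # context.load_cert_chain(...) and the platform CA with context.load_verify_locations(...).
    return context


def client_app_id(peer_cert: dict):
    """Return the application ID from the certificate's app:<id> organizational unit."""
    for rdn in peer_cert.get("subject", ()):
        for name, value in rdn:
            if name == "organizationalUnitName" and value.startswith("app:"):
                return value[len("app:"):]
    return None


def is_authorized(peer_cert: dict, allowed_app_ids: set) -> bool:
    app_id = client_app_id(peer_cert)
    return app_id is not None and app_id in allowed_app_ids


# Hypothetical application IDs for the two front-end copies.
GREEN_APP_ID = "11111111-1111-1111-1111-111111111111"
BLUE_APP_ID = "22222222-2222-2222-2222-222222222222"


def cert_for(app_id: str) -> dict:
    """Fabricate a decoded peer certificate for the demo."""
    return {"subject": ((("organizationalUnitName", f"app:{app_id}"),),)}


allowed = {GREEN_APP_ID}  # the back end is configured to allow only the green app
context = make_server_context()
green_ok = is_authorized(cert_for(GREEN_APP_ID), allowed)
blue_ok = is_authorized(cert_for(BLUE_APP_ID), allowed)
```

Note the division of labor: the TLS layer proves that the certificate was issued by a trusted CA, and the application-level check decides whether that particular application is welcome.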
What's Next
We're only just getting started with the possibilities that both these per-instance credentials and the inclusion of the feature-rich Envoy proxy open up. Here’s what we’re working on now:
- The CF Diego and Container Networking teams are actively exploring how best to route traffic between application instances through their Envoy sidecars transparently. This way, developers can move concerns such as TLS configuration, client-side load-balancing, and circuit breakers out of their own application logic and instead into how they configure those applications to run on PCF.
- Today, the routing improvements in PCF 2.1 apply to HTTP routing for Linux applications. We plan to bring the same improvements to the TCP routing tier, and to applications running on the just-released PAS for Windows.
- We also plan to include other platform identifiers, such as the space ID and the organization ID, in the instance certificates. These options would give developers and service operators a wider range of platform primitives on which to base their security policies.
As always, we value your feedback and your ideas for other use cases and features that will help solve your real-world security problems. Please get in touch here on this post, on our GitHub repositories, or on the Cloud Foundry Slack workspace!