Removing the Ops bottleneck
Although it was published almost five years ago, “The Developer is the New Kingmaker” is a phrase that you can’t get away from. It’s hard to argue against, as we’ve seen the rise, and in some cases the utter domination, of tools and systems that act as force multipliers, making developers more productive than ever before.
Amazon’s public cloud is the canonical example of this kingmaking. By providing (fairly) simple APIs to do incredibly powerful things, it allows a developer or system administrator (if developers are the new kingmakers, then system administrators are the new developers) to request complex infrastructure with just a few API calls and have it ready in a few minutes. We went from waiting months for IT to deploy infrastructure for us to completely self-service infrastructure in less time than it takes to squeeze a Juicero bag.
Google, Microsoft and dozens of smaller providers followed suit, building their own API-driven public clouds. Meanwhile Rackspace and NASA teamed up to found the OpenStack project, and we saw private cloud plays of varying success (mostly unsuccessful) from all of the incumbent enterprise vendors. Cloud and API-driven infrastructure became ubiquitous, maybe not in use, but certainly as an aspirational goal.
Infrastructure went from being the largest bottleneck in the delivery pipeline to being one of the quickest parts. Waiting in line, operating system customization and application deployment and configuration became the new leaders of bottlenecking.
Dev, Meet Ops
Thankfully operators were paying attention. As far back as 2008 people recognised the need to start applying agile software development practices to operations. DevOps was born. At about the same time John Allspaw and Paul Hammond casually dropped that Flickr was doing 10+ deploys per day at the Velocity conference.
Agile infrastructure needed tooling to deploy and manage both the infrastructure and the applications running on it. This need gave birth to configuration management tools such as Chef and Puppet. Joining the ranks soon after were orchestration tools such as Ansible and SaltStack, and then more recently tools like HashiCorp’s Vagrant and Terraform.
During this time the first of the “cloud native” workloads were showing up. Skipping the costs and inefficiencies of running their own infrastructure, these companies were able to focus on rapidly developing software and innovating using Agile development techniques. Companies like Etsy were born in the cloud and provided a way for the makers of Hand Crafted Scented Poop Candles to connect with people who love the form and function of poop. Other companies like Netflix were able to build greenfield services in the cloud, which gave them the freedom to innovate from shipping DVDs by mail to streaming ‘80s nostalgia and buffets of true crime snacks straight to your screen.
Savvy cloud infrastructure providers realized the power of providing developers easier ways to run their applications. These providers started building out tooling to automatically run apps on behalf of a developer without them even needing to know or care about the underlying infrastructure. Early cloud products from Microsoft and Google were application focussed but were perhaps a little too ahead of the curve. Adoption wasn’t great and both companies went on to follow Amazon’s model of an IaaS.
The Arrival of the 12 Factors
Some smaller startups really understood the modern development workflow and started to build platforms that made deploying applications as easy as pushing code to GitHub, a centralized Git service that had become the go-to place to work on and share software.
Heroku, Pivotal Cloud Foundry (PCF), dotCloud and Engine Yard quickly became the go-to places to deploy and run applications that were designed specifically for the cloud. dotCloud and Engine Yard were smaller niche players, but both Heroku and Pivotal Cloud Foundry became massive services that run hundreds of thousands of applications between them on their respective public Platform as a Service offerings.
Striving to teach the world how to write applications designed to be run on the cloud, Heroku produced a manifesto entitled “The Twelve-Factor App” which lists (surprise!) twelve factors that make an application suitable for running on their platform. This manifesto quickly became gospel to thousands of cargo culting developers. Much later the term “cloud native” would be coined to describe these fancy new applications.
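To make that a little more concrete, here is a minimal sketch of just one of the twelve factors: keeping configuration in the environment rather than in the code. The names used here (Settings, load_settings, DATABASE_URL, PORT, DEBUG) are illustrative assumptions and are not taken from the manifesto or from any particular platform.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings for a small web app."""
    database_url: str
    port: int
    debug: bool


def load_settings(env=None) -> Settings:
    """Build settings from environment variables instead of hard-coded values.

    `env` defaults to the real process environment; passing a plain dict
    makes the function easy to exercise without touching os.environ.
    """
    env = os.environ if env is None else env
    return Settings(
        # Required: fail loudly (KeyError) if the platform did not supply it.
        database_url=env["DATABASE_URL"],
        # Optional, with defaults suitable for local development.
        port=int(env.get("PORT", "8000")),
        debug=env.get("DEBUG", "false").lower() == "true",
    )
```

The same build can then run unchanged on a laptop, in staging and in production, with only the environment differing between them.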
Things had reached an equilibrium as we now had abstractions all the way down that satisfied all levels of developers and operators (or "Full Stack Developer" and "DevOps Engineer" as we’re supposed to call them now). The cloud providers continued to innovate bringing improved tooling and automation for their respective platforms to provide better networking, security and storage solutions.
By this time the CTOs and CIOs of large enterprises could no longer ignore the thousands of skunkworks projects that were showing cloud’s efficiencies and cost savings right under their noses. Enterprise IT was caught off guard and had to quickly get their enterprise architects to build “cloud strategies”. One or two of them even managed to execute their strategies!
The Revolution will be Containerized
Netflix and the other Cloud Natives were flourishing and even enterprises were starting to make progress in their adoption of cloud. Then in early 2013 Solomon Hykes of dotCloud demonstrated an internal project called Docker which he believed would be the future of dotCloud’s PaaS.
Docker was an easy-to-use wrapper around the existing Linux container tooling, which was clumsy and difficult to use. Docker solved this by providing a simple build format called a Dockerfile and a super simple user interface. Thus, we will consider this to be the dawn of the container revolution.
Within a year it was obvious that Docker was a catalyst that was going to drastically improve how applications are built, configured and deployed. A large community quickly formed around Docker and even in its infancy, people were reporting they were running Docker in production.
In 2014 Docker replaced its use of LXC with its own runtime, libcontainer, written in Go. The technology kept getting better and people started to take notice. Existing platforms (PaaS) like Cloud Foundry saw the value in the Docker runtime and added support for it, while new ones like Dokku and Deis were built from inside the Docker community.
The Docker community embraced the 12 factor manifesto and could be quite militant about who should run what inside of containers. However, this stance has softened as Docker brought in new features and tooling to handle stateful applications, and the community put together tooling to more easily run non-Cloud Native applications and databases inside containers.
The Kubernetes GIFEE
Meanwhile Google had been using Linux containers for years with its internal cluster manager “Borg” and even has its own DevOps facsimile called Site Reliability Engineering (SRE). A small team of Google engineers (Joe Beda, Brendan Burns and Craig McLuckie) started work on a Borg-like platform using Docker as the execution engine. It was announced and open sourced as “Kubernetes” in 2014 and quickly joined the Google Infrastructure For Everyone Else (GIFEE) ranks alongside tools like HBase and ZooKeeper.
Kubernetes is a distributed system designed to provide a “Google scale” platform for the automated deployment, scaling and operations of containers across a fleet of hosts. As well as providing the lifecycle management of containers, it can also interact with your infrastructure to provide networking, storage, and load balancing for your containers. In other words, it automates the provisioning and ongoing drift management of applications, removing the need for the majority of manual ops-fu.
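The “drift management” idea can be pictured as a loop that compares what you declared with what is actually running and corrects the difference. The following is a toy sketch of that idea only, not Kubernetes code: Cluster and reconcile are made-up names, and a real controller watches an API server and handles failures, retries and much more.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy stand-in for a cluster: maps an app name to its running replica count."""
    running: dict[str, int] = field(default_factory=dict)


def reconcile(desired: dict[str, int], cluster: Cluster) -> list[str]:
    """Compare desired state with actual state and correct any drift.

    Returns a log of the actions taken so the caller can see what changed.
    """
    actions = []
    for app, want in desired.items():
        have = cluster.running.get(app, 0)
        if have < want:
            actions.append(f"start {want - have} x {app}")
        elif have > want:
            actions.append(f"stop {have - want} x {app}")
        cluster.running[app] = want
    # Anything still running that is no longer declared gets removed.
    for app in list(cluster.running):
        if app not in desired:
            actions.append(f"stop {cluster.running[app]} x {app}")
            del cluster.running[app]
    return actions
```

Running reconcile repeatedly against the same desired state is harmless: once the cluster matches, it returns an empty list. That property is what lets a loop like this run continuously in place of a human operator.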
Google wasn’t the only one to build a container cluster management platform. Docker released Docker Swarm, and projects like HashiCorp’s Nomad and Apache Mesos offer a similar experience. At this stage, however, it seems like Kubernetes has the lion’s share of the market.
Since its initial release Kubernetes has built in a lot of the features you’d expect from a mature workload scheduler. It now offers high availability across its control plane, as well as ReplicaSets and StatefulSets, resources that build on top of pods to provide reliable ways to scale them and to retain their state and data across restarts. Ingress controllers provide layer 7 routing and load balancing for applications. To handle configuration and secrets (passwords, keys, etc.), Kubernetes also provides resources called ConfigMaps and Secrets.
Others from the community have provided Package Managers (Helm from Deis), Service Mesh[es] (Istio from Lyft/Google/IBM), Service Catalog (Open Service Broker API from Pivotal) and Operators (CoreOS). While these are still in their infancy, they are providing ways to (more) easily run most applications inside Kubernetes, whether they’re “cloud native” or not.
With all of these new tools and features coming in, Kubernetes is poised to become the leading platform on which applications are run. However, Kubernetes does not replace IaaS (yet), nor does it replace PaaS. Instead it may well become the default container scheduler and runtime for existing and new PaaS systems.
The Future Is Here, It’s Just Not Evenly Distributed
Within the next two years we’re going to settle on three layers of infrastructure abstraction – IaaS, Kubernetes and PaaS. Practitioners will use the ones that best fit their needs:
- IaaS will be the land of the Operator (Cluster Ops?). Cloud Providers will continue providing basic infrastructure services such as VMs and Storage and higher level services like Databases. Those that still feel the need to run their own infrastructure will use VMware or OpenStack or may run Kubernetes directly on bare metal. [Cluster] Ops will continue using their existing tools such as Terraform and Chef to run their own Private Cloud / Kubernetes combo and their existing legacy infrastructure and applications.
- Kubernetes will be the primary destination for DevOps practitioners, SREs and Advanced Developers. Tooling like StatefulSets and Operators will see us running more backing services in Kubernetes, such as databases like Cassandra, which will be part of a service catalog managed by the Open Service Broker API. People working at this layer will be working directly with Kubernetes manifests or Helm Charts. The advanced practitioners will use CI/CD practices to do their actual deployments.
- A PaaS layer will run alongside (or on top of) Kubernetes and provide a simple way for developers to run their applications. A developer may interact with the PaaS directly, but more than likely a CI tool such as Jenkins will do the actual application deployments. To some extent, the developer won’t directly interact with the PaaS, let alone the IaaS or even Kubernetes.
Despite the youthfulness of Kubernetes and similar platforms, there are plenty of companies already working at these layers. Netflix’s home grown platform Spinnaker was originally designed to deploy applications to Amazon, but also supports Kubernetes and Pivotal Cloud Foundry. Yelp also has its own platform, PaaSTA (please stop letting engineers name things!), which is built on top of Mesos.
Many more companies are somewhere along the path and already have a Platform like Pivotal Cloud Foundry that is driving efficiencies in their development practices and are looking to bring similar efficiencies to their operations practices using Kubernetes.
However, the majority of businesses are still early on in their Cloud and DevOps journeys and are probably feeling overwhelmed by the massive amount of transformation that needs to happen inside their companies, not just in technology and tooling, but organizationally and culturally as well.
The technology stack that supports a cloud native approach is finally finishing up a long journey. Where we are now is leaps and bounds from where we were in 2008. We're seeing all sorts of large organizations adopt these technologies and improve how they do software. Technology is no longer the problem. The new, fun, challenge is figuring out how to re-architect your organization and reprogram your people, as it were. But that's a story for another time.