By Rostislav M. Georgiev, open source engineer, VMware
Containers and their use in modern-day, high-performance, cloud-native applications have generated a lot of hype in the past few years. Yet they came to the scene relatively late, especially when compared to virtual machines (VMs). Their appearance in the cloud was so impactful and so natural that it is worth asking, “What took so long? Why didn’t they show up sooner?” To answer those questions, I have prepared a short history of containers and orchestration.
In the beginning, there was chroot
Modern containers can trace their lineage back to 1979, when the chroot system call and command line tool became available as part of Unix Version 7. This feature allows a process to have its root directory changed and thus narrow its view of the file system to a specific set of files and directories.
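The narrowing effect of chroot can be illustrated with a short Python sketch (os.chroot wraps the same system call; running it requires root privileges, and the jail path shown is hypothetical):

```python
import os

def enter_jail(new_root):
    """Confine the current process to new_root.

    After os.chroot, every absolute path the process opens is
    resolved relative to new_root. Requires root privileges.
    """
    os.chroot(new_root)
    os.chdir("/")  # move the working directory inside the jail;
                   # otherwise a relative-path escape remains possible

# Example (as root, with a hypothetical jail at /srv/jail):
#   enter_jail("/srv/jail")
#   open("/etc/passwd")  # now refers to /srv/jail/etc/passwd
```

Note the os.chdir("/") step: forgetting it is one of the classic misconfigurations that lets a chrooted process walk back out of its “jail.”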
Initially used to run system tests, chroot entered the BSD line of operating systems in 1982 with what was to become 4.2BSD. With the expansion of computer networking and the internet in the late 1980s and early 1990s, chroot was increasingly used to isolate network services and to “jail” potential attackers away from the rest of the system. Thus, the term “chroot jail,” or simply “jail,” was coined. However, chroot can only restrict file system access. Processes with different root directories can still communicate with one another via normal IPC or network interfaces. chroot cannot restrict processes running as the root user and is also prone to misconfiguration, which can let a chrooted process escape its “jail” with ease. This is how the term “jailbreak” emerged.
Enter FreeBSD jails and Solaris Zones
To provide OS-level virtualization, FreeBSD 4.0 introduced a vastly expanded concept of a jail. The FreeBSD jail, as it came to be known, simulates a miniature system of its own. Each jail has its own IP address, root directory and resource limits on CPU, memory, etc. The FreeBSD kernel also began distinguishing between identical UIDs and GIDs in different jails. For example, each jail has its own root user (with a UID of 0), but only the root user of the host system has unrestricted capabilities. A process running in one jail cannot send signals to a process running in another jail (even if their UIDs match). Mounting and unmounting file systems, creating device nodes and modifying the routing table are also prohibited.
FreeBSD jails came a long way from chroot, but to minimize changes to the system while providing a secure environment, some severe restrictions were introduced. For example, a jail could only have one IPv4 address and could only use UDP or TCP for communication (no raw sockets, no IPv6, etc.). While this significantly increased the security of jails, it also required additional support in some applications. In recent versions of FreeBSD, some of these restrictions were relaxed without compromising security. However, this was done in response to the emergence of less restrictive container capabilities in other operating systems.
In 2005, the release of Solaris 10 introduced support for Solaris Zones – an OS-level virtualization feature that partitions the system’s processes into isolated groups. Each zone has a name, with associated storage and network interfaces. The kernel restricts a process’s communication to processes in its own zone only. However, there is one exception to this rule: processes in the global zone have no communication restrictions (other than the standard Unix ones).
One nice feature of Solaris Zones is the ability to perform system call translation on a per-zone basis. This allows the system to emulate different Solaris versions or even operating systems. Currently, Linux and various Solaris versions can be emulated. Later, the combination of Solaris Zones and workload resource management became known as Solaris Containers, thus the term “container” entered our language.
Linux joins the fun
The Linux community was initially slow to adopt the container concept. The two main Linux kernel components used to implement containers are namespaces and control groups (“cgroups”). Initial work on Linux namespaces began in the early 2000s, when mount namespaces became available as part of Linux 2.4.19. More namespace types were added later, and progress accelerated as the demand for in-kernel container infrastructure increased. While Linux namespaces provide isolation between different process groups, resource limitation and accounting are delegated to the cgroups subsystem. The first version of cgroups was released with Linux 2.6.24. Cgroups was later rewritten, and version 2 appeared in Linux 4.5.
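On a Linux system, both building blocks are visible directly through procfs. Here is a minimal Python sketch (assuming a Linux host with /proc mounted) that lists the namespaces and cgroup membership of the current process:

```python
import os

def current_namespaces():
    """List the namespace types of the current process.

    Each entry under /proc/self/ns (e.g. mnt, pid, net, uts, ipc)
    is a symlink identifying one namespace the process belongs to.
    """
    return sorted(os.listdir("/proc/self/ns"))

def current_cgroups():
    """Return the raw cgroup membership lines for the current process."""
    with open("/proc/self/cgroup") as f:
        return f.read().splitlines()

print(current_namespaces())  # e.g. ['cgroup', 'ipc', 'mnt', 'net', 'pid', ...]
print(current_cgroups())
```

Two processes in the same container share the same namespace links; a container runtime’s job, at its core, is to create fresh entries here and attach the new process tree to them.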
With the new kernel infrastructure in place, it was time for container management user space tools to emerge. The first of these was LXC. Released in the summer of 2008, LXC is essentially a software library and some command line tools that provide a container-centric interface to the kernel’s namespace and cgroups facilities. This approach made LXC strictly a Linux-specific low-level tool. For example, LXC does not provide a solution to package and distribute containerized applications and services. It delegates this entirely to the user. This made LXC difficult to use and created a niche for high-level container management tools.
Later, functionality and constraints similar to those of LXC were added to existing projects, namely libvirt and systemd. In fact, today systemd-nspawn is the most popular backend for running containers on systemd-based Linux systems.
Released in 2013, Docker provides the ability to package applications, services and their dependencies for execution in a containerized environment. Linux containers are now easily deployable in all kinds of environments – on public and private clouds, on premises or on bare metal. Docker was designed to make use of multiple container backends; LXC, systemd-nspawn and libvirt are supported on Linux. Recent versions can run Linux containers on Mac and Windows by employing virtualization to run a Linux kernel. The Docker Registry and the public Docker Hub service also made it easy to exchange and use container images for various popular applications and services. Thus, deploying a fully functional container became as easy as running a couple of commands.
Despite its success, Docker has its shortcomings, some of them in the areas of security and reliability. For example, Docker uses a centralized daemon to monitor and control containers. Until recently, the docker command could only be invoked as root, thus requiring root access to manage containers. Docker is also microservice-oriented, which makes it difficult to use as an operating-system-level hypervisor.
In response to some of these deficiencies, other container runtimes came into existence. These include CoreOS’s rkt (short for Rocket), runC (which was spun off from Docker) and Canonical’s LXD.
rkt eliminates the daemon, features a simpler process model and runs fewer operations as root. runC is a simple command line tool for running Open Container Initiative (OCI) containers, while LXD focuses on providing an OS-level hypervisor interface.
With the emergence of Linux containers, it soon became easy and efficient to run cloud-native applications. This meant that many similar containers, usually different instances of a small set of microservices, could run together. A single cloud-native application may require large numbers of microservice instances running on many different Linux nodes. However, despite all its prowess, Docker remained simply a container runtime environment: it ran on a single node and lacked the ability to automatically scale workloads by starting more containers on other nodes.
Thus emerged the need for a container orchestrator – software that deploys, schedules and monitors containers on a massive scale in a cluster environment. Some of the most popular orchestrators are Kubernetes, Docker Swarm and Apache Mesos.
Containers have come a long way since their inception. They are the cause of a lot of excitement, and the amount of groundbreaking work that has gone into them is truly amazing. Every day, developers are finding new applications for containers. One thing is certain: containers are here to stay and will continue to be a huge source of innovation and exciting new work.
About the author:
Rostislav M. Georgiev started programming as a hobby in his early teens. Since then, he has been interested in operating system architectures, IoT, storage, cluster computing and cloud-native design. He joined the VMware Open Source Technology Center in 2017, where he works on container- and orchestration-related open source projects. Aside from IT, he also likes cats, cars, history and Bulgarian folklore dances.