Attending CloudNativeCon/KubeCon this week in Berlin (29th – 30th of March)? Please visit us at our booth #G1 and click for more details about what’s happening at the show!
IT is undergoing a huge transformation.
Organizations are moving away from static infrastructure to full automation on every aspect of IT. This major shift is not happening overnight. It is an evolutionary process, and people decide to evolve their IT at different speeds based on organizational needs.
When I decided to join the VMware Networking & Security Business Unit four years ago, the key deciding factor for me was that I felt that networking is adopting automation far too slowly. Do not get me wrong – we always automated network configurations in some form. I still remember vividly my time as a networking consultant at a major German airport. Back at the beginning of the new millennium, I used a combination of Perl, Telnet and Expect to migrate the configuration of a huge core network from a single-tenant configuration to a multi-tenant MPLS/VPN. Nevertheless, at some point, network operators stopped evolving, and even today largely, we continue to automate by manually setting up new configuration into network devices using the individual boxes CLI syntax.
Then along came VMware NSX. NSX was, and still is, exactly what my definition of a system purpose-built for network and security automation. NSX abstracts away the “to be automated” parts of the network from the static physical infrastructure, and all of this is driven by APIs. NSX lets operators automate where it is good and safe to automate – in the overlay, as well as keep static configuration with the stability sought after in the physical network.
What does this all have to do with Kubernetes?
Let us first have a quick look at Kubernetes’ mission statement:
“Kubernetes is an open-source platform for automating deployment, scaling, and operations of application containers across clusters of hosts, providing container-centric infrastructure”
Container-centric infrastructure needs a network and this network must be dynamic. We cannot keep the good ol’ model of predefining everything and having the containers only “consume” networking. The Container Network must share the life cycle of the applications deployed on Kubernetes – created dynamically, scaled on-demand when the app is scaled, and must be multi-tenant. Anything less will result in incomplete automation and limited agility.
Let me give you an example:
A business unit decides to move to the new Kubernetes deployment that was recently set up by the internal IT Cloud Infrastructure team. Unfortunately, the business unit user will need to wait for a couple of days for his environment to be usable, as the underlying network topology (VLANs, Router IP Interfaces, VRFs, Firewall Interfaces, etc.) has to be pre-configured to map to the business units newly created Kubernetes namespace.
This clearly does not work! The business unit user rightly expects a ‘public cloud experience’. After finalizing contractual details through a web-portal, the Namespace in Kubernetes should be created alongside all needed network and storage constructs. Even more extreme, the business unit should be able to order its own complete Kubernetes deployment – with all network and storage constructs – delivered to them in less than 10 minutes after pressing the “order” button.
Now you might rightly say, “Isn’t this a solved problem? Don’t we have overlay network technologies in Kubernetes that already abstract away the logical network in Kubernetes from the underlying IaaS network?” True! Those have all been invented exactly to solve the issues caused by non-programmable static infrastructure that sits underneath Kubernetes.
However, the current implementations with overlay network technologies have a number of challenges that I would like to walk you through:
- Missing fine-grained traffic control and monitoring: In Kubernetes, operators do not deploy individual containers, they deploy Pods. A Pod is a collection of containers that share the same network interface, and run on the same Kubernetes node. Every Pod has a distinct network interface that gets patched into a Linux network namespace for isolation. It is very complex to troubleshoot and secure Pod-to-Pod connectivity with the current technologies, as they do not offer a central visibility of all Pod network interfaces. Having a central management of Pod network interfaces, with the ability to read counters, do traffic monitoring and enforce spoofguard policies is at the core of the NSX value proposition. NSX also offers a rich set of troubleshooting tools to analyze and solve connectivity issues between Pods.
- Missing fine-grained security policy (firewall rules): In some of the current technologies, Pod-to-Pod traffic is not secured at all. This opens an opportunity for attackers to move laterally from Pod to Pod without being blocked by firewall rules and, even worse, without leaving any traces of this lateral movement in logs. Kubernetes addresses this with the network policy project driven by the Networking Special Interest Group (SIG) in Kubernetes. NSX implements Network Policy alongside with pre-created ‘admin-rules’ to secure Pod-to-Pod traffic in Kubernetes
- Automating the creation of network topology: Many of the current implementations take a simple approach to network topology mapping, which is not to have any topology mapping. IP Subnet allocation for Pods is mostly done per Kubernetes node. Tenancy constructs like namespaces are usually not reflected in anything else than abstract firewall rules. NSX implements a distinct network topology per Kubernetes namespace. NSX maps logical network elements like logical switches and distributed logical router to Kubernetes namespaces in a fully automated manner. Each of those network topologies can then be adapted per namespace, e.g. operators can decide if the subnets of the namespace should be directly routed, or privately addressed and behind NAT
- Integration in enterprise networking: A paradigm of many existing technologies is that the operator needs to decide at the time of install whether the container networks should be privately addressed, and if they are hidden behind a NAT boundary or directly routed within the enterprise network. Existing overlay technologies make it difficult to expose Pods to networks outside of the Kubernetes cluster. Exposing the services involves using NAT/PAT on the Kubernetes nodes themselves, putting the burden on the operator to design how to map e.g. external physical load-balancers or DNS records to TCP/UDP ports on the Kubernetes nodes. Alternatively, or in addition, one can use the new Kubernetes Ingress load-balancers to get traffic into the container networks. In any case, there is NAT involved. With the NSX integration, we intend to allow operators to be able to decide on a per-namespace basis if they want to do direct routing, and even if they want to inject the routes dynamically into the core network using BGP. On the other hand, if operators wanted to save IP address space, they would be able to ‘hide’ the namespace networks behind NAT using private IPs and Kubernetes Ingress Controllers (Load-Balancers) to get external traffic to the Pods.
How does the NSX integration look like? How did we design it?
To start, the integration uses NSX-T since this makes the solution applicable to any compute environment, and not just vSphere. E.g. NSX-T will allow us to support Kubernetes on a variety of compute platforms – such as Photon-Platform (which uses ESXi hosts without vCenter), Baremetal Linux servers, Public Clouds, and KVM based virtualization environments.
To integrate Kubernetes with NSX-T we intend to develop three major Components; 1) the NSX Container Plugin (NCP), 2) the NSX CNI Plugin and 3) NSX Kube-Proxy.
- The NSX Container Plugin (NCP) is a software component that we intend to deliver as a container image, running as an infrastructure Pod in the Kubernetes cluster. It would sit between the Kubernetes API Server, watching for changes on Kubernetes Objects (namespaces, network policies, services etc.), and the NSX-T API. It would create networking constructs based on the object addition and changes reported by the Kubernetes API.
- The NSX CNI Plugin is a small executable intended to be installed on all Kubernetes Nodes. CNI stands for Container Network Interface and is a standard that intends to allow the integration of network solutions like NSX into container orchestration platforms. The Kuberenetes node component called Kubelet will instantiate/call the CNI Plugin to handle the Pod network attachment.
- The NSX Kube-Proxy is a daemon running on the Kubernetes Nodes. Again, we intend to deliver NSX Kube-Proxy as a container image, so that it can be run as a Kubernetes Daemon-Set on the Nodes. NSX Kube-Proxy would replace the native distributed east-west load balancer in Kubernetes called Kube-Proxy, which uses IPTables, with our solution that uses OpenVSwitch (OVS) load-balancing features.
Each of the components deserves a closer look and far more explanation than what I can cover in this initial article. We will follow up with a detailed look at each component in future articles.
Before I wrap this up, there is one more thing you might ask, “How do we solve the two layers of overlay problem?” Well, when running overlay networks between Kubernetes Nodes running as VMs on an IaaS that itself uses an overlay network solution, you get into the situation where you have double encapsulation. E.g. VXLAN in VXLAN.
Would the NSX-T Kubernetes Integration suffer from the same problem?
The answer would be No. When running the Kubernetes nodes as VMs, the tunnel encapsulation would be handled only on the hypervisor vSwitch layer. In fact, the OpenVSwitch (OVS) in the Kubernetes Node VM would not even have a control plane connection to the NSX-T controllers and managers thereby creating an additional layer of isolation and security between the containers and the NSX-T control plane. The NSX CNI Plugin intends to program the OVS in the Kubernetes Node to tag traffic from Pods with a locally significant VLAN id (per vnic). This would allow us to “multiplex” all the traffic coming from the Pods onto one of the Kubernetes Node VMs vnics towards the Hypervisor vSwitch. The VLAN id would allow us to identify individual Pods on the Hypervisor vSwitch using a logical sub interfaces of the VMs vnic. All management and enforcement actions (counters, spoofguard, firewalling, …) on the per-Pod logical port would be done on the Hypervisor vSwitch. The VLAN id imposed by OVS in the Node VM would be stripped by the Hypervisor vSwitch before encapsulating it with the overlay header.
There are many more details to be discussed than what we can not fit into a single article. Stay tuned for more articles on how we integrate Kubernetes with NSX-T!
Meanwhile, if you attend CloudNativeCon / Kubecon this week in Berlin (29th + 30th of March), please visit us at our booth #G1. We would be delighted to chat with you in detail about NSX-T with Kubernetes.
We will also be at Dockercon in Austin, TX, April 17th to the 20th as a gold sponsor. We would love to meet you at our booth #G3 where we will showcase container-ready SDDC solutions that include NSX.
To learn more about NSX Container Networking and Security, watch these VMworld 2016 sessions:
- VMworld 2016: NET9989 – VMware NSX, Network Bridge to Multi-Cloud Future from Guido Appenzeller
- VMworld 2016: NET 7935 – Container Orchestration and Network Virtualization – free VMworld login required
To learn more about VMware’s open source projects and products in the cloud-native space, you can follow the following links: