In our jobs as systems engineers in the networking and security for Cloud Native Applications (CNA) space, we get to work with many customers on their implementations of CNA platforms. The fun part is that we get to hear the “DevOps” drivers and needs of many enterprise customers. Developers using CNA platforms such as Kubernetes (k8s) and Pivotal Application Service (formerly known as PCF) don’t always have security and networking first in mind; they are focused on speed and innovation. A developer doesn’t want to wait for the networking team to provision services (e.g. load balancing, switching, etc.), or for the security team to whitelist a new app capability. This is especially true in enterprises where “change management” is a manual and lengthy process. At the same time, those same enterprises face regulations and security requirements that can slow down the innovation process. To strike a balance, networking and security platforms for CNA need to operate in as seamless a manner as possible to maximize speed and efficiency for both IT and developers. NSX Data Center is a platform built just for that. Naturally, one question that comes up a lot is about the advantages of NSX Data Center over other tools in the market, mainly open source solutions such as Flannel, Calico, Canal, etc.
This blog post has two parts: we will start with an overview and then dive into the technical details.
On the most basic level, NSX Data Center provides a lot of value, and after years of success in the market some of these capabilities are taken for granted:
- Distributed – NSX Data Center routing, switching, and firewalling are distributed services running across all hypervisors NSX Data Center is installed on. The distributed manner in which NSX Data Center operates allows for much more “horsepower” (more hosts = more resources) and reduces fault risk (no central component failure is in the data path). This also fits perfectly with the hyper-converged systems that are so popular today, which are designed as scale-out, distributed systems.
- Context – NSX Data Center is integrated into the hypervisor layer. Because the hypervisor and its management plane (vCenter in the case of vSphere deployments) have knowledge of all the VMs running on the hypervisors and the containers within them, NSX Data Center gains much more context for management and operations. For example, it allows the creation of smart groupings to be used in DFW rules, based not just on IPs but also on VM names, tags, container names, and even logical networks. The same is true for monitoring, where the context of the objects is even more important. This means simplified operations and visibility into the workloads running on NSX.
- NSX Data Center provides a lot of its value by being uniquely placed in the hypervisor kernel. That allows for the optimization of east/west communication, protects the firewall from being tampered with by malicious software running in the guest OS, and enables the distributed manner in which the NSX Data Center services operate.
While these characteristics of NSX Data Center are well known, questions arise about the value NSX Data Center provides for cloud-native platforms such as Kubernetes (k8s) and Pivotal Application Service (PAS). After all, one can achieve basic “micro-segmentation” with iptables in the guests, network overlay with Flannel, routing with Calico (or Flannel + Calico = Canal), etc.
You have a choice: your company may be able to build a solution that provides networking and security from this jigsaw of open source products, or you can use a unified solution such as NSX Data Center, which is integrated into the hypervisor for containers and VMs and supports bare-metal and cloud deployments in a single network and security platform backed by a top-tier vendor like VMware.
The following NSX Data Center capabilities make it the leading networking and security platform for CNA:
1. Multi-Domain – when talking about solutions for CNA platforms, one could build a separate networking and security solution for each platform individually. But that only answers the requirements of each platform in isolation. Call it silos, or islands; it does not provide the central, unified solution that our enterprise customers’ security and networking teams demand. Consider this: one team in an organization might choose PAS as the best place to run Java Spring Boot applications, while another team providing databases as a service might choose k8s as a platform. Containerized apps also often need access between platforms, or to other virtualized and non-virtualized services. How can one control the network and security requirements of all these varied systems with a single solution? Yes, you guessed it: with NSX.
NSX is the glue that connects traditional virtualized apps, bare-metal apps, and all cloud-native apps. It supports consistent policies across the following frameworks:
- K8s Do It Yourself (DIY), PKS, OpenShift
- Pivotal Cloud Foundry / Pivotal Application Service
- Virtual machines
- vSphere and KVM
- Bare metal (coming soon)
- Cloud (coming soon)
With NSX Data Center we provide native support for all these workload types using a single solution with a single security policy framework. You can manage, monitor, and troubleshoot networking and security for all of the above workload types. Even more so, when it comes to application development on CNA platforms, NSX Data Center is integrated natively at the container level. Developers can use the native k8s network policy language to define their application security policy, as well as CF network policies on Pivotal PAS, as part of their normal workflows, without needing to ask the network team or use the NSX Data Center console to set their app’s security, networking, or load balancing specs. This is huge for our customers! They are enabled to develop at the speed of innovation without compromising security. With that said, end-to-end control of the system stays in the hands of the security team, who can manage FW rules beyond the apps so developers don’t violate enterprise policies by mistake.
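As a sketch of what that developer workflow looks like, consider a k8s network policy like the one below. The namespace, labels, and port are illustrative assumptions; the point is that the developer applies this with their normal tooling, and NSX (via the NSX Container Plugin) renders it as distributed firewall rules without a ticket to the network team:

```yaml
# Illustrative developer-authored k8s network policy (names assumed).
# NCP picks this up and translates it into NSX DFW rules automatically.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-web-to-api
  namespace: shop
spec:
  podSelector:
    matchLabels:
      app: api          # the policy protects the 'api' PODs
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: web      # only 'web' PODs may reach them
    ports:
    - protocol: TCP
      port: 8080
```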
2. Visibility – NSX Data Center is known for its operational tools: integration with vROps for unified monitoring of the stack, vRNI for security planning and network monitoring, vRLI for log insights, and more. Built-in tools such as Traceflow, topology view, etc. can be leveraged at the container, VM, cloud VM, or bare-metal level with enterprise-grade operation tools. Our customers need to be able to troubleshoot and provide security monitoring, which only NSX Data Center can provide across all the types of workloads used by an application.
3. Relatability – containerized application platforms such as k8s and PAS are very convoluted on the network front. From an ops team’s perspective, these types of platforms are very hard to monitor or filter from the physical network. This is due to the ephemeral character of containers and the fact that they reside within the worker nodes. For example, if a customer wants to filter/monitor access to external services outside of the CNA platform, they need to create specific network topologies to allow identification of traffic from a set of apps on the physical network. To do this without NSX Data Center, one has to create specific network segments, and typically multiple foundations or clusters, to run the isolated apps and filter traffic in the physical network based on the segment the apps run on, which can be quite complex and inefficient. NSX Data Center assigns a subnet (if routed) or an SNAT IP (if NATed) to each logical tenant on the platform to allow identification of traffic for the apps under that tenant. This allows a physical firewall outside of the platform to control and filter traffic per logical tenant rather than relying on purpose-built physical network architecture.
4. Programmability – NSX Data Center is built for automation. You can achieve everything using automation, and that is also what most of our customers do. We are not only referring to the automation of the network and security from K8s/PAS, but to all Day 1 and Day 2 ops for NSX. Just recently we released a Terraform provider for NSX Data Center, we have projects automating NSX Data Center with Ansible, Concourse, and Python, and more are coming. You can also check out the great work done by William Lam on PowerShell automation. This is hugely important, and combined with the heterogeneity of the workloads we support, it takes “Infrastructure as code” to a whole new level.
We’ll dive into each feature function and show the superiority of NSX in the next section. It is more than just a feature discussion; the platform itself is built and positioned in a way that achieves all of the above.
That being said, in those areas where there is overlap in features between NSX Data Center and open source solutions, NSX Data Center is still technically superior. Now, let’s get down to the nitty-gritty technical details of the above.
Overlay technology is the basis of most SDN solutions. It is used to abstract the physical network and decouple the connectivity of the workloads from the physical infrastructure.
Here is an example of an implementation of an overlay network using Flannel with k8s. In the figure below we can see two k8s nodes communicating using a Flannel overlay (shown in green). Each node has its own unique network segment; for example, Node1, shown on the left side, ‘owns’ the subnet 10.24.1.0/24, and Node2 (on the right side) ‘owns’ the subnet 10.24.2.0/24. The traffic between the nodes runs over the Flannel overlay through the physical L3 device.
We can see that k8s PODs are deployed on both nodes. In this case, the L3 switch doesn’t need to know about or manage the Flannel network, and with this approach adding a new k8s node does not require any changes on the L3 device. There are multiple options Flannel can use for encapsulation: the default encapsulation type is UDP, and another option is VXLAN. You can find more info here.
In contrast, NSX Data Center uses the next generation of overlay encapsulation, called GENEVE, co-authored by VMware, Microsoft, Red Hat, and Intel. There are good reasons we chose this protocol. Here are the details.
GENEVE works with the same general concepts as VXLAN, carrying Layer 2 information over Layer 3. Customers can use Flannel as an open source solution with VXLAN as the overlay, but Flannel running the overlay from inside the guest OS can introduce a performance penalty because packets have to be copied back and forth between user space and kernel space.
GENEVE runs in a more efficient way compared to Flannel. With GENEVE we use a single kernel module running inside the hypervisor to create the overlay tunnel, while Flannel VXLAN runs the tunnel process in each guest OS. If you have 20 k8s node VMs, you run 20 VXLAN interfaces that you need to manage and troubleshoot.
GENEVE adds additional information in the metadata, called TLVs (Type, Length, Value), allowing it to carry end-to-end telemetry, security, and encryption data. More information on GENEVE can be found on the IETF website.
By default, Flannel runs UDP encapsulation as the overlay technology. This UDP overlay introduces high latency, as shown in this link.
You can change the default UDP overlay protocol to VXLAN as detailed here.
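For reference, switching Flannel’s backend is typically a one-line change in its net-conf; the fragment below is a sketch (the ConfigMap name and network range vary by deployment):

```yaml
# Sketch of the Flannel ConfigMap fragment that selects the backend.
# Changing "Type" from "udp" to "vxlan" moves encapsulation from the
# userspace flanneld process to the kernel vxlan driver; either way,
# the overlay still terminates inside each node VM.
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-system
data:
  net-conf.json: |
    {
      "Network": "10.24.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
```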
While changing to VXLAN may improve latency, it won’t change the fact that the overlay is still handled inside each node VM rather than in the hypervisor.
With NSX, the overlay networking runs inside the hypervisor, so we can utilize more than a single NIC for overlay traffic, while Flannel is limited to a single NIC.
Another example: with NSX Data Center we can offload parts of the encapsulation process to the physical NICs of the server with technologies such as RSS, TSO, and LRO, and achieve impressive performance that is close to line rate. Enabling such offloads from inside the guest OS of a VM can be very challenging.
As presented at VMworld 2017 by Samuel Kommu we can see the performance of East-West traffic of the GENEVE protocol:
More info on NSX performance can be found in the blog Myth Busted: Who Says Software Based Networking Performance Does Not Match Physical Networking?
As mentioned in the first part of this post, one of the main advantages of NSX Data Center is its workload heterogeneity. With NSX Data Center we have a single networking and security framework for all tiers of an application, whether it is running in a container, a VM, on a bare-metal server, etc. A great example of that can be seen below. In the figure below we see a mix of VMs and containers with a single security policy. In NSX Data Center we created a unified security policy using a construct called NSGroups (similar to NSX-V security groups), which allows us to apply a single policy across workloads regardless of whether they are running as containers, in a VM, on a bare-metal server, or even in the cloud.
In comparison, Flannel is limited to connecting k8s PODs to its overlay network; there is no option to mix containers and VMs connected to the same tenancy construct (a k8s namespace in this example), nor to create a unified policy for them. With Flannel, the only option is to use an external firewall to implement a security policy at the VM level, which means we need two separate policy frameworks and are forced to match only on IP addresses instead of object metadata from k8s, PCF, vCenter, etc.
Another disadvantage of Flannel is that BUM (broadcast, unknown unicast, and multicast) traffic is not supported.
The figure below shows how VMs and containers can be connected together with Flannel. As you can see, we need an external L3 device or firewall:
Relatability: namespace POD IPs cannot be mapped to a tenant
As mentioned in the first part of the post, NSX Data Center has the ability to untangle the network on CNA platforms for the network and security ops teams, and is able to relate traffic to logical tenants on the platforms. With the networking technologies used in k8s and PCF/PAS (Flannel, Calico, etc.), the source IP of the traffic cannot be mapped to a tenant. This is one of the biggest challenges in implementing these solutions in enterprises, as it is very hard to filter traffic from specific apps or tenants.
When we create a namespace in k8s or an org in PAS, the containers/PODs belonging to the same namespace will potentially be deployed on different worker nodes. As a result, a tenant will have PODs with IP addresses from different node-specific IP subnets. An example of that can be seen in the figure below, where we have two k8s namespaces: ‘foo’ and ‘bar’.
The PODs deployed in the namespaces ‘foo’ and ‘bar’ are running on Node1 and Node2 (the assignment of PODs to nodes is managed by the k8s scheduler). The inability to relate POD IPs to their namespaces introduces security challenges for traffic between the CNA platform (k8s in this example) and the physical workloads it may need to access.
Why is this important? In most enterprise networks, we have physical firewalls (as shown at the top of the figure) protecting traffic passing between different environments. Containers running on a CNA platform that need to communicate with physical databases, VMs, or even mainframes need to be explicitly whitelisted in the physical firewall.
Containers are usually ephemeral, meaning they don’t carry state and can be deleted and recreated regularly. As a result, the IPs they use are also ephemeral and can change at any time in their lifecycle. This makes it challenging to create and manage firewall rules for containers, especially those that are NATed behind the k8s node management IP address, since the firewall doesn’t see the PODs’ IPs.
Whitelisting each new app is a time-consuming effort. Some solutions like OpenShift try to solve this challenge by creating specialized PODs that all egress traffic is forced through. This approach has a few drawbacks: the egress POD is limited in the amount of traffic throughput it can provide, and it has a long failover time. You can read more about this here.
NSX Data Center solves this challenge as follows: Each logical tenant (CF Org, Kubernetes Namespace) gets its own logical network topology which includes a Tier-1 Logical Router and dedicated logical switches with unique IP subnets.
With NSX Data Center each tenant gets either an SNAT IP (in NAT Mode), or a directly identifiable subnet (routed mode).
In NAT Mode, the external physical Firewall can now relate and filter traffic coming from a specific tenant’s apps based on that source SNAT IP that is allocated to a specific tenant.
In Routed Mode, the external physical Firewall can relate and filter the traffic to the tenant based on the allocated subnet to that specific tenant.
Implementing L4 security policies in k8s environments can be done with k8s network policies, and in PAS with cf network policies. Open source tools such as Calico and OpenShift rely on CLI tools to define the security policy, e.g. a k8s network policy that creates a single rule. The following rule allows ingress traffic to the PODs with a label of nginx:
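A reconstruction of that single-rule policy might look like the following (the exact selector and policy names are assumptions for illustration):

```yaml
# Reconstruction of the rule described above: allow ingress traffic
# to the PODs labeled 'nginx'. Names and labels are illustrative.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-to-nginx
spec:
  podSelector:
    matchLabels:
      app: nginx      # the rule applies to PODs carrying this label
  policyTypes:
  - Ingress
  ingress:
  - {}                # an empty rule admits all ingress to those PODs
```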
The CLI tool used to define the policy is not the important part; the challenge is to understand how to read and operationalize firewall rules expressed this way.
The same rule with NSX Data Center firewall policy
As you can see below, creating the same firewall rule with NSX Data Center is very clear, and it is simple to understand what the source and destination are.
NSX Data Center supports k8s network policies, but instead of requiring you to decipher the logical rules inside the k8s YAML, we present them as firewall rules in the NSX Data Center GUI. With this approach, the firewall rules are clear and human-readable. With NSX Data Center we can also create firewall rules directly from the GUI, the same way traditional firewall management works, and then apply the policy to both VMs and containers. In both scenarios the end result is the same: the PODs and VMs are protected by the NSX Data Center firewall.
The policy enforcement point location
An interesting point worth mentioning is where the firewall policies are enforced. With k8s network policies, the firewall is implemented in iptables running inside the guest OS. If malicious software is running in the guest OS or the container, it can potentially tamper with iptables or even shut down the firewall completely, as illustrated in the diagram below:
With NSX Data Center, the enforcement point is in the hypervisor, outside the guest OS of the VM and beyond the reach of malicious software. Even if attackers hijack the guest OS of the VM, they can’t touch the NSX Data Center firewall because it runs in the hypervisor.
NAT or Routed Topology
Flannel only creates an overlay network for east-west traffic. To provide north-south connectivity to the outside world, Flannel relies on NAT. In k8s, for example, this is done with a NodePort, meaning we NAT or PAT on the management IP address of the node VM itself.
In the figure below we can see how external users can ingress to PODs running on Node2. This ingress traffic is NATed on the node’s management IP address of 10.240.0.4. With Flannel, there is no way to expose the PODs’ IP addresses without NAT, because in most if not all cases Flannel is configured to use overlays.
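A sketch of that NodePort pattern (service name, labels, and port numbers are illustrative) looks like this; external clients reach the PODs only via a node IP such as 10.240.0.4, never via the POD IPs themselves:

```yaml
# Sketch of a NodePort service: external clients connect to
# <node IP>:30080, and the node NATs/PATs that traffic to the
# backing PODs. Names and ports are assumptions for illustration.
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  type: NodePort
  selector:
    app: web
  ports:
  - port: 80          # cluster-internal service port
    targetPort: 8080  # container port on the PODs
    nodePort: 30080   # exposed on every node's management IP
```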
In order to achieve a routed topology with Flannel, you will need to combine it with an open source technology called Calico.
Project Calico uses pure routed topologies instead of an overlay networking approach; the routing is implemented using the BGP protocol. The Calico architecture is as follows: each k8s node acts as a BGP router which advertises its subnets to the next hop (usually a ToR switch). Running inside the node is BIRD, an open source BGP daemon.
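To make the BGP peering concrete, here is a sketch of a Calico peer definition that pairs each node’s BIRD daemon with a ToR switch; the peer IP and AS number are illustrative assumptions, and the exact resource schema depends on your Calico version:

```yaml
# Sketch of a global Calico BGP peer: every node's BIRD daemon peers
# with the ToR switch at this address. Values are illustrative.
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: tor-peer
spec:
  peerIP: 192.168.1.1   # ToR switch address (assumed)
  asNumber: 64512        # ToR AS number (assumed)
```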
Since Calico uses pure routed topologies, it is very difficult to implement NATed topologies. To achieve that, we need to combine it with iptables and create the NAT rules on a per-node basis, which means the NAT commands must be re-run every time a node is restarted, as the Calico guide notes:
“The commands will need to be run each time the host is restarted.”
There are situations where NATed and routed topologies are required in the same environment. To achieve that, we need to utilize an additional open source tool called Canal (now merged into the Calico project).
Canal packages and installs both the Calico and Flannel open source projects. But since Canal is a combination of two independent products, to gain the flexibility of choosing between routed and NATed topologies a company would need to manage two separate CNI plugins. This is quite a complex jigsaw to manage and troubleshoot. Adding to that, each platform requires its own separate deployment, which makes things even more complex.
With NSX Data Center, both routed and NATed topologies are supported out of the box, allowing you to pick and choose which one fits your deployment needs. With k8s we can even make that decision on a per-namespace basis, allowing you to mix and match between them. In the figure below we can see the namespace ‘bar’ using a routed topology, where all PODs in this namespace are fully routed from POD to ToR, while the namespace ‘foo’ utilizes a NATed topology. Both namespaces are part of the same k8s cluster.
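In NCP-based deployments, this per-namespace choice is typically expressed as a namespace annotation. The annotation key below reflects one NCP release and is an assumption; check the NCP documentation for your version:

```yaml
# Sketch: selecting a routed (no-SNAT) topology for one namespace
# via an NCP annotation. The annotation key is an assumption and
# may differ across NCP versions.
apiVersion: v1
kind: Namespace
metadata:
  name: bar
  annotations:
    ncp/no_snat: "true"   # 'bar' is routed; namespaces without this stay NATed
```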
Another point to consider is that if you have multiple k8s clusters and you want to create security policies between them using k8s network policies, you need to create the rules on each k8s cluster independently (source and destination), because k8s network policies are local to the cluster. In many cases you will also have services such as a DB running as VMs, which makes things even more problematic from a firewall policy perspective because Calico does not support firewall policies for VMs. You may need to maintain three separate firewall rules, compared to NSX Data Center, where you have a single central place to manage all your firewall rules for VMs and PODs.
Flannel, Calico, and Canal don’t have a built-in load balancer capability; this requires adding an additional open source tool such as NGINX or HAProxy to provide load balancing.
NSX Data Center provides built-in support for Ingress LB (L7) and service type LoadBalancer (L4). Even more so, with the NSX-K8s integration a load balancer is deployed automatically for each k8s cluster and configured using YAML ingress spec files for L7 and service specs of type LoadBalancer for L4. This means that when developers need to load balance their application or allow load-balanced access to a service, they just need to declare it in the ingress spec YAML file. NSX Data Center automatically programs the NSX Data Center infrastructure to support the required load balancing, without the need to involve an external team or manage multiple solutions.
On the left, we see an example of an L7 load balancer deployment where traffic for the URI ‘foo’ or ‘bar’ is sent to the relevant server pools. On the right, we see an example of an NSX Data Center Layer 4 load balancer, where traffic is classified based on UDP or TCP ports.
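The L7 scenario in the figure can be sketched as an ingress spec like the one below; the service names are assumptions, and the apiVersion shown matches the k8s releases of that era. Applying it is all a developer has to do for NCP to program the NSX load balancer:

```yaml
# Sketch of the L7 case: requests for /foo and /bar are routed to
# different server pools. Service names are illustrative.
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: uri-routing
spec:
  rules:
  - http:
      paths:
      - path: /foo
        backend:
          serviceName: foo-svc   # pool serving 'foo' (assumed name)
          servicePort: 80
      - path: /bar
        backend:
          serviceName: bar-svc   # pool serving 'bar' (assumed name)
          servicePort: 80
```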
Running Flannel/Calico at small scale may achieve certain feature functions such as micro-segmentation, overlay, and routing, but it is not easy to operate and manage, and it poses major challenges in large-scale deployments. It is especially complex to troubleshoot, for example in a scenario where the overlay network is not working as expected: troubleshooting involves many CLI commands on each k8s node separately, with no central operational tools.
Here is an example of a very common troubleshooting scenario where we need to review the node annotations used by Flannel to find the VTEP destination IPs. This is done when we have issues with the overlay network and need to test communication between the hosts running the overlay. These are the required commands from the example above:
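For context, the data being hunted in that exercise lives in annotations that Flannel writes onto each node object, roughly like the fragment below (values are illustrative); it has to be inspected node by node, for example with `kubectl describe node`:

```yaml
# Fragment of a k8s node object showing the annotations Flannel uses
# to publish its VTEP details. Values are illustrative.
apiVersion: v1
kind: Node
metadata:
  name: node2
  annotations:
    flannel.alpha.coreos.com/backend-type: vxlan
    flannel.alpha.coreos.com/public-ip: 10.240.0.4   # VTEP destination IP
    flannel.alpha.coreos.com/backend-data: '{"VtepMAC":"aa:bb:cc:dd:ee:ff"}'
```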
In contrast, with NSX Data Center we have a central management console from which we can diagnose and troubleshoot via the GUI, API, or CLI. We gain deep visibility all the way into the container or VM networks. The following operational tools can be utilized for troubleshooting both VM-based workloads and containers:
The Port Connection tool helps to view the status of the overlay tunnel between two PODs or VMs. The following screenshot shows an example of the information we can get with Port Connection. As you can see, we get a lot of information related to the overlay tunnel status and the IP address of the TEP (Tunnel Endpoint), and we can also place the VMs and PODs in the network pretty easily.
There are additional built-in tools in NSX Data Center for operations and troubleshooting which allow us to see port statistics for containers and VMs, including unicast, multicast, drop, and total traffic counts:
With NSX Data Center we can establish a port mirroring session for PODs as well as for VM-based workloads. Port mirroring replicates all the network traffic of a port that the PODs or VMs are connected to and sends it to a remote device such as a sniffer for further analysis. While this is commonly done on physical switches, creating a port mirroring session for a virtual machine, and even more so for containers, is not trivial; with NSX Data Center it is easy.
The traceflow tool expands upon the visualization of connectivity between two logical ports by allowing an administrator to inject user-defined packet types into the network topology to identify the network flow and identify a problem with a DFW rule or connection on a host.
VMware vRealize Log Insight
In the context of NSX Data Center, vRLI is an amazing tool for traffic visibility and for analyzing whether traffic is allowed or denied by the NSX Data Center firewall. Each rule in the DFW can be marked to send logs to vRLI. This has many uses, such as:
- Logging whether packets were allowed or dropped by a specific rule
- Mapping out communication between the tiers of an app. One can set the platform’s cleanup rule to log every packet passing through the DFW and then use these logs to map the communication flow between the tiers.
- Identifying an attack, for security purposes, when a rule is hit frequently
These are essential capabilities for any enterprise customer working with firewalls.
Unlike open source tools such as Flannel and Calico, we also have a clean dashboard for NCP in Log Insight.
In summary, to help businesses achieve their goals, IT organizations must be able to empower developers to deliver the necessary services and applications quickly and securely. There are numerous new application frameworks used to build containerized and microservices-architecture applications, including Kubernetes, Pivotal Application Service (PAS), Pivotal Container Service (PKS), and Red Hat OpenShift. NSX Data Center integrates across any application framework to bring fast, agile networking and security to new applications developed on any platform. This allows IT to maintain connectivity and security in the environments they manage while empowering developers to use their preferred tools and build networking and security into their applications in an API-driven fashion as a natural part of their workflows.
NSX Data Center provides this in an enterprise-grade, production-ready platform for networking and security.
The following figure, created by Steve Tegeler, VMware Director of Technical Product Management, depicts this notion perfectly: one cohesive solution compared to a patchwork of open source point solutions.
“End-to-end enterprise-ready solution vs. a point product patchwork”
Here are some resources to explore NSX Data Center with CNA:
- HOL-1826-02-NET – VMware NSX-T with Kubernetes
- HOL-1832-01-CNA – Pivotal Container Service (PKS) and Kubernetes – Getting Started
- Tech Field Day [Video]
- Tech Pubs
Roie and Niran are leading cloud-native networking and security SEs, working with VMware’s most strategic customers on their digital transformation journeys.
The materials used to create this blog post are based on the work of Yves Fauser, NSBU TPM.
We would like to thank him for reviewing and contributing to this blog post.