Multi-tenancy exists in some shape or form in almost every network, but we’ve come to learn that not every operator or administrator defines it the same way. In an Enterprise network, it is often viewed as the separation of tenants based on business units, departments, security/network policies, or compliance requirements. Conversely, for a service provider, multi-tenancy can simply be the separation of different customers (tenants).

Multi-tenancy doesn’t just allow for separation of tenants; it also provides control boundaries in terms of who controls what. For instance, tenant administrators can control and configure the network and security policies for their specific tenants, while a service provider administrator can offer shared services or provide inter-tenant and WAN connectivity.

In the logical routing world of NSX-T, this provider function delivers connectivity between the tenant logical networks and the physical infrastructure. It can also provide inter-tenant communication or shared services (like NAT, load balancer, etc.) to the tenants.

In my previous post, NSX-T: Routing where you need it (Part 1), I discussed how NSX-T provides optimized E-W distributed routing and N-S centralized routing. In addition to that, NSX-T supports a multi-tiered routing model with logical separation between provider router functions and tenant routing functions. The concept of multi-tenancy is built directly into the platform and is reflected in the routing model. The top-tier logical router is referred to as Tier-0, while the bottom-tier logical router is Tier-1. The following diagram shows the multi-tiered routing architecture.
Figure 1: Multi-Tiered Routing Architecture

Before we get into the nuts and bolts of this architecture, let’s first address some common questions that our customers ask when evaluating NSX-T:

Is it mandatory to run multi-tiered routing?

The simple answer is no. You can have a single Tier-0 logical router (LR) connected to the physical infrastructure northbound and to logical switches southbound. This single-tiered approach still provides all the goodness of distributed E-W routing in the kernel and centralized N-S routing, as discussed in my previous post.

Why would you want to run a multi-tiered routing topology?

The answer depends on the following questions:

Do you want multiple tenants that need isolation?
Do you want to give provider admin and tenant admin complete control over their services and policies?
Do you want to leverage a CMP (Cloud Management Platform) like OpenStack to deploy these tenants?
Are you leveraging NSX-T to provide networking/security for Kubernetes or PCF (Pivotal Cloud Foundry)?

If the answer to any of the above questions is yes, then you need a multi-tiered routing topology. Let me explain some of the benefits of the multi-tiered routing architecture.

  • This architecture gives both the provider administrator and tenant administrators complete control over their services and policies. The provider administrator controls and configures Tier-0 routing and services, and tenant administrators control and configure their tenant-specific Tier-1 logical routers.
  • This architecture also eliminates any dependency on the physical infrastructure administrator to configure or change anything when a new tenant is onboarded in the datacenter.
  • Easy CMP integration. Using OpenStack, you just have to deploy Tier-1 logical routers and connect them to the pre-configured Tier-0 logical router. The Tier-0 LR simply advertises the new tenant routes (learned from the tenant Tier-1 LR) over the already established routing adjacency with the physical infrastructure (see the sketch below).
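To make the CMP point concrete, here is a minimal sketch of what an integration layer might send to the NSX-T Manager REST API to deploy a tenant Tier-1 LR. The manager address and credentials are placeholders, and the endpoint and field names follow the NSX-T 2.x management-plane API as I know it; verify them against the API guide for your version.

```python
# Minimal sketch: deploying a tenant Tier-1 LR via the NSX-T Manager REST API.
# nsx-mgr.example.com and the credentials are placeholders; endpoint and field
# names follow the NSX-T 2.x management-plane API and should be verified
# against the API guide for your version.
import requests

NSX_MGR = "https://nsx-mgr.example.com"
AUTH = ("admin", "VMware1!")          # use real credentials/cert validation

def create_tier1_lr(display_name):
    """Create a Tier-1 logical router and return its ID."""
    body = {
        "resource_type": "LogicalRouter",
        "display_name": display_name,
        "router_type": "TIER1",
    }
    resp = requests.post(f"{NSX_MGR}/api/v1/logical-routers",
                         json=body, auth=AUTH, verify=False)
    resp.raise_for_status()
    return resp.json()["id"]

tier1_id = create_tier1_lr("tenant1-t1")
# Attaching it to the pre-configured Tier-0 is a pair of logical-router-port
# calls (a link port on the Tier-0 and one on the Tier-1); NSX then assigns
# the RouterLink addressing automatically, as described in the next section.
```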

How are the two tiers connected?

A Tier-0 LR connects northbound to one or more physical routers via uplink ports, and southbound to Tier-1 LRs or directly to logical switches via downlink ports. Refer to Figure 2 below.

A Tier-1 LR, in turn, connects northbound to a Tier-0 LR (this link is known as a RouterLink) and southbound to one or more logical switches via downlink ports. Each Tier-0-to-Tier-1 peer connection is assigned a /31 subnet from the 100.64.0.0/10 reserved address space (RFC 6598). A user has the flexibility to change this range and use another subnet if desired. The link is created automatically when you create a Tier-1 router and connect it to a Tier-0 router.
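To illustrate the addressing, here is a small Python sketch that carves /31 RouterLink subnets out of 100.64.0.0/10. This is purely illustrative; NSX assigns these addresses automatically, and its allocator may pick different subnets than this naive loop.

```python
# Illustrative only: carving /31 RouterLink subnets out of the RFC 6598
# 100.64.0.0/10 block. NSX does this allocation automatically; this sketch
# just shows why each Tier-0-to-Tier-1 link needs exactly one /31.
import ipaddress

pool = ipaddress.ip_network("100.64.0.0/10")
links = pool.subnets(new_prefix=31)          # generator of /31 subnets

for tenant in ("tenant1", "tenant2"):
    link = next(links)
    # A /31 (RFC 3021) has exactly two host addresses: one per router end.
    tier0_side, tier1_side = link[0], link[1]
    print(f"{tenant}: Tier-0 {tier0_side}/31 <-> Tier-1 {tier1_side}/31")

# tenant1: Tier-0 100.64.0.0/31 <-> Tier-1 100.64.0.1/31
# tenant2: Tier-0 100.64.0.2/31 <-> Tier-1 100.64.0.3/31
```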

How does routing work in a multi-tiered architecture?

When a Tier-1 LR is connected to a Tier-0 LR, the management plane configures a default route on the Tier-1 LR with the next hop set to the RouterLink IP of the Tier-0 LR (100.64.128.0, in the following topology).

To provide reachability to subnets connected to the Tier-1 LR, the management plane (MP) configures static routes on the Tier-0 LR for all the LIFs connected to the Tier-1 LR, with the next hop set to the Tier-1 LR RouterLink IP (100.64.128.1, in the following topology). 172.16.10.0/24 and 172.16.20.0/24 are seen as NSX Static routes on the Tier-0 LR.

Figure 2: Route advertisement on Tier-1 and Tier-0 LR
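Putting the two auto-plumbed pieces together, here is a conceptual model (my notation, not NSX output) of the resulting route tables, using the addresses from Figure 2:

```python
# Conceptual model of the MP-plumbed routes from Figure 2 (not NSX output).
tenant1_tier1_routes = {
    "172.16.10.0/24": "directly connected (LIF)",
    "172.16.20.0/24": "directly connected (LIF)",
    # Default route installed by the MP, next hop = Tier-0 RouterLink IP.
    "0.0.0.0/0":      "via 100.64.128.0 (Tier-0 RouterLink)",
}

tier0_routes = {
    # "NSX Static" routes installed by the MP for every LIF behind the
    # Tier-1, next hop = Tier-1 RouterLink IP.
    "172.16.10.0/24": "via 100.64.128.1 (Tenant 1 Tier-1 RouterLink)",
    "172.16.20.0/24": "via 100.64.128.1 (Tenant 1 Tier-1 RouterLink)",
}
```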

The fundamentals of the DR (distributed router) and SR (service router) discussed in the previous posts remain the same for multi-tiered routing. When a user creates a Tier-1 or a Tier-0 LR, a DR instance is instantiated on all transport nodes (compute hypervisors and edge nodes).

If a centralized service is configured on either a Tier-0 or a Tier-1 LR, a corresponding SR is instantiated on the edge node. For instance, when a Tier-0 LR is connected to the physical infrastructure (a centralized service), a Tier-0 SR is instantiated on the edge node. Similarly, when a centralized service like NAT is configured on a Tier-1 LR, a Tier-1 SR is instantiated on the edge node.
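These placement rules are easy to capture in a toy model. The sketch below uses hypothetical node names (HV1, HV2, EdgeNode1) and simply encodes the rule: a DR for every LR on every transport node, plus an SR on the edge node only when the LR has a centralized service.

```python
# Toy model (hypothetical node names) of where DR and SR components land:
# every LR gets a DR on all transport nodes; an SR is instantiated on the
# edge node only when the LR has a centralized service (uplink, NAT, LB...).
from dataclasses import dataclass, field

TRANSPORT_NODES = ["HV1", "HV2", "EdgeNode1"]

@dataclass
class LogicalRouter:
    name: str
    centralized_services: list = field(default_factory=list)

    def placement(self):
        components = {node: ["DR"] for node in TRANSPORT_NODES}
        if self.centralized_services:
            components["EdgeNode1"].append("SR")
        return components

tier0 = LogicalRouter("tier0", centralized_services=["uplink-to-physical"])
tier1 = LogicalRouter("tenant1-tier1")        # no NAT/LB -> DR only
print(tier0.placement())   # EdgeNode1 runs both the Tier-0 DR and SR
print(tier1.placement())   # a DR everywhere, no SR
```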

Moving on, let’s take a look at inter-tenant communication via the Tier-0 LR. It’s important to note that NSX-T provides a fully distributed routing architecture, which means that all the logical routers (Tier-1 or Tier-0) are distributed and run as kernel modules across all transport nodes.

Inter-Tenant East-West traffic

The following diagram shows a logical view and a per-transport-node view of two Tier-1 LRs serving two different tenants and a Tier-0 LR, configured via NSX-T Manager. The per-transport-node view shows that the Tier-1 DRs for both tenants and the Tier-0 DR have been instantiated on both hypervisors. In the following topology, I have a Web1 VM (172.16.10.11) hosted on hypervisor HV1 in Tenant 1. I also have App1 (172.16.200.11) hosted on HV1 and a Web2 VM (172.16.100.11) hosted on HV2; both of these VMs belong to Tenant 2.
I have not configured any centralized services (NAT, etc.) on either Tier-1 LR, so both tenant Tier-1 LRs have only the distributed component of the LR, i.e. the DR.
Figure 3: Multi-Tier distributed E-W Routing with workloads on same hypervisor


Multi-Tiered Distributed Routing when workloads are on the same hypervisor

The following is a detailed packet walk between workloads in different tenants that are hosted on the same hypervisor (refer to the Transport Node view in Figure 3).

  1. Web1 VM (172.16.10.11) in Tenant 1 sends a packet to App1 VM (172.16.200.11) in Tenant 2. The packet is sent to its default gateway interface, located on the local HV1 Tenant 1 Tier-1 DR.
  2. A routing lookup happens on the Tenant 1 Tier-1 DR, and the packet follows the default route to the Tier-0 DR RouterLink interface (100.64.128.0).

Figure 4: Tenant 1 Tier-1 DR forwarding table

       3. A routing lookup happens on the Tier-0 DR, which determines that the 172.16.200.0/24 subnet is learned via 100.64.128.3, i.e. the Tenant 2 Tier-1 DR, and the packet is routed there.

Figure 5: Tier-0 DR forwarding table 

       4. A routing lookup happens on the Tenant 2 Tier-1 DR, which determines that 172.16.200.0/24 is a directly connected subnet. An L2 lookup is performed in the local MAC table to determine how to reach the App1 VM, and the packet is delivered to it.

The reverse traffic from the App1 VM in Tenant 2 follows a similar process. The packet from the App1 VM destined to the Web1 VM (172.16.10.11) is sent to the Tenant 2 Tier-1 DR, which follows the default route to the Tier-0 DR. The Tier-0 DR routes the packet to the Tenant 1 Tier-1 DR, and the packet is delivered to the Web1 VM. The forward path is replayed as simple table lookups in the sketch below.
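Each of these hops is just a longest-prefix-match lookup against the forwarding tables in Figures 4 and 5. The sketch below replays the Web1-to-App1 walk; the table contents are transcribed and simplified from the topology, with "connected" marking a directly attached LIF:

```python
# Replaying the E-W packet walk as longest-prefix-match (LPM) lookups over
# the forwarding tables shown in Figures 4 and 5 (transcribed, simplified).
import ipaddress

def lpm(table, dst):
    """Return the next hop for dst using longest-prefix match."""
    addr = ipaddress.ip_address(dst)
    matches = [(ipaddress.ip_network(p), nh) for p, nh in table.items()
               if addr in ipaddress.ip_network(p)]
    return max(matches, key=lambda m: m[0].prefixlen)[1]

tenant1_t1_dr = {"172.16.10.0/24": "connected",
                 "0.0.0.0/0": "100.64.128.0"}          # default to Tier-0
tier0_dr      = {"172.16.10.0/24": "100.64.128.1",     # Tenant 1 Tier-1
                 "172.16.200.0/24": "100.64.128.3",    # Tenant 2 Tier-1
                 "172.16.100.0/24": "100.64.128.3"}
tenant2_t1_dr = {"172.16.200.0/24": "connected",
                 "172.16.100.0/24": "connected",
                 "0.0.0.0/0": "100.64.128.2"}

dst = "172.16.200.11"                                  # App1 VM
print(lpm(tenant1_t1_dr, dst))   # 100.64.128.0 -> hop to Tier-0 DR
print(lpm(tier0_dr, dst))        # 100.64.128.3 -> hop to Tenant 2 Tier-1 DR
print(lpm(tenant2_t1_dr, dst))   # connected    -> L2 lookup, deliver to App1
```

All three lookups happen in the kernel on HV1, which is exactly why the packet never has to leave the hypervisor in this scenario.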

Notice that the packet never left the hypervisor to get routed between tenants. Let’s now take a look at what this traffic flow looks like when the VMs in different tenants are also on different hypervisors.

Multi-Tiered Distributed Routing when workloads are on different hypervisors

The routing lookups remain pretty much the same as above, with one additional step: since the two VMs are hosted on different hypervisors, the traffic traverses the overlay between them.

An important point to note here is that the traffic from Web1 in Tenant 1 is routed on HV1 and then sent to HV2. The return traffic, however, is routed on HV2. Again, this goes back to the same concept: routing happens closest to the source.


Figure 6: Multi-Tiered distributed E-W Routing with workloads on different hypervisor

Let’s do a traceflow to validate this. Notice that the packet from the Web1 VM gets routed on HV1 (ESX-TN1): the Tenant 1 Tier-1 DR hands it to the Tier-0 DR, which routes it to the Tenant 2 Tier-1 DR, and the packet is then encapsulated in GENEVE and sent to HV2 (ESX-TN2).

Figure 7: Traceflow between VMs in different tenants
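If you would rather drive the same validation programmatically, a traceflow can be started via the Manager API. The sketch below is rough: the /api/v1/traceflows endpoint exists in the NSX-T management-plane API, but treat the exact field names, the placeholder port UUID, and the manager address as assumptions to check against your version’s API guide.

```python
# Rough sketch of starting the same traceflow via the NSX-T Manager API.
# The logical-port UUID, manager address and credentials are placeholders,
# and the field names are assumptions; consult the API guide for your version.
import requests

body = {
    "lport_id": "<web1-vm-logical-port-uuid>",     # traceflow source port
    "packet": {
        "resource_type": "FieldsPacketData",
        "transport_type": "UNICAST",
        "ip_header": {"src_ip": "172.16.10.11",    # Web1 VM (Tenant 1)
                      "dst_ip": "172.16.100.11"},  # Web2 VM (Tenant 2)
    },
}
resp = requests.post("https://nsx-mgr.example.com/api/v1/traceflows",
                     json=body, auth=("admin", "VMware1!"), verify=False)
trace_id = resp.json()["id"]
# Then poll /api/v1/traceflows/<trace-id>/observations to see the hops
# (Tier-1 DR -> Tier-0 DR -> Tier-1 DR -> GENEVE tunnel -> delivery).
```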

North-South Routing in Multi-Tiered Routing Architecture

Moving on, let’s take a look at N-S connectivity and the packet walk from a VM connected to a tenant Tier-1 LR to a device in the physical infrastructure. Since this packet is destined for the physical infrastructure, it will need to go through the edge node.
I talked about centralized routing and how the SR hosted on the edge node handles N-S routing in a previous post, NSX-T: Routing where you need it (Part 2, North-South Routing). Leveraging the same concepts, I will show how N-S routing works in a multi-tiered architecture.

Figure 8: N-S Packet walk - Multi-Tiered Topology

  1. Web1 VM (172.16.10.11) in Tenant 1 sends a packet to 192.168.100.10. The packet is sent to the Web1 VM’s default gateway interface, located on the Tenant 1 Tier-1 DR, i.e. 172.16.10.1.
  2. A routing lookup is done on the Tenant 1 Tier-1 DR, which has a default route to the Tier-0 DR. Refer to the output in Figure 4.
  3. The Tier-0 DR gets the packet and does a routing lookup. Since there is no specific route for 192.168.100.0/24, the packet is routed using the default route, which points to 169.254.0.2. This is the intra-tier transit link between the DR and the SR; VNI 21389 has been auto-assigned to this link between the Tier-0 DR and its corresponding Tier-0 SR.

Figure 9: Tier-0 DR forwarding table

       4. A MAC lookup is done for 169.254.0.2, and this MAC address is learned via the remote TEP 192.168.140.160, i.e. the edge node.

Figure 10: ARP and MAC lookup

       5. The ESXi host encapsulates the packet and sends it to the edge node (the tunnel header is sketched after this list).
       6. Upon receiving the packet, the edge node TEP decapsulates it and removes the outer header.
       7. The packet is sent to the Tier-0 SR (as the destination MAC address in the inner header is that of the Tier-0 SR).
       8. The Tier-0 SR does a routing lookup and sends the packet to the physical router following the BGP route; the physical router then forwards it to 192.168.100.10.
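As a side note on step 5: the tunnel header the host TEP prepends is the 8-byte GENEVE base header (RFC 8926), carried inside UDP port 6081. The sketch below packs that header for the transit-link VNI 21389 from Figure 9; it is a byte-layout illustration, not NSX code.

```python
# Byte-layout illustration (not NSX code): the 8-byte GENEVE base header
# (RFC 8926) that a TEP prepends inside UDP/6081 when tunneling the packet.
import struct

def geneve_header(vni, proto=0x6558, opt_len=0, oam=0, critical=0):
    """Pack Ver/OptLen, O/C flags, protocol type, and the 24-bit VNI."""
    byte0 = (0 << 6) | (opt_len & 0x3F)       # Ver=0, option length in 4B words
    byte1 = (oam << 7) | (critical << 6)      # O and C bits, rest reserved
    vni_word = (vni & 0xFFFFFF) << 8          # VNI occupies the top 24 bits
    return struct.pack("!BBHI", byte0, byte1, proto, vni_word)

# VNI 21389 is the auto-assigned transit-link VNI from Figure 9;
# 0x6558 marks an encapsulated Ethernet frame.
print(geneve_header(21389).hex())   # 0000655800538d00 (0x00538d == 21389)
```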

Let’s take a look at the packet flow in the reverse direction now.

Figure 11: N-S Packet walk - Multi-Tiered Topology (Return Packet)

  1. The external device 192.168.100.10 sends the return packet to the Web1 VM (172.16.10.11), following the BGP route that it learned via the SR on the edge node.
  2. A routing lookup happens on the Tier-0 SR, which determines that 172.16.10.0/24 is reachable via the Tier-0 DR.

Figure 12: Tier-0 SR Routing table

       3. Traffic is sent to the Tier-0 DR via the intra-tier transit link between the SR and the DR.
       4. The Tier-0 DR learned this route via the Tenant 1 Tier-1 DR, so the packet is routed to the Tenant 1 Tier-1 DR.

Figure 13: Tier-0 DR forwarding table

       5. The Tier-1 DR does a routing lookup, which determines that 172.16.10.0/24 is a directly connected subnet on LIF1, i.e. the interface connected to the Web-LS logical switch. A lookup is performed in the LIF1 ARP table to determine the MAC address associated with the Web1 VM’s IP address. This destination MAC is learned via the remote TEP 192.168.140.151, i.e. the ESXi host where the Web1 VM is hosted (these lookups are recapped in the sketch after this list).
       6. The edge node encapsulates the packet and sends it to the ESXi host.
       7. Upon receiving the packet, the ESXi host decapsulates it and removes the outer header. An L2 lookup is then performed in the local MAC table associated with LIF1.
       8. The packet is delivered to the Web1 VM.
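Step 5 bundles three table hits: a route lookup, an ARP lookup on LIF1, and a MAC-to-TEP lookup. As a recap, the sketch below strings them together using the topology’s addresses; the MAC value is made up purely for illustration.

```python
# Recap of step 5 on the Tier-1 DR (illustrative; the MAC is made up).
lif1_arp_table = {"172.16.10.11": "00:50:56:aa:bb:01"}      # IP  -> MAC
mac_to_tep     = {"00:50:56:aa:bb:01": "192.168.140.151"}   # MAC -> remote TEP

dst_ip  = "172.16.10.11"                 # Web1 VM, directly connected on LIF1
dst_mac = lif1_arp_table[dst_ip]         # ARP table hit on LIF1
tep_ip  = mac_to_tep[dst_mac]            # MAC learned via the ESXi host's TEP
print(f"GENEVE-encap toward TEP {tep_ip}, inner dst MAC {dst_mac}")
```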

This concludes the overview of the multi-tiered routing architecture.

Learn More

https://docs.vmware.com/en/VMware-NSX-T/index.html