Home > Blogs > The Network Virtualization Blog > Category Archives: VMware NSX

Category Archives: VMware NSX

VCDX-NV Interview: Greg Stemberger

Greg Stemberger is an IT professional who started working in networking in 2000. Working in network operations at Sprint, he managed some of the Greg-Stemberger-Force3largest enterprise networks in the world as the Managed Services Operations Engineer focused primarily on routing and switching. He managed more than 20,000 Cisco devices in his initial role at Sprint. Greg has three CCIEs: in route/switch, security, and service provider. He’s also a member of the first group of VCDX-NV certified professionals.

What excites you about network virtualization?

Virtualization is actually nothing new to me, to be honest, because I’ve been dealing with multi-tenancy, which really in my mind, started on the WAN side where VPNs were really one of the first early versions of introducing multi-tenancy and segmentation of the network, and leveraging virtualization-type technology on hardware. It’s just fascinating to see how much that’s evolved and taken off in the compute world. Now, we’re coming back together full circle with SDN. The network is now playing catch-up with how much agility and flexibility virtualization has provided to the compute world. I believe I have been doing virtual networking for a number of years now, but obviously it’s morphed into something much more powerful today than it was five, six year ago when I was just doing virtual routing and things along those lines.

As you’ve wentthrough network virtualization training, did anything surprised you?

I’m amazed at how powerful the network functions have become down to compute level. I didn’t fully grasp how much flexibility is possible down to the network level in virtualization. I just assumed that you needed a piece of hardware to do that, a dedicate piece of hardware, but software has come so far that now we could potentially deliver a lot of the same capabilities at very scalable rates down on an x86 fixed platform.

How do you think getting certified in network virtualization will help traditional networking professionals in their career?

I think it’s a natural evolution that more of network intelligence is going to continue to extend into the software realm, because of the power of computing today, and the power that software programming brings. I don’t think anybody can challenge the fact that network virtualization brings so much agility and power to networking that we never had before. Obviously, looking at NSX and understanding what’s possible in terms of software-defined networking is just a great salvation towards understanding the networks of the future.

What would you say to someone who said “I don’t need to learn about network virtualization?”

I would argue that they maybe don’t understand the power that SDN brings to a network environment. I think you start to understand the value of the proposition around SDN when you realize you can streamline the operational efficiencies of how you manage an IT infrastructure from the network down to the compute into one system, and you see how fast services can be either enabled from scratch based on a business need or changed based on a business requirement much more quickly and efficiently.

Does a networking professionals existing skill set diminish in value with network virtualization?

That’s a great question. I get into these conversations a lot with peers of mine. To be honest, I don’t see any risk to the skills that we have today. The network in many ways will still fundamentally rely on some sort of underlying protocol control plane that needs to be understood, especially in regards to how traffic moves between end points or between nodes in the network. Having that strong engineering skill set to understand how the control plane and how the data plane is forwarding packets, which lends itself well to any strong network engineer, is going to be very important moving forward. It’s just that there’s an evolution in our skill set in terms of how we manage and design and implement these networks that’s going to evolve and I think it’s evolving for the better.

Anything else that you think someone should know?

I guess one interesting thing is that I actually haven’t spent much time on vSphere and or VMware products prior to this. This has actually motivated me to go back and learn vSphere and some of the core virtualization products that VMware brings to the table, because I need to understand those better to really fully grasp what network virtualization and NSX brings to the table. It’s actually a win-win.

 

Deploying VMware NSX with Horizon

As part of the recent launch of Horizon 6, Tony Paikeday, senior product line manager, End-User Computing, VMware, takes a look at the value proposition of deploying the VMware NSX network virtualization platform together with Horzon.

VMware NSX

Deploying VMware NSX with Horizon

VMware NSX, deployed with Horizon, offers a better alternative to securing east-west traffic between VMs, turning data center security from a perimeter-centric view to one that gives each individual desktop VM its own virtual network container – creating if you will, a network of “one.” This approach, also known as micro-segmentation, has been an ideal for network teams, but traditionally unachievable due to the cost, and the operational complexity involved. With the number of user VM’s introduced by desktop virtualization, and the sprawl of firewall rules needing to be manually added, deleted or modified every time a new VM is introduced, this has been untenable in the past. With VMware NSX, we have a completely new model for networking and security, delivering virtualization of the network, much as we did for server virtualization – reproducing it in software, with a logical library of networking elements and services including switches, routers, firewalls, load-balancers and more that can be deployed over any existing network.

Read Tony’s full blog post here at http://blogs.vmware.com/euc/2015/03/securing-virtual-desktops-east-west-threats-data-center.html

Roger

Introducing New VCE VxBlock Systems with Integrated VMware NSX

Last month, we outlined VMware’s vision for helping customers achieve one cloud for any application and any device. We believe the prevailing model for cloud adoption will be the hybrid cloud, and the best architecture for achieving the hybrid cloud is through a software-defined data center architecture. The fastest path to building reliable infrastructure for the hybrid cloud is through the use of converged infrastructure systems, and no company has been more successful at delivering on the promise of converged infrastructure than our partner VCE.

Now, the ability to procure and deploy the VMware NSX network virtualization platform with VCE converged infrastructure is about to get whole lot easier.

Today, VCE launched VCE VxBlock Systems, a new family of converged infrastructure systems that will factory-integrate VMware NSX for software-defined data center deployments. The new VxBlock Systems will include VCE pre-integration, pre-testing and pre-validation of VMware NSX, with seamless component-level updates, ongoing lifecycle assurance, and unified single-call support from VCE.

As I wrote previously, VMware NSX already runs great on existing Vblock Systems. Customers today are deploying VMware NSX with their existing Vblocks, and customers will be able to extend VMware NSX environments across their entire VCE converged infrastructure environment as they move to the new VxBlock Systems.

This solution will be a powerful building block for the software-defined data center, delivering unparalleled IT agility through automation, and unparalleled security through micro-segmentation.

Agility through IT Automation

  • Reduce time to provision multi-tier networking and security services from weeks to minutes.
  • Achieve faster development, testing and deployment of new applications by aligning network and security provisioning with compute and storage provisioning.
  • Streamline IT operations through programmatic creation, provisioning, snapshotting, deleting and restoration of complex software-based networks.
  • Build advanced workflows through cloud management platforms to automate provisioning of networking and security, including switching, routing, firewalling, and load balancing without manually reconfiguring physical network devices.

Unparalleled Security

  • Use micro-segmentation and isolation capabilities of VMware NSX to build security directly into the data center infrastructure.
  • Insert advanced partner services from leading security vendors to improve threat protection, reduce risk and help address their compliance requirements.
  • Achieve better security inside the data center through fine-grained policies that enable firewall controls and advanced security down to the level of the virtual NIC.
  • Create dynamic security policies that are automatically applied when a virtual machine spins up, are moved when a virtual machine is migrated and are removed when a virtual machine is de-provisioned

VMware NSX is the ideal platform for virtualizing the network running on top of VCE converged infrastructure.

Hatem

VMware NSX Webcast – Creating Agile Networks

You may have seen Joey Logano speed to his first Daytona 500 win this week. Keeping your network in racing shape takes a similar level of NSX: Wanna Go Fastpatience, stamina, and quick reflexes. Using VMware NSX network virtualization means that you can unlock the full potential of a Software-Defined Data Center, to create and run entire networks on top of existing network hardware, resulting in faster deployment of workloads, as well as greater agility in the face of increasingly dynamic data centers. Watch this overview to learn how VMware NSX reduces the time to provision multi-tier networking and security services from weeks to seconds to win your race. This one-hour overview of VMware NSX outlines how you can bring virtualization to your existing network, transforming both its operations and economics. You’ll learn how several of the largest service providers, global financial, and enterprise data centers in the world are using NSX to reduce costs and provisioning times to improve agility and establish a new model of network security.

Click here to watch this webcast and find out:

  • What the NSX architecture looks like
  • How switching, routing, firewalling, load-balancing and other services are managed with NSX
  • How overlay networks and logical networks all come together with your physical infrastructure

Roger

VMware NSX Loves Hardware

One of the core value propositions of VMware NSX is ability to take advantage of any underlying hardware infrastructure and deliver a fully decoupled virtualized network in software. VMware NSX loves a Modern Infrastructuregood hardware fabric,.

But that’s not the only hardware VMware NSX loves.

The votes have been cast and counted, and we are pleased to announce that VMware NSX was selected as the winner in the “Best Software Defined Infrastructure” category in the 2015 Modern Infrastructure Impact Awards. The awards were judged by the Modern Infrastructure e-zine editorial staff, in conjunction with users, readers, and industry experts.

The Modern Infrastructure Impact Awards recognize the top products, technologies and services in the essential areas of technology that Modern Infrastructure covers. The award-winning tools are those helping to run enterprise businesses with efficiency and insight — whether they’re used inside the data center or out.

VMware NSX delivers secure network services to applications running in the data center, resulting in instant and programmatic provisioning, fast and highly available infrastructure, and increased security and micro segmentation capabilities.

Read about the award here and to learn more about the business value of VMware NSX visit  vmware.com/products/nsx.

Roger

3 Ways To Get Started With VMware NSX

Over the past 12 months, VMware NSX momentum has continued to grow, as we’ve added VMware NSXnew platform capabilities, expanded our partner ecosystem, and of course, had more than 250+ customers purchase NSX for deployment. And as interest in VMware NSX has grown with both customers and IT professionals looking to evolve their careers by adding certification in network virtualization, one of the most common questions that we get is “How can I get started with NSX?.”

We understand that there is a strong demand for individuals and organizations to get their hands on the NSX technology. Many of you are working towards your initial VCP-NV certification. Others of you are exploring NSX as a way to improve your organization’s agility and security while reducing overall costs.

Here are three ways individuals and companies can get started with NSX. Continue reading

Schuberg Philis Deploys VMware NSX

Summaryschuberg_philis_logo_pms298uwarmgray9u

Application Roll Out Reduced from Weeks to Minutes
• VMware NSX Enables Better Agility, Flexibility and Security

Recently I had the opportunity to speak with the team at Schuberg Philis about their successful, production deployment of VMware NSX. As background, Schuberg Philis is an innovative business technology company and an important player in the field of mission critical outsourcing services. The company serves customers across financial services, retail suppliers and utilities, and therefore must comply with the highest international risk management and corporate governance standards, while remaining flexible to evolving customer needs.

The adoption of VMware NSX based network virtualization has transformed the way Schuberg Philis runs its IT. In order to provide 100 percent functional up time of its customers’ critical applications, Schuberg Philis continuously optimizes its infrastructure and processes. However, the company increasingly saw its network as a barrier to increasing business agility.

To solve this challenge and to accelerate application roll out, the Schuberg Philis implemented a software-defined data center environment, and deployed VMware NSX. Schuberg Philis is taking advantage of the VMware NSX platform’s flexibility, security and agility to accelerate the deployment of applications to customers. Schuberg Philis customers now have easy access to the flexibility of the cloud, but within a certified, auditable environment, which includes built in controls and security.

Funs Kessen, cloud architect at Schuberg Philis, explained, “The process for spinning up new applications for customers used to take weeks to complete. Now we can do it in a little more than 18 minutes. This allows our customers to respond more quickly to business requirements and opportunities.

By fully automating the process, Kessen and team can offer Schuberg Philis customers complete access to the flexibility of the cloud within a certified environment, complete with all controls and security built in, and we’ve made it fully auditable.”

The adoption of VMware NSX based network virtualization has transformed the way Schuberg Philis runs its IT.

Kessen noted, “With VMware NSX in our software-defined data center, we can focus on applications, and not on the infrastructure,”

Follow Schuberg Philis on Facebook, Twitter, YouTube and Google+

Roger

Talking Tech Series: VMware NSX Edge Scale Out with Equal-Cost Multi-Path Routing

This post was written by Roie Ben Haim and Max Ardica, with a special thanks to Jerome Catrouillet, Michael Haines, Tiran Efrat and Ofir Nissim for their valuable input.

****

The modern data center design is changing, following a shift in the habits of consumers using mobile devices, the number of new applications that appear every day and the rate of end-user browsing which has grown exponentially. Planning a new data center requires meeting certain fundamental design guidelines. The principal goals in data center design are: Scalability, Redundancy and High-bandwidth.

In this blog we will describe the Equal Cost Multi-Path functionality (ECMP) introduced in VMware NSX release 6.1 and discuss how it addresses the requirements of scalability, redundancy and high bandwidth. ECMP has the potential to offer substantial increases in bandwidth by load-balancing traffic over multiple paths as well as providing fault tolerance for failed paths. This is a feature which is available on physical networks but we are now introducing this capability for virtual networking as well. ECMP uses a dynamic routing protocol to learn the next-hop towards a final destination and to converge in case of failures. For a great demo of how this works, you can start by watching this video, which walks you through these capabilities in VMware NSX.

Scalability and Redundancy and ECMP

To keep pace with the growing demand for bandwidth, the data center must meet scale out requirements, which provide the capability for a business or technology to accept increased volume without redesign of the overall infrastructure. The ultimate goal is avoiding the “rip and replace” of the existing physical infrastructure in order to keep up with the growing demands of the applications. Data centers running business critical applications need to achieve near 100 percent uptime. In order to achieve this goal, we need the ability to quickly recover from failures affecting the main core components. Recovery from catastrophic events needs to be transparent to end user experiences.

ECMP with VMware NSX 6.1 allows you to use upto a maximum of 8 ECMP Paths simultaneously. In a specific VMware NSX deployment, those scalability and resilience improvements are applied to the “on-ramp/off-ramp” routing function offered by the Edge Services Gateway (ESG) functional component, which allows communication between the logical networks and the external physical infrastructure.

ECMP Image 1

External user’s traffic arriving from the physical core routers can use up to 8 different paths (E1-E8) to reach the virtual servers (Web, App, DB).

In the same way, traffic returning from the virtual server’s hit the Distributed Logical Router (DLR), which can choose up to 8 different paths to get to the core network.

How is the path determined:

NSX for vSphere Edge Services Gateway device:

When a traffic flow needs to be routed, the round robin algorithm is used to pick up one of the links as the path for all traffic of this flow. The algorithm ensures to keep in order all the packets related to this flow by sending them through the same path. Once the next-hop is selected for a particular Source IP and Destination IP pair, the route cache stores this. Once a path has been chosen, all packets related to this flow will follow the same path.

There is a default IPv4 route cache timeout, which is 300 seconds. If an entry is inactive for this period of time, it is then eligible to be removed from route cache. Note that these settings can be tuned for your environment.

Distributed Logical Router (DLR):

The DLR will choose a path based on a Hashing algorithm of Source IP and Destination IP.

What happens in case of a failure on one of Edge Devices?

In order to work with ECMP the requirement is to use a dynamic routing protocol: OSPF or BGP. If we take OSPF for example, the main factor influencing the traffic outage experience is the tuning of the OSPF timers.

OSPF will send hello messages between neighbors, the OSPF “Hello” protocol is used and determines the Interval as to how often an OSPF Hello is sent.

Another OSPF timer called “Dead” Interval is used, which is how long to wait before we consider an OSPF neighbor as “down”. The OSPF Dead Interval is the main factor that influences the convergence time. Dead Interval is usually 4 times the Hello Interval but the OSPF (and BGP) timers can be set as low as 1 second (for Hello interval) and 3 seconds (for Dead interval) to speed up the traffic recovery.

ECMP Image 2

In the example above, the E1 NSX Edge has a failure; the physical routers and DLR detect E1 as Dead at the expiration of the Dead timer and remove their OSPF neighborship with him. As a consequence, the DLR and the physical router remove the routing table entries that originally pointed to the specific next-hop IP address of the failed ESG.

As a result, all corresponding flows on the affected path are re-hashed through the remaining active units. It’s important to emphasize that network traffic that was forwarded across the non-affected paths remains unaffected.

Troubleshooting and visibility

With ECMP it’s important to have introspection and visibility tools in order to troubleshoot optional point of failure. Let’s look at the following topology.

ECMP Image 3

A user outside our Data Center would like to access the Web Server service inside the Data Center. The user IP address is 192.168.100.86 and the web server IP address is 172.16.10.10.

This User traffic will hit the Physical Router (R1), which has established OSPF adjacencies with E1 and E2 (the Edge devices). As a result R1 will learn how to get to the Web server from both E1 and E2 and will get two different active paths towards 172.16.10.10. R1 will pick one of the paths to forward the traffic to reach the Web server and will advertise the user network subnet 192.168.100.0/24 to both E1 and E2 with OSPF.

E1 and E2 are NSX for vSphere Edge devices that also establish OSPF adjacencies with the DLR. E1 and E2 will learn how to get to the Web server via OSPF control plane communication with the DLR.

From the DLR perspective, it acts as a default gateway for the Web server. This DLR will form an OSPF adjacency with E1 and E2 and have 2 different OSPF routes to reach the user network.

From the DLR we can verify OSPF adjacency with E1, E2.

We can use the command: “show ip ospf neighbor”

 

ECMP Image 4

 

From this output we can see that the DLR has two Edge neighbors: 198.168.100.3 and 192.168.100.10.The next step will be to verify that ECMP is actually working.

We can use the command: “show ip route”

 

ECMP Image 5

 

The output from this command shows that the DLR learned the user network 192.168.100.0/24 via two different paths, one via E1 = 192.168.10.1 and the other via E2 = 192.168.10.10.

Now we want to display all the packets which were captured by an NSX for vSphere Edge interface.

In the example below and in order to display the traffic passing through interface vNic_1, and which is not OSPF protocol control packets, we need to type this command: “debug packet display interface vNic_1 not_ip_proto_ospf”

We can see an example with a ping running from host 192.168.100.86 to host 172.16.10.10

 

ECMP Image 6

 

If we would like to display the captured traffic to a specific ip address 172.16.10.10, the command capture would look like: “debug packet display interface vNic_1 dst_172.16.10.10”

 

ECMP Image 7

 

Useful CLI for Debugging ECMP

  • To check which ECMP path is chosen for a flow
    • debug packet display interface IFNAME
  • To check the ECMP configuration
    • show configuration routing-global
  • To check the routing table
    • show ip route
  • To check the forwarding table
    • show ip forwarding

Useful CLI for Dynamic Routing

  • show ip ospf neighbor
  • show ip ospf database
  • show ip ospf interface
  • show ip bgp neighbors
  • show ip bgp

ECMP Deployment Consideration

ECMP currently implies stateless behavior. This means that there is no support for stateful services such as the Firewall, Load Balancing or NAT on the NSX Edge Services Gateway. The Edge Firewall gets automatically disabled on ESG when ECMP is enabled. In the current NSX 6.1 release, the Edge Firewall and ECMP cannot be turned on at the same time on NSX edge device. Note however, that the Distributed Firewall (DFW) is unaffected by this.

For more in-depth information, you can also read our VMware® NSX for vSphere (NSX-V) Network Virtualization Design Guide

About The Authors

VMware-Roie Ben HaimRoie Ben Haim works as a professional services consultant at VMware, focusing on design and implementation of VMware’s software-defined data center products.  Roie has more than 12 years in data center architecture, with a focus on network and security solutions for global enterprises. An enthusiastic M.Sc. graduate, Roie holds a wide range of industry leading certifications including Cisco CCIE x2 # 22755 (Data Center, CCIE Security), Juniper Networks JNCIE – Service Provider #849, and VMware vExpert 2014, VCP-NV, VCP-DCV. Follow his personal blog at http://roie9876.wordpress.com/

VMware-Max ArdicaMax Ardica is a senior technical product manager in VMware’s networking and security business unit (NSBU). Certified as VCDX #171, his primary task is helping to drive the evolution of the VMware NSX platform, building the VMware NSX architecture and providing validated design guidance for the software-defined data center, specifically focusing on network virtualization. Prior to joining VMware, Max worked for almost 15 years at Cisco, covering different roles, from software development to product management. Max owns also a CCIE certification (#13808).

Automating a Multi-Action Security Workflow with VMware NSX

This post was written by VMware’s John Dias, (VCP-DCV), Sr. Systems Engineer, Cloud Management Solutions Engineering Team, and Hadar Freehling, Security & Compliance Systems Engineer Specialist

***

Through a joint effort with Hadar Freehling, one of my esteemed peers here at VMware, we co-developed a proof-of-concept workflow for a network security use case.  Hadar created a short video showing and explaining the use case, but in summary this is a workflow that reacts to and remediates a security issue flagged by third-party integration with VMware NSX. In the video, TrendMicro is used but it could be any other partner integration with vShield Endpoint.

Here’s what happens:

  • A virus is detected on a VM and is quarantined by the AV solution
  • The AV solution tags the VM with an NSX security tag
  • VMware NSX places the VM in a new Security Group, whose network policies steer all VM traffic through an intrusion prevention system (IPS)
  • vCenter Orchestrator (vCO) monitors the security group for changes and when a VM is added
    • a snapshot of the VM is taken for forensic purposes
    • a vSpan session (RSPAN) is set up on the Distributed Virtual Switch to begin capturing inbound/outbound traffic on the VM
    • once the VM has been removed from the security group, the vSpan session is removed

Watch the video below for a walk-through by Hadar:

You will note that there is a portion of the workflow that is handled natively by VMware NSX (Security Tag reaction, Security Group policy) but the snapshot and RSPAN are done via vCO workflow.

If you are interested in exploring this capability, I have provided the vCO workflow package for download. This is provided as-is and you should fully test it (and modify as needed) before using in your environment.

Assuming you have VMware NSX, vShield Endpoint and some third party integration already set up, you will need the following:

  • vCO 5.5.2
  • The NSX plugin for vCO (installed and configured)
  • The REST plugin with your NSX manager added as a REST host
  • vCenter plugin configured

The workflow package includes a good number of “helper” workflows which you will not need to run directly. The master workflow is in the root folder Security Reaction and is named “Set up VM Forensics RUN THIS” (just in case you had any doubt as to which one to run).

Multi-Site Security

The Security Reaction Master Workflow

Running the master workflow will prompt you for three items:

  • The NSX Security Group to monitor – This is why the NSX plugin is required, so that you can browse the vCO managed objects and locate the desired Security Group.
  • A time to sleep in seconds – The master workflow will run continuously until manually stopped and will use a REST call to NSX to get the current membership for the Security Group.  We have no recommendation on this poll time, although in testing we used 5-10 seconds.  It would have been better to use some external event to kick off the vCO workflow but we could not find a way to do this from NSX.  It may be possible to do via the partner solution, but we wanted this workflow package to be “partner neutral.”
  • Destination IPv4 address – This is the destination for the RSPAN (or vSpan session in vSphere API terms).  The vSpan session is created with some defaults (for example sampling rate, normal traffic allowed, etc).  If you want to change any of those properties, you will need to modify the Helper workflow named “Configure encapRemoteMirrorSource vSpan Session on DVS” (modify the “Create Port Mirror” script task).

Also note that this workflow doesn’t support VMs with multiple vNICs. Specifically, it will only create an RSPAN that includes the first vNIC found on a VM.  You can modify the Helper workflow “Implement Forensics” and adjust the script task “Prep for Mirror Creation” so that the additional NICs (if any) are added to the sourcePorts array. It’s something we intended to fix but forgot about until after our final testing and video production – so as they say in the textbooks “this is left as an exercise for the reader.”

Of course, there are many other actions that can be taken besides setting up an RSPAN and getting a snapshot. This solution can be extended to practically any task required during such an event such as creating a ticket in your service desk software, spinning up additional workloads to replace the compromised VM, sending emails, guest OS file system operations…all of these and more can be accomplished using vCO in conjunction with NSX.

 

Using Differentiated Services to Tame Elephants

This post was co-authored by Justin Pettit, Staff Engineer, Networking & Security Business Unit at VMware, and Ravi Shekhar, Distinguished Engineer, S3BU at Juniper Networks.

********************

As discussed in other blog posts and presentations, long-lived, high-bandwidth flows (elephants) can negatively affect short-lived flows (mice). Elephant flows send more data, which can lead to queuing delays for latency-sensitive mice.

VMware demonstrated the ability to use a central controller to manage all the forwarding elements in the underlay when elephant flows are detected.  In environments that do not have an SDN-controlled fabric, an alternate approach is needed.  Ideally, the edge can identify elephants in such a way that the fabric can use existing mechanisms to treat mice and elephants differently.

Differentiated services (diffserv) were introduced to bring scalable service discrimination to IP traffic. This is done using Differentiated Services Code Point (DSCP) bits in the IP header to signal different classes of service (CoS). There is wide support in network fabrics to treat traffic differently based on the DSCP value.

A modified version of Open vSwitch allows us to identify elephant flows and mark the DSCP value of the outer IP header.  The fabric is then configured to handle packets with the “elephant” DSCP value differently from the mice.

Figure 1: Elephants are detected at the edge of the network and signaled to the fabric through DSCP.  Based on these code points, the fabric can treat elephant traffic differently from mice

Figure 1: Elephants are detected at the edge of the network and signaled to the fabric through DSCP. Based on these code points, the fabric can treat elephant traffic differently from mice

Detecting and Marking Elephants with Open vSwitch

Open vSwitch’s location at the edge of the network gives it visibility into every packet in and out of each guest.  As such, the vSwitch is in the ideal location to make per-flow decisions such as elephant flow detection. Because environments are different, our approach provides multiple detection mechanisms and actions so that they can be used and evolve independently.

An obvious approach to detection is to just keep track of how many bytes each flow has generated.  By this definition, if a flow has sent a large amount of data, it is an elephant. In Open vSwitch, the number of bytes and an optional duration can be configured. By using a duration, we can ensure that we don’t classify very short-lived flows as elephants. We can also avoid identifying low-bandwidth but long-lived flows as elephants.

An alternate approach looks at the size of the packet that is being given to the NIC.  Most NICs today support TCP Segmentation Offload (TSO), which allows the transmitter (e.g., the guest) to give the NIC TCP segments up to 64KB, which the NIC chops into MSS-sized packets to be placed on the wire.

Because of TCP’s slow start, the transmitter does not immediately begin sending maximum-sized packets to the NIC.  Due to our unique location, we can see the TCP window as it opens, and tag elephants earlier and more definitively. This is not possible at the top-of-rack (TOR) or anywhere else in the fabric, since they only see the segmented version of the traffic.

Open vSwitch may be configured to track all flows with packets of a specified size. For example, by looking for only packets larger than 32KB (which is much larger than jumbo frames), we know the transmitter is out of slow-start and making use of TSO. There is also an optional count, which will trigger when the configured number of packets with the specified size is seen.

Some new networking hardware provides some elephant flow mitigation by giving higher priority to small flows. This is achieved by tracking all flows and placing new flows in a special high-priority queue. When the number of packets in the flow has crossed a threshold, the flow’s packets from then on are placed into the standard priority queue.

This same effect can be achieved using the modified Open vSwitch and a standard fabric.  For example, by choosing a packet size of zero and threshold of ten packets, each flow will be tracked in a hash table in the kernel and tagged with the configured DSCP value when that flow has generated ten packets.  Whether mice are given a high priority or elephants are given a low priority, the same effect is achieved without the need to replace the entire fabric.

Handling Elephants with Juniper Devices

Juniper TOR devices (such as QFX5100) and aggregation devices (such as MX, EX9200) provide a rich diffserv model CoS to to achieve these goals in the underlay.  These include:

  • Elaborate controls for packet admittance with dedicated and shared limits. Dedicated limits provide a minimum service guarantee, and shared limits allow statistical sharing of buffers across different ports and priorities.
  • A large number of flexibly assigned queues; up to 2960 unicast queues at the TOR and 512K at the aggregation device.
  • Enhanced and varied scheduling methods to drain these queues: strict and round-robin scheduling with up to 4-levels of hierarchical schedulers.
  • Shaping and metering to control the rate of injection of traffic from different queues of a TOR in the underlay network. By doing this, bursty traffic at the edge of the physical network can be leveled out before it reaches the more centrally shared aggregation devices.
  • Sophisticated controls to detect and notify congestion, and set drop thresholds. These mechanisms detect possible congestion in the network sooner and notify the source to slow down (e.g. using ECN).

With this level of flexibility, it is possible to configure these devices to:

  • Enforce minimum bandwidth allocation for mice flows and/or maximum bandwidth allocation for elephant flows on a shared link.
  • When experiencing congestion, drop (or ECN mark) packets of elephant flows more aggressively than mice flows.  This will result in TCP connections of elephant flows to back off sooner, which alleviates congestion in the network.
  • Take a different forwarding path for elephant flows from that of mice flows.  For example, a TOR can forward elephant flows towards aggregation switches with big buffers and spread mice flows towards multiple aggregation switches that support low-latency forwarding.

Conclusion

By inserting some intelligence at the edge and using diffserv, network operators can use their existing fabric to differentiate between elephant flows and mice. Most networking gear provides some capabilities, and Juniper, in particular, provides a rich set of operations that can be used based on the DSCP.  Thus, it is possible to reduce the impact of heavy hitters without the need to replace hardware. Decoupling detection from mitigation allows each to evolve independently without requiring wholesale hardware upgrades.