Nemtallah Daher is Senior Network Delivery Consultant at the consulting firm AdvizeX Technology. Recently he took some time out of his day to talk with us about why, as a networking guy, he thinks learning about network virtualization is critical to further one’s career.
I’ve been at AdvizeX for about a year now. I do Cisco, HP, data center stuff, and all sorts of general networking things: routing, switching, data center, UCS. That kind of stuff. Before coming to AdvizeX, I was a senior network specialist at Cleveland State University for about 20 years.
I started at Cleveland State in 1988 as a systems programmer, working on IBM mainframe doing CICS, COBOL and assembler. About 2 years after I started at Cleveland State, networking was becoming prevalent, and the project I was working on was coming to an end, so they asked me if I would help start a networking group. So from a small lab here, a building here, a floor there, I built the network at Cleveland State. We applied for a grant to get some hardware, applied for an IP address, domain name, all these things. There was nothing at the time, so we did everything. We incorporated wireless about 10 years in. Over time it became a ubiquitous, campus-wide network. So that’s my brief history. Continue reading →
As VMware NSX gains broader adoption, we have heard many customer requests for guidance to help them run NSX on top of the latest Cisco infrastructure, namely Cisco UCS and Nexus 9000 series switches.
With customers choosing the benefits of VMware NSX along with the Software Defined Data Center (SDDC), the underlying hardware (Ethernet fabric, x86 compute, etc) provides reliable, resilient capacity, but the configuration, state and advanced features move to faster, more flexible software. The requests were for deploying NSX with Cisco infrastructure running in a standard IP-based fabric with the Nexus 9000’s in standalone mode (NX-OS Mode), as opposed to the proprietary ACI Mode. As with any IP fabric, VMware NSX works great with Nexus 9000 as the underlay. The combination of VMware NSX and Nexus 9000 in standalone mode enables the benefits customers have chosen to embrace with the SDDC.
Ron Flax is the Vice President of August Schell, a reseller of VMware products and IT services company that specializes in delivering services to commercial accounts and the federal government, particularly intelligence and U.S. Department of Defense. Ron is a VCDX-NV certified network virtualization professional and a VMware vExpert. We spoke with Ron about network virtualization and the NSX career path.
The most exciting thing about network virtualization, I think, is the transformative nature of this technology. Networks have been built the same way for the last 20 to 25 years. Nothing has really changed. A lot of new features have been built, a lot of different technologies have come around networks, but the fundamental nature of how networks are built has not changed. But VMware NSX, because it’s a software-based product, has completely altered everything. It enables a much more agile approach to networks: the ability to automate the stand-up and tear-down of networks; the ability to produce firewalling literally at the virtual network interface. And because things are done at software speed, you can now make changes to the features and functions of networking products at software speed. You no longer have to deal with silicon speed. It’s very, very exciting. With a software-based approach, you can just do so much more in such a small amount of time.
What we’re hearing from customers, at this point, is that they’re very interested to learn more. They’re at a phase where they’re ready to get their hands dirty, and they really want to understand it better. What’s driving a lot of adoption today is security, it is our foot in the door. When you speak with customers about the security aspects, the micro-segmentation capabilities, you may not even have to get to a virtual network discussion. Once you get the security aspect deployed, customers will see it in action and then a few weeks later will say, ‘Hey, you know, can you show me how the new router works?’ or ‘Can you show me how other features of NSX work?’ That’s when you can start to broaden your approach. So these compelling security stories like micro-segmentation or distributed firewalling get you in and get the deployment started, but ultimately it’s the flexibility of being able to deliver networks at speed, in an agile way, through software, through automation, that’s the home run. Continue reading →
Chris Miller is the principal architect for AdvizeX in Columbus OH. He runs the NSX program from a technical and marketing perspective, including enterprise pre-sales support and go-to-market strategies.
I started my career as a traditional Cisco networking guy. I spent 10 to 15 years as a network architect. But I’d been tracking what was going on in the community, with Open Flow and some of the other technologies. When I saw what VMware was doing, it got me pretty excited. I thought, ’It’s pretty revolutionary what’s going on here.’ I immediately jumped on the opportunity to take part in NSX.
In terms of enterprise customers, we weren't initially seeing a lot of adoption in the market. Then VMware announced the Nicira acquisition, and Cisco announced what they were going to do with ACI, and heads started turning. I realized, you know, here are two of our largest partners putting their investment dollars behind this technology. And then, when I saw what NSX could do, and the benefits it could bring, it was very clear to me that this was the next wave. Continue reading →
Last month, we outlined VMware’s vision for helping customers achieve one cloud for any application and any device. We believe the prevailing model for cloud adoption will be the hybrid cloud, and the best architecture for achieving the hybrid cloud is through a software-defined data center architecture. The fastest path to building reliable infrastructure for the hybrid cloud is through the use of converged infrastructure systems, and no company has been more successful at delivering on the promise of converged infrastructure than our partner VCE.
Now, the ability to procure and deploy the VMware NSX network virtualization platform with VCE converged infrastructure is about to get whole lot easier.
Today, VCE launched VCE VxBlock Systems, a new family of converged infrastructure systems that will factory-integrate VMware NSX for software-defined data center deployments. The new VxBlock Systems will include VCE pre-integration, pre-testing and pre-validation of VMware NSX, with seamless component-level updates, ongoing lifecycle assurance, and unified single-call support from VCE.
As I wrote previously, VMware NSX already runs great on existing Vblock Systems. Customers today are deploying VMware NSX with their existing Vblocks, and customers will be able to extend VMware NSX environments across their entire VCE converged infrastructure environment as they move to the new VxBlock Systems.
This solution will be a powerful building block for the software-defined data center, delivering unparalleled IT agility through automation, and unparalleled security through micro-segmentation.
Agility through IT Automation
Reduce time to provision multi-tier networking and security services from weeks to minutes.
Achieve faster development, testing and deployment of new applications by aligning network and security provisioning with compute and storage provisioning.
Streamline IT operations through programmatic creation, provisioning, snapshotting, deleting and restoration of complex software-based networks.
Build advanced workflows through cloud management platforms to automate provisioning of networking and security, including switching, routing, firewalling, and load balancing without manually reconfiguring physical network devices.
Use micro-segmentation and isolation capabilities of VMware NSX to build security directly into the data center infrastructure.
Insert advanced partner services from leading security vendors to improve threat protection, reduce risk and help address their compliance requirements.
Achieve better security inside the data center through fine-grained policies that enable firewall controls and advanced security down to the level of the virtual NIC.
Create dynamic security policies that are automatically applied when a virtual machine spins up, are moved when a virtual machine is migrated and are removed when a virtual machine is de-provisioned
VMware NSX is the ideal platform for virtualizing the network running on top of VCE converged infrastructure.
VMware Partner Exchange (PEX) is your one-stop shop when it comes to learning about network virtualization and the technology extends VMware’s vision of the software-defined data center. At this year’s event, we are offering both an executive track and a technical track to help partners build their businesses and advance their knowledge, as you take customers on the path to Virtualizing the Network.
If you are a partner that is new to network virtualization, we have a program/learning path where you can send two people to PEX and to achieve their network virtualization competency by attending the 3-Day NSX Install, Configure and Manage Boot Camp prior to the start of the conference. Participants can then attend the free instructor-led VSP-NV and VTSP-NV boot camps during the conference. Continue reading →
Last week we hosted the Open vSwitch 2014 Fall Conference, which was another great opportunity to demonstrate our continued investment in leading open source technologies. To get a sense of the energy and enthusiasm at the event, take a quick view of this video we captured with attendees.
I’ve been thinking about the key takeaways from everything I saw and everyone I spoke with.
First, there's huge interest in Open vSwitch performance, both in terms of measurement and improvement. The talks from Rackspace and Noiro Networks/Cisco led me to believe that we've reached the point where Open vSwitch performance is good enough on hypervisors for most applications, and often faster than competing software solutions such as the Linux bridge.
Talks from Intel and one from Luigi Rizzo at the University of Pisa demonstrated that by bypassing the kernel entirely through DPDK or netmap, respectively, we haven’t reached the limits of software forwarding performance. Based on a conversation I had with Chris Wright from Red Hat, this work is helping the Linux kernel community look into reducing the overhead of the kernel, so that we can see improved performance without losing the functionality provided by the kernel.
Johann Tönsing from Netronome also presented a talk describing all the ways that Netronome's NPU hardware can accelerate OpenFlow and Open vSwitch; I've talked to Johann many times before, but I had never realized how many different configurations their hardware supports, so this was an eye-opening talk for me.
Next, enhancing Open vSwitch capabilities at L4 through L7 is another exciting area. Our own Justin Pettit was joined by Thomas Graf from Noiro to talk about the ongoing project to add support for NAT and tracking L4 connections, which is key to making Open vSwitch capable of implementing high-quality firewalls. A later talk by Franck Baudin from Qosmos presented L7 enhancements to this capability.
The final area that I saw highlighted at the conference is existing applications for Open vSwitch today. Peter Phaal from InMon, for example, demonstrated applications for sFlow in Open vSwitch. I found his talk interesting because although I knew about sFlow and had talked to Peter before, I hadn't realized all of the varied uses for sFlow monitoring data. Vikram Dham also showed his uses for MPLS in Open vSwitch and Radhika Hirannaiah her use case for OpenFlow and Open vSwitch in traffic engineering.
I want to thank all of our participants and the organizing committee for helping to put together such an amazing event.
This week, VMware will be hosting the Open vSwitch 2014 Fall Conference, with more than 200 attendees and nearly two dozen talks on a variety of subjects from a key participants. The full schedule is available here, and we'll be doing a wrap up of some of the takeaways from the conference a bit later.
For the uninitiated, Open vSwitch is a production quality, multilayer virtual switch licensed under the open source Apache 2.0 license. It is designed to enable massive network automation through programmatic extension, while still supporting standard management interfaces and protocols (e.g. NetFlow, sFlow, IPFIX, RSPAN, CLI, LACP, 802.1ag). In addition, it is designed to support distribution across multiple physical servers similar to VMware's vDS or Cisco's Nexus 1000V. See full feature list here
For more information on OVS, I encourage you to check out the OVS website.
In the mean time, take a read about latest Open vSwitch developments in this post on Network Heresy by OVS core contributors Justin Pettit, Ben Pfaff, and Ethan Jackson.
This post was written by Roie Ben Haim and Max Ardica, with a special thanks to Jerome Catrouillet, Michael Haines, Tiran Efrat and Ofir Nissim for their valuable input.
The modern data center design is changing, following a shift in the habits of consumers using mobile devices, the number of new applications that appear every day and the rate of end-user browsing which has grown exponentially. Planning a new data center requires meeting certain fundamental design guidelines. The principal goals in data center design are: Scalability, Redundancy and High-bandwidth.
In this blog we will describe the Equal Cost Multi-Path functionality (ECMP) introduced in VMware NSX release 6.1 and discuss how it addresses the requirements of scalability, redundancy and high bandwidth. ECMP has the potential to offer substantial increases in bandwidth by load-balancing traffic over multiple paths as well as providing fault tolerance for failed paths. This is a feature which is available on physical networks but we are now introducing this capability for virtual networking as well. ECMP uses a dynamic routing protocol to learn the next-hop towards a final destination and to converge in case of failures. For a great demo of how this works, you can start by watching this video, which walks you through these capabilities in VMware NSX.
Scalability and Redundancy and ECMP
To keep pace with the growing demand for bandwidth, the data center must meet scale out requirements, which provide the capability for a business or technology to accept increased volume without redesign of the overall infrastructure. The ultimate goal is avoiding the “rip and replace” of the existing physical infrastructure in order to keep up with the growing demands of the applications. Data centers running business critical applications need to achieve near 100 percent uptime. In order to achieve this goal, we need the ability to quickly recover from failures affecting the main core components. Recovery from catastrophic events needs to be transparent to end user experiences.
ECMP with VMware NSX 6.1 allows you to use upto a maximum of 8 ECMP Paths simultaneously. In a specific VMware NSX deployment, those scalability and resilience improvements are applied to the “on-ramp/off-ramp” routing function offered by the Edge Services Gateway (ESG) functional component, which allows communication between the logical networks and the external physical infrastructure.
External user’s traffic arriving from the physical core routers can use up to 8 different paths (E1-E8) to reach the virtual servers (Web, App, DB).
In the same way, traffic returning from the virtual server’s hit the Distributed Logical Router (DLR), which can choose up to 8 different paths to get to the core network.
How is the path determined:
NSX for vSphere Edge Services Gateway device:
When a traffic flow needs to be routed, the round robin algorithm is used to pick up one of the links as the path for all traffic of this flow. The algorithm ensures to keep in order all the packets related to this flow by sending them through the same path. Once the next-hop is selected for a particular Source IP and Destination IP pair, the route cache stores this. Once a path has been chosen, all packets related to this flow will follow the same path.
There is a default IPv4 route cache timeout, which is 300 seconds. If an entry is inactive for this period of time, it is then eligible to be removed from route cache. Note that these settings can be tuned for your environment.
Distributed Logical Router (DLR):
The DLR will choose a path based on a Hashing algorithm of Source IP and Destination IP.
What happens in case of a failure on one of Edge Devices?
In order to work with ECMP the requirement is to use a dynamic routing protocol: OSPF or BGP. If we take OSPF for example, the main factor influencing the traffic outage experience is the tuning of the OSPF timers.
OSPF will send hello messages between neighbors, the OSPF “Hello” protocol is used and determines the Interval as to how often an OSPF Hello is sent.
Another OSPF timer called “Dead” Interval is used, which is how long to wait before we consider an OSPF neighbor as "down". The OSPF Dead Interval is the main factor that influences the convergence time. Dead Interval is usually 4 times the Hello Interval but the OSPF (and BGP) timers can be set as low as 1 second (for Hello interval) and 3 seconds (for Dead interval) to speed up the traffic recovery.
In the example above, the E1 NSX Edge has a failure; the physical routers and DLR detect E1 as Dead at the expiration of the Dead timer and remove their OSPF neighborship with him. As a consequence, the DLR and the physical router remove the routing table entries that originally pointed to the specific next-hop IP address of the failed ESG.
As a result, all corresponding flows on the affected path are re-hashed through the remaining active units. It’s important to emphasize that network traffic that was forwarded across the non-affected paths remains unaffected.
Troubleshooting and visibility
With ECMP it’s important to have introspection and visibility tools in order to troubleshoot optional point of failure. Let’s look at the following topology.
A user outside our Data Center would like to access the Web Server service inside the Data Center. The user IP address is 192.168.100.86 and the web server IP address is 172.16.10.10.
This User traffic will hit the Physical Router (R1), which has established OSPF adjacencies with E1 and E2 (the Edge devices). As a result R1 will learn how to get to the Web server from both E1 and E2 and will get two different active paths towards 172.16.10.10. R1 will pick one of the paths to forward the traffic to reach the Web server and will advertise the user network subnet 192.168.100.0/24 to both E1 and E2 with OSPF.
E1 and E2 are NSX for vSphere Edge devices that also establish OSPF adjacencies with the DLR. E1 and E2 will learn how to get to the Web server via OSPF control plane communication with the DLR.
From the DLR perspective, it acts as a default gateway for the Web server. This DLR will form an OSPF adjacency with E1 and E2 and have 2 different OSPF routes to reach the user network.
From the DLR we can verify OSPF adjacency with E1, E2.
We can use the command: “show ip ospf neighbor”
From this output we can see that the DLR has two Edge neighbors: 220.127.116.11 and 192.168.100.10.The next step will be to verify that ECMP is actually working.
We can use the command: “show ip route”
The output from this command shows that the DLR learned the user network 192.168.100.0/24 via two different paths, one via E1 = 192.168.10.1 and the other via E2 = 192.168.10.10.
Now we want to display all the packets which were captured by an NSX for vSphere Edge interface.
In the example below and in order to display the traffic passing through interface vNic_1, and which is not OSPF protocol control packets, we need to type this command: “debug packet display interface vNic_1 not_ip_proto_ospf”
We can see an example with a ping running from host 192.168.100.86 to host 172.16.10.10
If we would like to display the captured traffic to a specific ip address 172.16.10.10, the command capture would look like: “debug packet display interface vNic_1 dst_172.16.10.10”
Useful CLI for Debugging ECMP
To check which ECMP path is chosen for a flow
debug packet display interface IFNAME
To check the ECMP configuration
show configuration routing-global
To check the routing table
show ip route
To check the forwarding table
show ip forwarding
Useful CLI for Dynamic Routing
show ip ospf neighbor
show ip ospf database
show ip ospf interface
show ip bgp neighbors
show ip bgp
ECMP Deployment Consideration
ECMP currently implies stateless behavior. This means that there is no support for stateful services such as the Firewall, Load Balancing or NAT on the NSX Edge Services Gateway. The Edge Firewall gets automatically disabled on ESG when ECMP is enabled. In the current NSX 6.1 release, the Edge Firewall and ECMP cannot be turned on at the same time on NSX edge device. Note however, that the Distributed Firewall (DFW) is unaffected by this.
Roie Ben Haim works as a professional services consultant at VMware, focusing on design and implementation of VMware’s software-defined data center products. Roie has more than 12 years in data center architecture, with a focus on network and security solutions for global enterprises. An enthusiastic M.Sc. graduate, Roie holds a wide range of industry leading certifications including Cisco CCIE x2 # 22755 (Data Center, CCIE Security), Juniper Networks JNCIE - Service Provider #849, and VMware vExpert 2014, VCP-NV, VCP-DCV. Follow his personal blog at http://roie9876.wordpress.com/
Max Ardica is a senior technical product manager in VMware’s networking and security business unit (NSBU). Certified as VCDX #171, his primary task is helping to drive the evolution of the VMware NSX platform, building the VMware NSX architecture and providing validated design guidance for the software-defined data center, specifically focusing on network virtualization. Prior to joining VMware, Max worked for almost 15 years at Cisco, covering different roles, from software development to product management. Max owns also a CCIE certification (#13808).
This post was co-authored by Justin Pettit, Staff Engineer, Networking & Security Business Unit at VMware, and Ravi Shekhar, Distinguished Engineer, S3BU at Juniper Networks.
As discussed in other blog posts and presentations, long-lived, high-bandwidth flows (elephants) can negatively affect short-lived flows (mice). Elephant flows send more data, which can lead to queuing delays for latency-sensitive mice.
VMware demonstrated the ability to use a central controller to manage all the forwarding elements in the underlay when elephant flows are detected. In environments that do not have an SDN-controlled fabric, an alternate approach is needed. Ideally, the edge can identify elephants in such a way that the fabric can use existing mechanisms to treat mice and elephants differently.
Differentiated services (diffserv) were introduced to bring scalable service discrimination to IP traffic. This is done using Differentiated Services Code Point (DSCP) bits in the IP header to signal different classes of service (CoS). There is wide support in network fabrics to treat traffic differently based on the DSCP value.
A modified version of Open vSwitch allows us to identify elephant flows and mark the DSCP value of the outer IP header. The fabric is then configured to handle packets with the “elephant” DSCP value differently from the mice.
Figure 1: Elephants are detected at the edge of the network and signaled to the fabric through DSCP. Based on these code points, the fabric can treat elephant traffic differently from mice
Detecting and Marking Elephants with Open vSwitch
Open vSwitch’s location at the edge of the network gives it visibility into every packet in and out of each guest. As such, the vSwitch is in the ideal location to make per-flow decisions such as elephant flow detection. Because environments are different, our approach provides multiple detection mechanisms and actions so that they can be used and evolve independently.
An obvious approach to detection is to just keep track of how many bytes each flow has generated. By this definition, if a flow has sent a large amount of data, it is an elephant. In Open vSwitch, the number of bytes and an optional duration can be configured. By using a duration, we can ensure that we don’t classify very short-lived flows as elephants. We can also avoid identifying low-bandwidth but long-lived flows as elephants.
An alternate approach looks at the size of the packet that is being given to the NIC. Most NICs today support TCP Segmentation Offload (TSO), which allows the transmitter (e.g., the guest) to give the NIC TCP segments up to 64KB, which the NIC chops into MSS-sized packets to be placed on the wire.
Because of TCP’s slow start, the transmitter does not immediately begin sending maximum-sized packets to the NIC. Due to our unique location, we can see the TCP window as it opens, and tag elephants earlier and more definitively. This is not possible at the top-of-rack (TOR) or anywhere else in the fabric, since they only see the segmented version of the traffic.
Open vSwitch may be configured to track all flows with packets of a specified size. For example, by looking for only packets larger than 32KB (which is much larger than jumbo frames), we know the transmitter is out of slow-start and making use of TSO. There is also an optional count, which will trigger when the configured number of packets with the specified size is seen.
Some new networking hardware provides some elephant flow mitigation by giving higher priority to small flows. This is achieved by tracking all flows and placing new flows in a special high-priority queue. When the number of packets in the flow has crossed a threshold, the flow’s packets from then on are placed into the standard priority queue.
This same effect can be achieved using the modified Open vSwitch and a standard fabric. For example, by choosing a packet size of zero and threshold of ten packets, each flow will be tracked in a hash table in the kernel and tagged with the configured DSCP value when that flow has generated ten packets. Whether mice are given a high priority or elephants are given a low priority, the same effect is achieved without the need to replace the entire fabric.
Handling Elephants with Juniper Devices
Juniper TOR devices (such as QFX5100) and aggregation devices (such as MX, EX9200) provide a rich diffserv model CoS to to achieve these goals in the underlay. These include:
Elaborate controls for packet admittance with dedicated and shared limits. Dedicated limits provide a minimum service guarantee, and shared limits allow statistical sharing of buffers across different ports and priorities.
A large number of flexibly assigned queues; up to 2960 unicast queues at the TOR and 512K at the aggregation device.
Enhanced and varied scheduling methods to drain these queues: strict and round-robin scheduling with up to 4-levels of hierarchical schedulers.
Shaping and metering to control the rate of injection of traffic from different queues of a TOR in the underlay network. By doing this, bursty traffic at the edge of the physical network can be leveled out before it reaches the more centrally shared aggregation devices.
Sophisticated controls to detect and notify congestion, and set drop thresholds. These mechanisms detect possible congestion in the network sooner and notify the source to slow down (e.g. using ECN).
With this level of flexibility, it is possible to configure these devices to:
Enforce minimum bandwidth allocation for mice flows and/or maximum bandwidth allocation for elephant flows on a shared link.
When experiencing congestion, drop (or ECN mark) packets of elephant flows more aggressively than mice flows. This will result in TCP connections of elephant flows to back off sooner, which alleviates congestion in the network.
Take a different forwarding path for elephant flows from that of mice flows. For example, a TOR can forward elephant flows towards aggregation switches with big buffers and spread mice flows towards multiple aggregation switches that support low-latency forwarding.
By inserting some intelligence at the edge and using diffserv, network operators can use their existing fabric to differentiate between elephant flows and mice. Most networking gear provides some capabilities, and Juniper, in particular, provides a rich set of operations that can be used based on the DSCP. Thus, it is possible to reduce the impact of heavy hitters without the need to replace hardware. Decoupling detection from mitigation allows each to evolve independently without requiring wholesale hardware upgrades.