VMware NSX Use Case – Simplifying Disaster Recovery (Part 1)

Nicolas Vermandé (VCDX#055) is practice lead for Private Cloud & Infrastructure at Kelway, a VMware partner. Nicolas covers the Software-Defined Data Center on his blog, www.my-sddc.om.

This series of posts describes a specific use case for VMware NSX in the context of Disaster Recovery. The goal is to demonstrate NSX's routing and programmability capabilities through a lab scenario. This first part presents the NSX components and details the use case. The second part will show how to deploy the lab, and the third part will deal with APIs and show how to use Python to execute REST API calls to recreate the required NSX components at the recovery site.

Introduction

When considering a dual datacenter strategy with VM recovery in mind, one important decision is whether to adopt an active/active or active/standby model. The former is generally much more complex to manage because it requires double the work in terms of procedures, testing and change controls. In addition, capacity management becomes challenging as you need to provision enough physical resources to run all workloads at either site. On top of that, stretched VLANs are sometimes deployed across datacenters so that recovered VMs can keep their IP addresses. This is generally very costly if you want to leverage proper L2 extension technology, such as Cisco OTV.

Alternatively, in an SDDC environment, you can leverage VMware NSX to efficiently manage the connectivity and network changes required in the event of a full site failover. NSX gives you the ability to maintain the same IP address scheme for all your workloads by leveraging APIs, with little effort. Or, with more granularity, you could even move a single subnet as part of a specific recovery plan. NSX makes this possible by providing:

  • An overlay network that allows you to decouple the backend VM network from the physical network. NSX-V uses VXLAN, with each ESXi host acting as a VTEP.
  • Programmability through RESTful APIs that allow you to provision Logical Switches and modify Logical Router configurations in seconds (see the sketch after this list).
  • Dynamic routing protocols (OSPF, IS-IS, BGP) that advertise VM subnets to your enterprise network, making them accessible to end users or other applications (North-South or East-West traffic).
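
To give a flavour of that programmability, here is a minimal sketch (in Python, which part 3 will use for the real thing) that creates a Logical Switch through the NSX Manager REST API. The manager address, credentials and transport zone ID are placeholders, and the endpoint shown is the NSX-V virtualwires call as I recall it; treat it as illustrative rather than a finished script.

```python
# A minimal sketch (not a production script): creating a Logical Switch through the
# NSX Manager REST API with the Python "requests" library. The manager address,
# credentials and transport-zone (scope) ID are placeholders; the endpoint reflects
# the NSX-V API as I understand it.
import requests

NSX_MANAGER = "https://nsx-manager.lab.local"   # hypothetical address
AUTH = ("admin", "VMware1!")                    # placeholder credentials
SCOPE_ID = "vdnscope-1"                         # transport zone ID (assumption)

payload = """<virtualWireCreateSpec>
  <name>LS1</name>
  <description>Logical Switch created via REST</description>
  <tenantId>virtual wire tenant</tenantId>
</virtualWireCreateSpec>"""

resp = requests.post(
    f"{NSX_MANAGER}/api/2.0/vdn/scopes/{SCOPE_ID}/virtualwires",
    auth=AUTH,
    data=payload,
    headers={"Content-Type": "application/xml"},
    verify=False,  # lab only: the manager uses a self-signed certificate
)
resp.raise_for_status()
print("New Logical Switch ID:", resp.text)  # the API returns the virtualwire ID
```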

NSX Components

As many NSX introduction blog posts can be found on the web (like here or here), I won't spend much time on this topic. The NSX components are:

  • NSX Manager: it’s the single point of configuration and the REST API (and UI) interface. It is provided as a VM appliance and is actually the only appliance you have to install manually. There is a 1:1 mapping between the vCenter Server and the NSX Manager. The manager is responsible for deploying NSX Controllers, NSX Edge Gateways and Logical Router Control VMs. It also installs the Distributed Routing and firewall kernel modules on ESXi hosts, as well as the User World Agent (UWA). NSX configuration is accessible through vCenter once you’ve installed the NSX plugin.
  • NSX Controller: it provides a control plane to distribute VXLAN Logical Routing and Switching network information to ESXi hosts. It also enables ARP suppression to reduce flooding. It is typically implemented as a 3-node cluster and maintains MAC, ARP and VTEP tables. Finally, it is responsible for installing routes on each ESXi host.
  • Logical Switch (LS): it acts as the L2 domain boundary for VMs, identified by a VXLAN ID (VNI) and associated with a specific subnet. Its vCenter representation is a distributed Portgroup with specific capabilities.
  • Distributed Logical Router (DLR): it’s the distributed L3 first-hop for VM traffic. As its name suggests, it’s completely distributed. You can think of it as an anycast gateway, where each ESXi host acts as a node, sharing a single virtual IP and virtual MAC address. The data-path routing process runs on each ESXi host in vmkernel space and enables East-West traffic optimisation, avoiding the well-known hair-pinning effect when VMs want to talk to their default gateway.
  • Logical Router Control VM: it provides the DLR with a control plane and can be deployed as a redundant pair of VM appliances, in an active/standby fashion. It supports both OSPF and BGP as dynamic routing protocols. The Control VM receives its initial configuration from the NSX Manager.
  • Edge Services Gateway (ESG): it provides network perimeter services to the virtual environment. It is intended for North-South communication, i.e. between the physical and the virtual network, or at the edge of your tenant. It is NOT distributed, meaning that its placement is critical. It can run in HA mode, where the appliances are deployed in an active/standby fashion. The HA mechanism doesn’t rely on VMware HA (as some people at Cisco seem to think), but with a minimum of common sense you will create a DRS anti-affinity rule to separate the active and standby VMs. Depending on specific requirements, the edge gateway can be deployed in several sizes:
    • Compact (1 vCPU – 512 MB RAM)
    • Large (2 vCPUs – 1 GB RAM)
    • Quad-Large (4 vCPUs – 1 GB RAM)
    • X-Large (6 vCPUs – 8 GB RAM)

Available services include: Firewall, NAT, DHCP, Routing, Load-Balancing, Site-to-Site VPN, SSL VPN and L2VPN.

  • Distributed Firewall (DFW): it enables distributed security capabilities at the VM NIC level as an East-West L2-L4 stateful firewall. The module is present on each ESXi host as a kernel module and therefore avoids any centralized bottleneck. If you need more bandwidth, just add a new host! It also includes the Service Composer feature, which allows you to create specific services by integrating additional 3rd-party capabilities into the firewall, such as endpoint services (e.g. Antivirus, Data Security) and deep packet inspection (Palo Alto). I have to say that this feature is one of the most compelling to me! (A minimal query sketch follows this list.)
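
Since the DFW is driven by the same REST API as every other component, here is a minimal, illustrative sketch that simply reads back the current firewall configuration and lists its Layer 3 sections. The endpoint path and response layout are assumptions based on the NSX-V API; the address and credentials are placeholders.

```python
# A minimal sketch: reading the Distributed Firewall configuration over the NSX
# Manager REST API. The endpoint path is quoted from memory of the NSX-V API and
# the manager address/credentials are placeholders; treat both as assumptions.
import requests
import xml.etree.ElementTree as ET

NSX_MANAGER = "https://nsx-manager.lab.local"   # hypothetical address
AUTH = ("admin", "VMware1!")                    # placeholder credentials

resp = requests.get(
    f"{NSX_MANAGER}/api/4.0/firewall/globalroot-0/config",
    auth=AUTH,
    verify=False,  # lab only
)
resp.raise_for_status()

# List the Layer 3 firewall sections and how many rules each one holds.
config = ET.fromstring(resp.text)
for section in config.iter("section"):
    rules = section.findall("rule")
    print(section.get("name"), "-", len(rules), "rule(s)")
```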

The following picture shows how those components fit together:

[Figure: how the NSX components fit together]

Basic Understanding

To understand NSX concepts, it’s useful to map vSphere network components to NSX components:

In a traditional vSphere environment, a VM wishing to communicate with the outside world first hits a virtual port on the virtual switch. This virtual port is part of a Portgroup, which is basically a group of virtual network ports tagged with a specific VLAN ID. In the NSX world, when a VM is part of a Logical Switch, it hits a virtual port that is a member of a Portgroup specifically created by the NSX Manager. It is created on every host member of the VDS, like a traditional distributed Portgroup. However, the difference is that all egress frames hitting this Portgroup are forwarded inside a VXLAN tunnel, tagged with a specific external VLAN ID to transport the VXLAN frames on the physical network.
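
To make that last step concrete, here is a purely illustrative sketch of the VXLAN encapsulation itself. In reality the VTEP does this in the vmkernel, not in a script, and the values below are made up; the header layout follows the public VXLAN specification (RFC 7348).

```python
# A conceptual sketch of VXLAN encapsulation: the VTEP prepends an 8-byte VXLAN
# header carrying the 24-bit VNI, then wraps the result in outer UDP/IP/Ethernet
# sourced from the ESXi host's VTEP interface. Values here are illustrative only.
import struct

def vxlan_encapsulate(inner_frame: bytes, vni: int) -> bytes:
    """Prepend a VXLAN header to an inner L2 frame."""
    flags = 0x08  # 'I' bit set: a valid VNI is present
    # Header layout: flags (8 bits) + 24 reserved bits, then VNI (24 bits) + 8 reserved bits.
    header = struct.pack("!II", flags << 24, vni << 8)
    return header + inner_frame

inner = bytes(64)                      # stand-in for a VM's Ethernet frame
encapsulated = vxlan_encapsulate(inner, vni=5001)
print(len(encapsulated))               # 8-byte header + original frame
# The outer UDP datagram would use destination port 4789 (8472 on older NSX builds),
# and the outer VLAN tag is the "external VLAN ID" mentioned above.
```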

The role of the Logical Router is to connect two or more Logical Switches together, enabling routing between the corresponding subnets (you can assume 1 LS = 1 subnet). It also advertises (and learns) prefixes and routes to its neighbor(s) if a dynamic routing protocol has been activated. Alternatively, you can also configure static routes.

As an example, the following diagram shows the DLR establishing adjacency with the ESG, which is also running a dynamic routing protocol, and advertises VM subnets to the physical world. The ESG has its internal interface connected to a VXLAN and its uplink connected to a VLAN. As a result, the physical network can learn about the virtual network, and vice-versa.

[Figure: DLR–ESG routing adjacency example]

Lab Architecture

Now that you’ve had a basic introduction to NSX principles, I can detail my scenario. In my lab environment, I’ve simulated the following architecture:

[Figure: simulated dual-datacenter lab architecture]

I didn’t actually deploy two sets of controllers and two managers linked to two different vCenter Servers in separate physical datacenters. Instead I’ve created logical containers called “Transport Zones” to make both virtual datacenters completely independent from a data-plane standpoint. The goal here is to demonstrate how to integrate virtual network operations into an orchestrated Disaster Recovery Plan with NSX. The only requirement is the ability to run a script as part of your DR procedures. This may ultimately be achieved by VMware Site Recovery Manager, or another orchestration tool.

This architecture represents a traditional dual datacenter environment, connected over a L3 IP cloud. In a standard network environment, it basically means that you have to change VM IP addresses upon recovery. (There are other alternatives, like host routes, RHI and NAT, but these solutions come at a certain complexity cost.)

The main goal of the scenario is to show how to provide a flexible, orchestrated Disaster Recovery solution without having to change VM IP addresses. Let’s see how we can achieve this with NSX. The order of operations would be (a scripted sketch of these steps follows the list):

  1. Disconnect LS1 and LS2 in DC1.
  2. Create new LS in DC2: DR_LS1 and DR_LS2 (Or pre-create them without connecting the upstream DLR).
  3. Add two new interfaces to DLR2 in DC2, with the same IP addresses as previously used by DLR1 to connect LS1 and LS2. In this way, we don’t have to change the default gateway of the recovered VMs.
  4. Connect those interfaces to the corresponding LS.
  5. Recover VMs in DC2.
  6. Connect VMs to the appropriate LS.
  7. Boot VMs and test connectivity.
  8. Check route updates on the physical network.
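
As a teaser for part 3, here is a rough sketch of how steps 1 to 4 could be scripted against the NSX Manager REST API. The edge IDs, transport zone ID, addressing and endpoint paths are assumptions for illustration only; steps 5 to 7 (recovering and reconnecting the VMs) would be handled by SRM or the vSphere API rather than NSX.

```python
# A minimal sketch of scripting steps 1-4 of the recovery plan against the NSX
# Manager REST API. Manager addresses, credentials, edge IDs, interface indexes,
# subnets and endpoint paths are placeholders/assumptions, not a tested runbook.
import requests

NSX_DC1 = "https://nsx-manager-dc1.lab.local"   # hypothetical protected-site manager
NSX_DC2 = "https://nsx-manager-dc2.lab.local"   # hypothetical recovery-site manager
AUTH = ("admin", "VMware1!")                     # placeholder credentials
HEADERS = {"Content-Type": "application/xml"}


def remove_dlr_interface(manager: str, dlr_id: str, interface_index: int) -> None:
    """Step 1: disconnect LS1/LS2 by deleting the matching interface on DLR1."""
    r = requests.delete(f"{manager}/api/4.0/edges/{dlr_id}/interfaces/{interface_index}",
                        auth=AUTH, verify=False)
    r.raise_for_status()


def create_logical_switch(name: str, scope_id: str = "vdnscope-1") -> str:
    """Step 2: create DR_LS1 / DR_LS2 in DC2 and return the new virtualwire ID."""
    body = f"<virtualWireCreateSpec><name>{name}</name><tenantId>dr</tenantId></virtualWireCreateSpec>"
    r = requests.post(f"{NSX_DC2}/api/2.0/vdn/scopes/{scope_id}/virtualwires",
                      auth=AUTH, data=body, headers=HEADERS, verify=False)
    r.raise_for_status()
    return r.text  # the API returns the new virtualwire ID


def add_dlr_interface(dlr_id: str, name: str, ip: str, prefix: int, virtualwire_id: str) -> None:
    """Steps 3-4: add an internal interface to DLR2, reusing the IP DLR1 had for that subnet."""
    body = f"""<interfaces><interface>
        <name>{name}</name><type>internal</type>
        <addressGroups><addressGroup>
          <primaryAddress>{ip}</primaryAddress><subnetPrefixLength>{prefix}</subnetPrefixLength>
        </addressGroup></addressGroups>
        <connectedToId>{virtualwire_id}</connectedToId>
    </interface></interfaces>"""
    r = requests.post(f"{NSX_DC2}/api/4.0/edges/{dlr_id}/interfaces/?action=patch",
                      auth=AUTH, data=body, headers=HEADERS, verify=False)
    r.raise_for_status()


# Example flow: detach the old interfaces from DLR1, then recreate both subnets in DC2
# and attach them to DLR2 ("edge-1"/"edge-2" and the addressing are placeholders).
for index in (10, 11):
    remove_dlr_interface(NSX_DC1, "edge-1", index)

for ls_name, gw_ip in [("DR_LS1", "172.16.1.1"), ("DR_LS2", "172.16.2.1")]:
    vw_id = create_logical_switch(ls_name)
    add_dlr_interface("edge-2", ls_name, gw_ip, 24, vw_id)
```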

Note: I’m assuming here that security device configurations are synchronized between datacenters.


Because OSPF is running within the virtual network on both DLR1 and DLR2, routing updates will be sent up to the IP cloud to reflect that DR_LS1 and DR_LS2 subnets are now reachable through DC2. In the same way, because LS1 and LS2 have been disconnected from DLR1, corresponding routes will be removed to reflect that LS1 and LS2 subnets are not reachable in DC1 anymore. Magic??!! No, just awesome technology :-)

The next post will focus on how to deploy this lab environment.

Nicolas

 

Physical Networks in the Virtualized Networking World

[This post was co-authored by VMware's Bruce Davie and Ken Duda from Arista Networks, and originally appeared on Network Heresy]

Almost a year ago, we wrote a first post about our efforts to build virtual networks that span both virtual and physical resources. As we’ve moved beyond the first proofs of concept to customer trials for our combined solution, this post serves to provide an update on where we see the interaction between virtual and physical worlds heading.

Our overall approach to connecting physical and virtual resources can be viewed in two main categories:

  • terminating the overlay on physical devices, such as top-of-rack switches, routers, appliances, etc.
  • managing interactions between the overlay and the physical devices that provide the underlay.

The latter topic is something we’ve addressed in some other recent posts (here, here and here) — in this blog we’ll focus more on how we deal with physical devices at the edge of the overlay. Continue reading

Geneve, VXLAN, and Network Virtualization Encapsulations

In this post, Bruce Davie and T. Sridhar of VMware’s Networking and Security Business Unit take a look at a proposed new encapsulation protocol that would standardize how traffic is tunneled over the physical infrastructure by network overlay software.

++++

For as long as we’ve been doing Network Virtualization, there has been debate about how best to encapsulate the data. As we pointed out in an earlier post, it’s entirely reasonable for multiple encapsulations (e.g. VXLAN and STT) to co-exist in a single network. With the recent publication of “Geneve”, a new proposed encapsulation co-authored by VMware, Microsoft, Red Hat and Intel, we thought it would be helpful to clarify a few points regarding encapsulation for network virtualization. First, with all the investment made by us and our partners in developing support for VXLAN (described here), we very much intend to continue supporting VXLAN — indeed, we’ll be enhancing our VXLAN capabilities. Second, we want to explain why we believe Geneve is a necessary and useful addition to the network virtualization landscape.
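
As a rough, purely illustrative comparison of the two formats: VXLAN uses a fixed 8-byte header carrying a 24-bit VNI, while Geneve keeps a similar 8-byte base header but can append variable-length options. The sketch below packs both headers according to the public specifications; it is not taken from any product code.

```python
# A conceptual sketch contrasting the two encapsulations: VXLAN's header is a fixed
# 8 bytes, while Geneve adds a variable-length options field whose size is carried
# in the Opt Len field (in 4-byte words). Values are illustrative only.
import struct

def vxlan_header(vni: int) -> bytes:
    # flags (I bit set) + reserved, then VNI + reserved
    return struct.pack("!II", 0x08 << 24, vni << 8)

def geneve_header(vni: int, options: bytes = b"") -> bytes:
    assert len(options) % 4 == 0, "Geneve options are padded to 4-byte multiples"
    ver_optlen = (0 << 6) | (len(options) // 4)   # Ver=0, Opt Len in 4-byte words
    flags = 0x00                                  # O and C bits cleared
    proto = 0x6558                                # inner payload: Ethernet frame
    fixed = struct.pack("!BBHI", ver_optlen, flags, proto, vni << 8)
    return fixed + options

print(len(vxlan_header(5001)))                 # always 8 bytes
print(len(geneve_header(5001)))                # 8 bytes with no options...
print(len(geneve_header(5001, b"\x00" * 8)))   # ...and it grows as options are added
```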

Read the rest of Bruce’s blog on the Office of the CTO blog here.

Guest Post – When Pets Meet Cattle: OpenStack and VMware, Part 1

VMware is the industry’s leading enterprise virtualization software company. So it’s not surprising that one of the most common questions asked by enterprises considering OpenStack is: “How does OpenStack integrate with VMware vSphere and VMware NSX?” In November 2013, Mirantis and VMware set forth plans to work together on integrating Mirantis OpenStack with vSphere and NSX. Now, as a result of our collaboration, we have built what we believe to be the easiest way to configure OpenStack for a VMware environment. And we’ve scheduled a webcast to do some show and tell about how the technologies work together. You can register for the webinar here.

And in case you were wondering about the headline of this post, it was taken from a blog post written by our friend Nick Chase over at Mirantis.  Take a read of his post here.

Roger

VMware at RSA Conference 2014 (#RSAC)

Summary:

  • Company outlines vision for security in the Software-Defined Data Center
  • Product and partner demonstrations in Booth #1615 to showcase growing security portfolio
  • New PCI-DSS 3.0 and FedRAMP reference architectures to be presented

Throughout its history, RSA Conference has consistently attracted the world’s best and brightest in the security field, creating opportunities for attendees to learn about IT security’s most important issues through first-hand interactions with peers, luminaries and emerging and established companies. Continue reading

Elephant Flow Mitigation via Virtual-Physical Communication

Note: this post was developed jointly by Justin Pettit of VMware and Mark Pearson of HP, with additional content from VMware’s Martin Casado and Bruce Davie.

A recent Network Heresy post “Of Mice and Elephants” discussed the impact long-lived flows (elephants) have on their short-lived peers (mice). A quick summary is that, in a datacenter, it is believed that the majority of flows are short-lived (mice), but the majority of packets belong to long-lived flows (elephants). Mice flows tend to be bursty and latency-sensitive, whereas elephant flows tend to transfer large amounts of data, with per-packet latency being of less concern. These elephants can fill up network buffers, which can introduce latency for mice.
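
As a toy illustration of that distinction (not how the actual detection works, which happens in the vSwitch and controller), the sketch below simply buckets flows by bytes observed and labels anything above an arbitrary threshold as an elephant.

```python
# A toy illustration of the mice/elephants distinction: flows are bucketed by how
# many bytes they have transferred, and anything past a threshold is treated as an
# elephant. The traffic and the threshold here are made up for the example.
from collections import defaultdict

ELEPHANT_THRESHOLD = 10 * 1024 * 1024   # 10 MB, an arbitrary cut-off

# (src, dst, dst_port) -> bytes observed; stand-in for per-flow counters
flow_bytes = defaultdict(int)
observed_packets = [
    ("10.0.0.5", "10.0.0.9", 443, 1500),      # part of a bulk transfer
    ("10.0.0.7", "10.0.0.9", 53, 80),         # a DNS query: classic mouse
] + [("10.0.0.5", "10.0.0.9", 443, 9000)] * 2000

for src, dst, port, size in observed_packets:
    flow_bytes[(src, dst, port)] += size

for flow, total in flow_bytes.items():
    label = "elephant" if total >= ELEPHANT_THRESHOLD else "mouse"
    print(flow, total, label)
```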

At the HP 2013 Discover Conference, HP and VMware demonstrated a technology preview of detecting and handling elephant flows in an overlay network. The demonstration featured the joint HP-VMware solution announced at VMworld 2013.  VMware NSX provided an overlay network using HP switches as the underlay along with the HP VAN SDN controller. Through controller federation interfaces, the overlay and the underlay co-operated to mitigate the effects of the elephant flows on the mice. The solution shows the power of integration between network virtualization and SDN solutions. Continue reading

VMware @ OpenStack Summit Hong Kong

Next week in Hong Kong, the VMware team will have a major presence at the OpenStack Summit, and we have an ever-growing presence on the agenda of speaking sessions and demos. As we did with the Portland show, here is a show planner with a schedule of all the VMware sessions. Here’s a snapshot of what you can expect (and experience) at the show. Continue reading

VMware Announces General Availability of VMware NSX Network Virtualization Platform

Today at VMworld® in Barcelona, we once again highlighted VMware NSX, the platform for network virtualization. More importantly, we announced general availability of VMware NSX. Interested customers should contact their VMware representative who can put them in touch directly with a VMware NSX specialist.

Originally announced at VMworld in San Francisco, VMware NSX represents another giant step for VMware customers as they look to bring the operational benefits of server virtualization to the network. To read more about the launch, see our full blog post from the August announcement.  We also encourage you to read what our broad set of ecosystem partners had to say.

If you are interested in a deeper dive on VMware NSX, here is a great overview video from VMworld San Francisco in August.

Additionally, make sure you take a look at VMware NSX labs available in Hands-On Labs online portal. You can learn more about these labs in our blog post here.

Roger Fortier

Network Virtualization: The Holy Grail of Workload Agility

This is a guest post from vCloud Service Provider Logicworks which originally appeared on the VMware vCloud Blog. You can read more from Logicworks on their blog, Gathering Clouds.

Everyone is familiar with virtualization. It’s become the IT standard for achieving greater levels of resource efficiency and functionality. While it’s just a tool, the vast majority of new builds utilize it in some way.

This holds true for managed service providers (MSPs) as well. The benefits of virtualization to an MSP are similar to what an enterprise would experience. Given the nature of their business, MSPs put a great emphasis on truly being agile to client requirements, both in terms of build times and modifications of client environments.

Virtualization is absolutely key to that, and has been since the inception of VMware. The ability to resize a component of a client’s infrastructure on-demand and on the fly is an absolute must nowadays.

However, when we talk about virtualization, we mean virtualization of the compute layer, which is what everyone speaks about relative to virtual machines (VMs). And while VMs are an amazing innovation, the really interesting stuff is happening at the storage and network virtualization layers.

Logicworks is very keen on network virtualization, as opposed to the traditional configuration of hardware switches, which is a major reason we joined the NSX Beta Program.

Historically, the challenge with network virtualization has centered on the limitations of spanning the network virtualization layer from a client’s existing virtual environment to other environments, including other data centers.

One of the major benefits we’re looking to achieve with the NSX beta program is that the technology, after the acquisition of Nicira, makes it possible to span virtualization between data centers, which helps realize the dream of truly mobile, always-active workloads.

One of the challenges that this resolves is lead times in deployments. As it is today, providers still need to log into various switches from different vendors to configure and test them. While this is somewhat automatable, it hasn’t achieved the same degree of automation that compute virtualization enjoys. Network virtualization gives us the ability, using software, scripts and predetermined runbooks, to deploy clients via API calls to a control cluster instead of logging into physical devices.

In addition, providers also use various vendors’ network offerings. This means that the set of commands one will have to run on a Juniper device is going to be different than on an Extreme device, and complex configurations can be quite a bit different between the two.

If we abstract that away by making the basic configuration of either of those hardware devices as simple as possible, enough to enable network virtualization on top of it, then we can standardize our configurations across our clients. This process becomes more repeatable and much quicker to deploy – like the DevOps model applied to network virtualization, to a degree. If the work being done is as close as possible from one client to another, then we can remove potential errors and increase efficiencies through more automation.

Being on the cutting edge of not-yet-industry-standard technology enables Logicworks to deploy cross-production workloads, and serve as an agile service provider. This dovetails nicely with the next generation of network virtualization in that it mirrors our ability to respond quickly and dynamically to make adjustments in deployments.  For the first time, the capabilities of the technology match exactly what it is that we, as a hosting provider, do every day.

VMware NSX Labs Available in Hands-On Labs Online Portal

At VMworld 2013 in San Francisco, we launched the VMware NSX network virtualization platform to the world. During the keynote, our CEO Pat Gelsinger was joined by representatives from CITI, GE and eBay to discuss the promise of network virtualization and VMware NSX, and more than 20 partners announced support for the platform.

But perhaps the most successful part of our launch was the VMware NSX Hands-On Labs. These labs were by far the most popular at the show. Attendees consumed more than 2,000 sessions, totaling 124,000 lab minutes during the four days of VMworld. That is roughly equivalent to locking yourself in a room with your laptop and doing nothing but taking this lab 24 hours a day, seven days a week, for three months straight.

And now, we are bringing the labs to you online. Remember to participate in the HOL Community page at http://hol.vmware.com/, and take the labs at http://labs.hol.vmware.com/.

HOL-SDC-1303 – VMware NSX: The Network Virtualization Platform

A Tech Preview of the exciting new VMware NSX for vSphere product announced at VMworld. Learn how VMware NSX virtualizes your network and simplifies your datacenter operations. This lab is currently based on a beta version of code and you may encounter some user interface issues during the lab exercises. The lab will be improved with newer code as the product moves closer to release. For now, brave the rapids, jump in with both feet and have a go at VMware NSX, the network virtualization platform.

Enroll in HOL-SDC-1303

HOL-SDC-1319 – VMware NSX for Multi-Hypervisor Environments

Also a Tech Preview, this lab focuses on the multi-hypervisor version of VMware NSX. This is a great opportunity to see how VMware NSX can support non-vSphere portions of your datacenter.

Enroll in HOL-SDC-1319

Roger Fortier