
Monthly Archives: March 2009

Down and Dirty Network Troubleshooting Using Traces

The first time I saw a network trace, I was totally captivated. I was a young network systems programmer working on an IBM mainframe. The network operator had a Spectron Datascope that he could patch into any of the 9.6kbps links connecting the front end processor with the remote offices. My fascination remained, and I progressed from that datascope to using GTF traces and eventually wrote my own multi-tasking real-time trace analysis package in IBM System/370 assembler.

While this was a labor of love stemming from my complete fascination with the subject, I found trace analysis was the most useful tool in my network troubleshooting bag of tricks. Traces did not lie—they showed exactly what was or wasn’t going on. Additionally, they gave me a more thorough understanding of network protocols.

Fast forward to today. We have more network tools, but networks have become a lot more complex and dispersed. Cisco and others have had port mirroring in many of their switches for a long time. SPAN, or Switch Port Analyzer (as Cisco calls the feature), lets the network admin selectively and non-disruptively replicate traffic from switch ports to another switch port connected to a protocol analyzer or a PC running Wireshark or similar. The SPAN capability eventually evolved into Remote SPAN (RSPAN) and Encapsulated RSPAN (ERSPAN), the latter enabling GRE-encapsulated SPAN traffic to be routed to any point in the network (given sufficient bandwidth, of course!).

Tracing on a Virtual Switch

So what about virtual networks and virtual switches? How do you probe vswitch traffic? Fortunately, there is a simple and well-proven method for capturing traffic traversing a vswitch. The method involves setting up a guest VM (e.g. Windows or Linux) with Wireshark or other third-party trace “sniffing” software. Simply:

  • Create a new port group with Promiscuous Mode=Accept in the Port Security options.
  • Set the VLAN to the VLAN ID you wish to trace, or set VLAN=4095 to trace traffic for all VLANs on that vswitch (assuming VST mode).

And there you have it. Connect the sniffing VM’s virtual NIC to this port group, start Wireshark in the VM, and monitor through the Console.

Forthcoming Options …

With the Cisco Nexus 1000V, available with our forthcoming release, you will have another alternative. The Nexus 1000V supports SPAN and ERSPAN (see complete feature comparison here), so the network folks can use the same methods and techniques whether the network is virtual or physical. The ERSPAN capability means you can redirect the trace traffic to any point without setting up a specialized sniffing VM on the host and vswitch in question.

Data Center of the Future

Cisco has launched a new Data Center of the Future website that features an interactive concierge who engages in an assortment of antics, e.g. putting on a tie, reading a CCNA book, doing puzzles, and so on. You’ll have to leave it running for a while with the sound on to see and hear them all.

On a serious note, it does contain a wealth of information, including a number of VMware papers and jointly authored Cisco/VMware papers, e.g. VI3 in a Cisco Network Environment.

Take a look at www.thedcofthefuture.com

A Virtual Switch Feature-fest

The announcement of the VMware vNetwork Distributed Switch and the Cisco Nexus 1000V at VMworld last September was pretty exciting as network things go. These new virtual switches put a bunch of new features and capabilities at the disposal of network and server admins in addition to simplifying deployment, configuration and monitoring.

But what do these switches actually do? What features do they provide? Do any of the switches provide ACLs? SPAN? Can you create Private VLANs? The list goes on, as do the questions from customers who are now trying to plan for these new switches in their environments.

So, to address the rising frenzy of questions, my opposite numbers at Cisco and I recently collaborated on a document that details the features of these two switches and compares them against the current ESX 3.5 vSwitch and the forthcoming Standard Switch.

Virtual Networking Features of the VMware vNetwork Distributed Switch and Cisco Nexus 1000V was posted on the VMware Networking Technology website tonight and will follow shortly on the cisco.com website.

Let me know what you think!

Deciphering NIC Capabilities from log entries

Time once again for a down and dirty, bits and bytes style guest post! One of our Consulting Architects, Kamau Wanguhu, put this post together with some guidance from engineering…  

You may have seen messages like those below in your logs and wondered what neat capabilities are being enabled on your network card (NIC).

ESXsrv01 vmkernel: 22:04:29:43.583 cpu1:1198)Uplink: 2491: Setting capabilities 0x0 for device vmnic4
ESXsrv01 vmkernel: 22:04:29:43.583 cpu1:1198)Uplink: 2491: Setting capabilities 0x0 for device vmnic5
ESXsrv01 vmkernel: 22:04:59:11.410 cpu0:1235)Uplink: 2491: Setting capabilities 0x300 for device vmnic2
ESXsrv01 vmkernel: 22:04:59:11.410 cpu0:1235)Uplink: 2491: Setting capabilities 0x300 for device vmnic3
ESXsrv01 vmkernel: 22:04:59:11.410 cpu0:1235)Uplink: 2491: Setting capabilities 0x2b for device vmnic2
ESXsrv01 vmkernel: 22:04:59:11.410 cpu0:1235)Uplink: 2491: Setting capabilities 0x2b for device vmnic3
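
If you have many of these messages to wade through, a few lines of script can pull out the device name and the capability value for you. The following is only a minimal sketch, assuming the message format matches the sample lines above; the pattern and the function name parse_capabilities are my own illustration, not anything shipped with ESX.

```python
import re

# Pattern for the "Setting capabilities" messages shown above.
# It accepts "0x" as well as a typographic "0×" that can appear in pasted text.
CAP_RE = re.compile(r"Setting capabilities 0[x×]([0-9a-fA-F]+) for device (\S+)")


def parse_capabilities(log_text):
    """Return a list of (device, capability_value) tuples in log order."""
    return [(dev, int(hexval, 16)) for hexval, dev in CAP_RE.findall(log_text)]


# Sample lines copied from the log excerpt (format assumed, not guaranteed).
sample = """\
ESXsrv01 vmkernel: 22:04:29:43.583 cpu1:1198)Uplink: 2491: Setting capabilities 0x0 for device vmnic4
ESXsrv01 vmkernel: 22:04:59:11.410 cpu0:1235)Uplink: 2491: Setting capabilities 0x300 for device vmnic2
ESXsrv01 vmkernel: 22:04:59:11.410 cpu0:1235)Uplink: 2491: Setting capabilities 0x2b for device vmnic2
"""

events = parse_capabilities(sample)
for device, value in events:
    print(f"{device}: {value:#x}")
```

Each tuple keeps the value as an integer, which makes the bit-by-bit decoding described next straightforward.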

The magic decoder ring …

The hex codes used to decode the “Setting capabilities” values in log messages like those above are as follows:

0x0001 Capable of scatter-gather transmits
0x0002 Can checksum only TCP/UDP over IPv4
0x0008 Can use DMA to high memory locations
0x0020 Supports TCP Segmentation Offload (TSO)
0x0100 Can do VLAN tagging in HW
0x0200 Can do VLAN untagging in HW

Decoding …

So now off to the fun stuff of decoding our log file output….

The first log entry sets the capabilities to 0x0. This is logged when the last virtual port of a portgroup that the uplink is assigned to is disconnected. In other words, the hardware capability is no longer needed, as no traffic will be sent over these uplinks (vmnic4 and vmnic5).

The next capability that is set is 0x300 for vmnic2 and vmnic3. From the table above we can get 0x300 by adding 0x0200 and 0x0100. This means that we are enabling:

0x0100 VLAN tagging in hardware
0x0200 VLAN untagging in hardware

The next capability that is set is 0x2b. From this we can tell we are enabling:

0x0001 scatter-gather transmits
0x0002 checksum only TCP/UDP over IPv4
0x0008 DMA to high memory
0x0020 TCP Segmentation Offload (TSO)

as 0x0001 + 0x0002 + 0x0008 + 0x0020 = 0x002b

And that is all there is to translating the NIC “Setting Capabilities” log entries!
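
The same decoding can be done mechanically by testing each bit of the value against the table. Here is a small sketch of that idea; the dictionary simply restates the ESX 3.5 codes listed above, and the function name decode_capabilities is hypothetical. Any bits not in the table are reported separately rather than guessed at.

```python
# Capability bits as listed in the table above (ESX 3.5; subject to change).
CAPABILITIES = {
    0x0001: "scatter-gather transmits",
    0x0002: "TCP/UDP checksum over IPv4",
    0x0008: "DMA to high memory",
    0x0020: "TCP segmentation offload (TSO)",
    0x0100: "VLAN tagging in hardware",
    0x0200: "VLAN untagging in hardware",
}


def decode_capabilities(value):
    """Split a capability value into (known_names, unknown_bits)."""
    names = [name for bit, name in sorted(CAPABILITIES.items()) if value & bit]
    known_mask = 0
    for bit in CAPABILITIES:
        known_mask |= bit
    # Whatever is left after masking off the known bits is not in the table.
    return names, value & ~known_mask


for value in (0x0, 0x300, 0x2b):
    names, unknown = decode_capabilities(value)
    print(f"{value:#06x}: {', '.join(names) or 'none'}")
```

Because each code is a single bit, "adding" the codes and OR-ing them together give the same result, which is why a bitwise test recovers exactly the entries that were summed.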

This often raises an additional question: “Can one use NICs from different manufacturers in a vSwitch considering the NICs will probably have different capabilities?”

In reality, it should not matter what low-level capabilities your NIC has or which ones are enabled and used by the VMkernel. Most capabilities that are not available on your particular NIC are handled by the VMkernel in software. As such, you can have a vSwitch configured with two NICs, one capable of TSO and the other not. The VMkernel will use the TSO capabilities of the NIC that has them; if a failover occurs to the NIC without TSO, the VMkernel will simply handle segmentation in software before delivering data to the NIC.
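
To picture the mixed-NIC case in terms of the capability bits, consider the toy model below. It is purely illustrative and says nothing about how the VMkernel is actually implemented: the function software_fallback and the 0x0b value for a NIC without TSO are assumptions made up for the example.

```python
TSO = 0x0020  # TCP segmentation offload bit from the capability table


def software_fallback(wanted, nic_caps):
    """Return the bits in `wanted` that the NIC lacks (toy model only)."""
    return wanted & ~nic_caps


nic_with_tso = 0x2b     # scatter-gather, checksum, high-memory DMA, TSO
nic_without_tso = 0x0b  # hypothetical NIC: the same set minus TSO

for caps in (nic_with_tso, nic_without_tso):
    missing = software_fallback(TSO, caps)
    where = "software" if missing else "hardware"
    print(f"NIC caps {caps:#x}: segmentation handled in {where}")
```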

Note that we are baring some of the innards of ESX 3.5, so be aware the codes are subject to change in future releases.