With Memorial Day weekend coming up, for me it’s all about hot dogs, hamburgers, and fast cars. I am a huge Formula 1 fanatic, but Memorial Day is a bonanza of racing, from the F1 Monaco Grand Prix to NASCAR’s Coke 600 and, of course, the Indianapolis 500, all on the same day! The raw speed and performance of these races remind me of a 2016 VMworld presentation (NET8030) on NSX performance.

The argument still comes up now and again that “hardware is faster than software.” Network guys like me just assume that’s true. So it came as a surprise when I watched a session that turned that assumption on its head. In it, the presenter demonstrated that software is faster than hardware, way faster. Of course, I was dubious at first, but I quickly learned that comparing physical networking to virtual networking is like comparing the pace car to the race car. I had always assumed the physical switch was the race car, but in the throughput demonstration, Samuel showed that two VMs running on the same host, with NSX routing, switching, and firewalling between them, could reach up to 106 Gbps! That surprised me, sort of like the first time I watched a Daytona race and 40 cars took the green flag at 190 mph.

The presentation went further to demonstrate that throughput between two different hosts connected by a physical switch was 77 Gbps (with two 40 Gbps uplinks). What caught my attention is that the physical switch was now the bottleneck compared to the NSX virtual switch, like the restrictor plates NASCAR mandates at Daytona to slow the cars down. I learned software could be faster than hardware. What this means is that by running NSX, a customer can not only get line-rate performance but potentially exceed the performance of their physical switch uplinks, which today are usually multiples of 10 Gbps. You may have seen VMware’s vRealize Network Insight product show routed traffic as a percentage of total flows. That routed traffic is being handled by the physical switch at host uplink speeds (~10 Gbps). By deploying NSX, that same traffic could run at up to 100 Gbps, a 10x improvement.
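If you want to kick the tires on numbers like these in your own environment, a common approach (not necessarily the tooling used in the session) is an iperf3 test between two VMs on the same host. Below is a minimal Python sketch, assuming iperf3 is installed in both guests and a server is already running in the peer VM (`iperf3 -s`); the peer address and stream count are placeholders, and the result fields follow iperf3’s JSON report format.

```python
# Minimal sketch: drive an iperf3 client from Python and report receiver-side throughput.
# Assumes iperf3 is installed and "iperf3 -s" is running in the peer VM.
import json
import subprocess

PEER_IP = "192.168.10.20"   # hypothetical address of the receiving VM
STREAMS = 4                  # parallel TCP streams; high throughput usually needs several
DURATION = 30                # test length in seconds

result = subprocess.run(
    ["iperf3", "-c", PEER_IP, "-P", str(STREAMS), "-t", str(DURATION), "-J"],
    capture_output=True, text=True, check=True,
)

report = json.loads(result.stdout)
gbps = report["end"]["sum_received"]["bits_per_second"] / 1e9
print(f"Receiver-side throughput: {gbps:.1f} Gbps")
```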

Full disclosure: this is a nuanced conversation, and most application communication is across hosts, but it is fast nonetheless. There are optimizations that pNICs and vNICs leverage to deliver the performance numbers I mentioned above: TSO (TCP segmentation offload), LRO (large receive offload), and RSS (receive side scaling). These optimizations are common in modern operating systems and newer pNICs. The interesting difference between physical and virtual is how packets are segmented. You may be familiar with MTU (maximum transmission unit), the size of the largest network-layer PDU (protocol data unit) that can be transmitted in a single network transaction. You may have heard NSX presenters say we need a minimum 1600-byte MTU to accommodate the VXLAN encapsulation overhead. Ethernet defaults to 1500 bytes, but switches can support jumbo frames up to about 9000 bytes. A larger MTU brings greater efficiency because each packet carries more user data, and that efficiency translates into better bulk protocol throughput.

Within a hypervisor, though, these optimizations make MTU largely irrelevant, because communication between VMs on the same host happens in memory and is never chopped into wire-sized packets. The more critical maximum is the MSS (maximum segment size). Since we skip segmenting traffic down to MTU-sized packets, we can work with segments as large as 65K thanks to TSO and LRO within the host. Even across hosts, a NIC that supports LRO for VXLAN pushes throughput up tremendously. We play by a different set of rules in the hypervisor. Call it cheating, call it lower tire pressure, call it an illegal spoiler, call it what you want, but the NSX race car is very competitive in the game of speed.
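To make that MTU and MSS arithmetic concrete, here is a small Python sketch of the math. The header sizes are standard IPv4, UDP, TCP, and VXLAN values; this is my own illustration of why 1600 bytes is the recommended minimum, not anything taken from the presentation or from NSX itself.

```python
# Worked example: VXLAN overhead, the 1600-byte MTU recommendation, and MTU vs. MSS.
INNER_MTU = 1500      # default Ethernet MTU inside the guest VM
IPV4_HDR = 20         # bytes, no options
TCP_HDR = 20          # bytes, no options
UDP_HDR = 8           # bytes
VXLAN_HDR = 8         # bytes
INNER_ETH_HDR = 14    # the original L2 frame header carried inside the VXLAN payload

# VXLAN encapsulation adds an outer IP + UDP + VXLAN header plus the inner Ethernet header.
vxlan_overhead = IPV4_HDR + UDP_HDR + VXLAN_HDR + INNER_ETH_HDR
underlay_mtu_needed = INNER_MTU + vxlan_overhead
print(f"VXLAN encapsulation overhead: {vxlan_overhead} bytes")            # 50
print(f"Minimum underlay MTU: {underlay_mtu_needed} bytes (1600 adds headroom)")  # 1550

# MSS is what TCP actually cares about: the payload that fits in one MTU-sized packet.
mss = INNER_MTU - IPV4_HDR - TCP_HDR
print(f"MSS on a 1500-byte MTU: {mss} bytes")                             # 1460

# With TSO/LRO, the stack hands off segments up to ~64 KB instead of MTU-sized packets.
tso_segment = 65535
print(f"One {tso_segment}-byte TSO segment replaces about {tso_segment // mss} MTU-sized packets")
```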

Green, white, checker…

 

To learn more about NSX and performance, I recommend these two blogs: