Author Archives: Lenin Singaravelu

Running latency-sensitive applications on vSphere

For those of us interested in running latency-sensitive applications on vSphere, Bhavesh Davda, from the CTO's office, has created a comprehensive guide for tuning vSphere for such applications. 

Some of the tuning options, such as interrupt coalescing settings, will be familiar to anyone who works with low-latency applications; others are relatively obscure vSphere-specific options. Using a combination of these options, we saw a noticeable improvement in the performance of some latency-bound benchmarks. As a bonus, the guide explains the reasoning behind some of the options in depth.

You can find more details here and the complete whitepaper here.

Performance and Use Cases of VMware DirectPath I/O for Networking


VMware DirectPath I/O is a technology, available in vSphere 4.0 and later, that leverages hardware support (Intel VT-d and AMD-Vi) to let guests access hardware devices directly. In the case of networking, a VM with DirectPath I/O can directly access the physical NIC instead of using an emulated (vlance, e1000) or a para-virtualized (vmxnet, vmxnet3) device. While both para-virtualized devices and DirectPath I/O can sustain high throughput (beyond 10Gbps), DirectPath I/O can additionally save CPU cycles in workloads with very high packet rates (say, more than 50k packets per second). However, DirectPath I/O does not support many features, such as physical NIC sharing, memory overcommit, vMotion and Network I/O Control. Hence, VMware recommends using DirectPath I/O only for workloads with very high packet rates, where the CPU savings from DirectPath I/O may be needed to achieve the desired performance.
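The recommendation above can be read as a simple two-part check: does the workload need any feature that DirectPath I/O gives up, and is its packet rate high enough for the CPU savings to matter? The following is a minimal Python sketch of that reasoning, not a VMware tool. The `Workload` type, the function name, and the use of 50,000 packets per second as a cutoff are assumptions made for illustration; the post offers that figure only as a rough guide.

```python
from dataclasses import dataclass
from typing import Tuple

# Features the post lists as unavailable with DirectPath I/O.
UNSUPPORTED_FEATURES = frozenset({
    "physical NIC sharing",
    "memory overcommit",
    "vMotion",
    "Network I/O Control",
})

# Illustrative cutoff based on the "say > 50k/sec" remark; not a hard limit.
HIGH_PACKET_RATE_PPS = 50_000


@dataclass
class Workload:
    """Hypothetical description of a workload, for illustration only."""
    name: str
    packets_per_second: float
    required_features: frozenset = frozenset()


def directpath_candidate(workload: Workload) -> Tuple[bool, str]:
    """Return (is_candidate, reason) for using DirectPath I/O."""
    blocked = sorted(workload.required_features & UNSUPPORTED_FEATURES)
    if blocked:
        return False, "needs unsupported features: " + ", ".join(blocked)
    if workload.packets_per_second <= HIGH_PACKET_RATE_PPS:
        return False, "packet rate too low for meaningful CPU savings"
    return True, "high packet rate and no conflicting feature requirements"


if __name__ == "__main__":
    for w in (
        Workload("web front end", 120_000),
        Workload("database", 5_000),
        Workload("migratable web tier", 120_000, frozenset({"vMotion"})),
    ):
        print(w.name, directpath_candidate(w))
```

The feature check comes first because a missing feature such as vMotion rules DirectPath I/O out regardless of how much CPU it might save.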

DirectPath I/O for Networking

VMware vSphere 4.x provides three ways for guests to perform network I/O: device emulation, para-virtualization and DirectPath I/O. A virtual machine using DirectPath I/O interacts directly with the network device through the guest's own device drivers. The vSphere host (running ESX or ESXi) is involved only in virtualizing the interrupts of the network device. In contrast, a virtual machine (VM) using an emulated or para-virtualized device (referred to as virtual NIC or virtualized mode henceforth) interacts with a virtual NIC that is completely controlled by the vSphere host. The vSphere host handles the physical NIC interrupts, processes packets, determines the recipient of each packet and copies it into the destination VM, if needed. The vSphere host also mediates packet transmissions over the physical NIC.

In terms of network throughput, a para-virtualized NIC such as vmxnet3 matches the performance of DirectPath I/O in most cases. This includes being able to transmit or receive 9+ Gbps of TCP traffic with a single virtual NIC connected to a 1-vCPU VM. However, DirectPath I/O has some advantages over virtual NICs such as lower CPU costs (as it bypasses execution of the vSphere network virtualization layer) and the ability to use hardware features that are not yet supported by vSphere, but might be supported by guest drivers (e.g., TCP Offload Engine or SSL offload). In the virtualized mode of operation, the vSphere host completely controls the virtual NIC and hence it can provide a host of useful features such as physical NIC sharing, vMotion and Network I/O Control. By bypassing this virtualization layer, DirectPath I/O trades off virtualization features for potentially lower networking-related CPU costs. Additionally, DirectPath I/O needs memory reservation to ensure that the VM’s memory has not been swapped out when the physical NIC tries to access the VM’s memory.

VMware’s Performance Review of DirectPath I/O vs. Emulation

VMware used the netperf [1] microbenchmark to plot the gains of DirectPath I/O as a function of packet rate. For the evaluation, VMware used the following setup:

  • A SLES11-SP1 VM on vSphere 4.1. vSphere was running on a dual-socket Intel E5520 system (2.27 GHz) with a Broadcom 57711 10GbE NIC as the physical NIC.
  • A native Linux machine as the traffic source or sink.
  • The netperf UDP_STREAM benchmark, using its burst and interval functionality to send or receive packets at a controlled rate.
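Burst-and-interval pacing fixes the packet rate by sending a set number of packets at regular intervals. The arithmetic behind it can be sketched in Python as below. The function names are hypothetical, and the sketch deliberately leaves out netperf's own options, whose exact names and units depend on the netperf version and on how it was built.

```python
def paced_packet_rate(burst_size: int, interval_ms: float) -> float:
    """Packets per second when `burst_size` packets are sent every `interval_ms` ms."""
    if burst_size <= 0 or interval_ms <= 0:
        raise ValueError("burst_size and interval_ms must be positive")
    return burst_size * 1000.0 / interval_ms


def burst_for_rate(target_pps: float, interval_ms: float) -> int:
    """Burst size (at least 1) that approximates `target_pps` at a given interval."""
    if target_pps <= 0 or interval_ms <= 0:
        raise ValueError("target_pps and interval_ms must be positive")
    return max(1, round(target_pps * interval_ms / 1000.0))


if __name__ == "__main__":
    # 100 packets every 10 ms corresponds to 10,000 packets per second.
    print(paced_packet_rate(100, 10))
    # Burst size needed for 100,000 packets per second at a 10 ms interval.
    print(burst_for_rate(100_000, 10))
```

Sweeping the burst size at a fixed interval is one way to step through the packet rates on the x-axis of a plot like the one that follows.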

Figure 1. Packet rate vs. CPU savings with DirectPath I/O

Figure 1 plots the CPU savings due to DirectPath I/O, as a percentage of one core, against packet rate (packets per second, PPS). The benefits of DirectPath I/O are immediately visible at high packet rates (100,000 PPS). However, it is equally clear that at lower packet rates the benefits are much smaller: at 10,000 PPS, DirectPath I/O saves only about 6% of one core. This is an important observation, as many enterprise workloads do not generate very high network traffic (see Tables 1 and 2).
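Network load is often known as throughput rather than as a packet rate, so it helps to convert one into the other before comparing a workload against the packet rates discussed here. Below is a minimal Python sketch of that conversion. It assumes a single average packet size and ignores protocol headers and framing overhead, so the result is only a rough estimate; the function name is hypothetical.

```python
def packets_per_second(throughput_mbps: float, avg_packet_bytes: float) -> float:
    """Approximate packet rate for a given throughput and average packet size.

    Assumes 1 Mbps = 1,000,000 bits per second and ignores header overhead.
    """
    if throughput_mbps < 0 or avg_packet_bytes <= 0:
        raise ValueError("throughput must be >= 0 and packet size must be > 0")
    bits_per_second = throughput_mbps * 1_000_000
    return bits_per_second / (avg_packet_bytes * 8)


if __name__ == "__main__":
    # 120 Mbps of 1500-byte packets is roughly 10,000 packets per second.
    print(packets_per_second(120, 1500))
```

By this estimate, a workload moving on the order of 100 Mbps in full-size packets sits near the low end of the packet rates discussed above.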

Table 1. Performance of enterprise class workloads with DirectPath I/O

To further illustrate the specific use cases and benefits of DirectPath I/O, VMware also compared its performance against that of a virtual NIC with three complex workloads: a web server workload and two database workloads. The web server workload and configuration were similar to SPECweb®2005 (described in reference [2]). We ran a fixed number of users requesting data from a web server and compared CPU utilization with DirectPath I/O against CPU utilization with a para-virtualized virtual NIC. Due to the high packet rate of this workload, DirectPath I/O is able to support 15% more users per percent of CPU used. Note that in a typical web server workload, the packets that a web server receives are smaller than 1500 bytes (an average of 86 bytes in our experiments). Hence, we cannot directly use the receive numbers in Figure 1 to calculate CPU savings.
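The "users per percent of CPU used" metric is easy to work through by hand. The Python sketch below shows one way to compute it and the relative gain between two configurations. The user count and CPU utilization figures are made up for illustration and are not the measurements behind Table 1; the function names are hypothetical as well.

```python
def users_per_cpu_percent(users: int, cpu_percent: float) -> float:
    """Users supported per percentage point of CPU utilization."""
    if cpu_percent <= 0:
        raise ValueError("cpu_percent must be positive")
    return users / cpu_percent


def relative_gain(candidate: float, baseline: float) -> float:
    """Fractional improvement of `candidate` over `baseline` (0.15 means 15%)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return candidate / baseline - 1.0


if __name__ == "__main__":
    # Hypothetical inputs: same user count, different CPU utilization.
    vnic = users_per_cpu_percent(1000, 46.0)
    directpath = users_per_cpu_percent(1000, 40.0)
    print(f"gain: {relative_gain(directpath, vnic):.0%}")
```

With a fixed number of users, the gain reduces to the ratio of the two CPU utilization figures, so the comparison holds only when both configurations serve the same load.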

Next, we looked at a database workload that has far lower packet rates. We used the Order Entry benchmark [3] and measured the ratio of operations per second between the two configurations. As expected, given the low packet rate, the virtual NIC and DirectPath I/O performed similarly.

We also looked at the performance of an OLTP-like workload with SAP and DB2 [4] on a 4-socket Intel X-7550 machine with one 8-vCPU VM. The virtual NIC outperformed DirectPath I/O by about 3% in the default configuration. This performance gap was an artifact of the memory pinning, reservation and NUMA behavior of the platform in the DirectPath I/O configuration. Once we set memory reservations for the virtual NIC configuration as well, the two configurations performed the same. Table 2 lists packet rates for some more enterprise-class workloads. Based on the packet rate numbers and the CPU cost saving estimates from Figure 1, we do not expect these workloads to benefit from the use of DirectPath I/O.

Table 2. Packet rates for some enterprise class workloads

Compatibility Matrix

DirectPath I/O requires that the VM be allowed to access a device directly and that the device be allowed to modify the VM’s memory (e.g., to copy a received packet into the VM’s memory). Additionally, the VM and the device can share essential state information that is invisible to ESX. Hence, the use of DirectPath I/O is incompatible with many core virtualization features. Table 3 presents a compatibility matrix for DirectPath I/O.

Table 3. Feature Compatibility Matrix for DirectPath I/O


As stated at the beginning of this post, DirectPath I/O is intended for specific use cases. It is another technology that VMware users can deploy to boost the performance of applications with very high packet rate requirements.

Further Reading

  • VMware DirectPath I/O.
  • Configuration Examples and Troubleshooting for DirectPath I/O.


  1. netperf.
  2. Achieving High Web Throughput Scaling with VMware vSphere 4 on Intel Xeon 5500 series (Nehalem) servers.
  3. Virtualizing Performance-Critical Database Applications in VMware vSphere.
  4. SAP Performance on vSphere with IBM DB2 and SUSE Linux Enterprise.

SPECweb®2005 is a registered trademark of the Standard Performance Evaluation Corporation (SPEC).