
Multicast Performance on vSphere 5.0

Multicast is an efficient way of disseminating information and communicating over the network. A single sender can connect to multiple receivers and exchange information while conserving network bandwidth. Financial stock exchanges, multimedia content delivery networks, and commercial enterprises often use multicast as a communication mechanism. VMware virtualization takes multicast efficiency to the next level by enabling multiple receivers on a single ESXi host. Since the receivers are on the same host, the physical network does not have to transfer multiple copies of the same packet. Packet replication is carried out in the hypervisor instead.

In releases of vSphere prior to 5.0, packet replication for multicast was done in a single context. With a high VM density per host and high packet rates, that replication context could become a bottleneck and cause packet loss. ESXi 5.0 adds a new feature that splits the cost of replication across multiple physical CPUs, making vSphere 5.0 a highly scalable and efficient platform for multicast receivers. This feature is called splitRxMode, and it can be enabled on VMXNET3 virtual NICs. Because fanning the processing out to multiple contexts slightly increases CPU consumption and is not needed by most workloads, the feature is disabled by default. VMware recommends enabling splitRxMode in situations where multiple VMs share a single physical NIC and receive a large volume of multicast or broadcast packets.

To enable splitRxMode for the Ethernet device (a scripted alternative is sketched after these steps):

  1. Select the virtual machine you wish to change, then click Edit virtual machine settings.
  2. Under the Options tab, select General, then click Configuration Parameters.
  3. Look for ethernetX.emuRxMode, where X is the number of the VMXNET3 adapter (ethernet0, ethernet1, and so on). If it is not present, click Add Row and enter the new variable.
  4. Click on the value and set it to “1” to enable splitRxMode; the default value “0” leaves the feature disabled.
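The steps above simply add ethernetX.emuRxMode = “1” to the VM's configuration, and the change typically takes effect after the VM is power-cycled. If you need to apply the setting to many receivers, it can also be scripted; the following is a minimal sketch using the pyVmomi Python SDK against vCenter. The host name, credentials, VM name, and adapter index below are placeholders, not values from the original post.

    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVim.task import WaitForTask
    from pyVmomi import vim

    # Placeholder connection details -- replace with your own vCenter/ESXi host.
    si = SmartConnect(host="vcenter.example.com", user="administrator",
                      pwd="secret", sslContext=ssl._create_unverified_context())
    content = si.RetrieveContent()

    # Locate the receiver VM by name (hypothetical name "receiver-01").
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    vm = next(v for v in view.view if v.name == "receiver-01")

    # Add ethernet0.emuRxMode = "1" to the VM's extra configuration --
    # the same key the Configuration Parameters dialog edits.
    spec = vim.vm.ConfigSpec(extraConfig=[
        vim.option.OptionValue(key="ethernet0.emuRxMode", value="1")])
    WaitForTask(vm.ReconfigVM_Task(spec))

    Disconnect(si)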

Environment Configuration

  • Systems Under Test: 2 Dell PowerEdge R810
  • CPUs: 2 x 8-core Intel Xeon E7-8837 @ 2.67 GHz (no hyperthreading)
  • Memory: 64 GB
  • NICs: Broadcom NetXtreme II 57711 10Gb Ethernet

Experiment Overview

We tested splitRxMode by scaling the number of VMs on a single ESXi host from 1 to 36, with each VM receiving up to 40K packets per second. With 32 VMs powered on, the consolidation ratio was 2 VMs per physical core. The sender was a 2-vCPU RHEL VM on a separate physical machine transmitting 800-byte multicast packets at a fixed interval. The clients (receiving VMs) were 1-vCPU RHEL VMs running on the same ESXi host. Each receiver used 10-15% of its CPU to process 10K packets per second, and usage increased linearly with the packet rate; no noticeable difference in CPU usage was observed when splitRxMode was enabled. We then measured the total packets received by each client and calculated the average packet loss for the setup.
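The post does not name the traffic generator used on the sender and receivers. As a rough illustration of the kind of workload involved, here is a minimal Python sketch of a multicast sender and receiver; the group address, port, and duration are placeholder values, and only the 800-byte payload mirrors the description above. Note that time.sleep() granularity caps the achievable rate far below 40K packets per second, so a real test would use a dedicated pacing loop or traffic generator.

    import socket, struct, sys, time

    GROUP, PORT = "239.1.1.1", 5001      # hypothetical multicast group/port
    PAYLOAD = b"x" * 800                 # 800-byte packets, as in the test
    RATE = 40000                         # target packets per second

    def sender(duration=10):
        """Transmit PAYLOAD to the multicast group at a fixed interval."""
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
        s.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
        interval, end, sent = 1.0 / RATE, time.time() + duration, 0
        while time.time() < end:
            s.sendto(PAYLOAD, (GROUP, PORT))
            sent += 1
            time.sleep(interval)   # approximate pacing only
        print("sent", sent, "packets")

    def receiver(duration=12):
        """Join the multicast group and count packets received."""
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind(("", PORT))
        mreq = struct.pack("4sl", socket.inet_aton(GROUP), socket.INADDR_ANY)
        s.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
        s.settimeout(1.0)
        received, end = 0, time.time() + duration
        while time.time() < end:
            try:
                s.recv(2048)
                received += 1
            except socket.timeout:
                pass
        print("received", received, "packets")

    if __name__ == "__main__":
        sender() if sys.argv[1:] == ["send"] else receiver()

Comparing each receiver's count against the sender's count gives per-VM loss as (sent - received) / sent, which can then be averaged across receivers to get the setup-wide figure described above.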

Experiment Results

The default ESXi configuration could run up to 20 VMs, each receiving 40K packets per second, with less than 0.01% packet loss. As we powered on more VMs, the networking context became the bottleneck and we started observing packet loss in all the receivers. The loss rate increased substantially as more VMs were powered on. A similar trend was observed at the lower packet rate of 30K packets per second.

[Graph: packet loss vs. number of receiver VMs, default configuration]

We repeated the experiment after enabling splitRxMode on all VMs. As seen in the graph below, the new feature greatly increases the scalability of the vSphere platform in handling multicast packets. We were now able to power on 40% more VMs (28 VMs) than before, each receiving 40K packets per second with less than 0.01% packet loss. At lower packet rates the improvement is even more pronounced: we could not induce packet loss even with all 36 VMs powered on.