Home > Blogs > VMware VROOM! Blog

Each vSphere release introduces new vMotion functionality, increased reliability and significant performance improvements. vSphere 5.5 continues this trend by offering new enhancements to vMotion to support EMC VPLEX Metro, which enables shared data access across metro distances.

In this blog, we evaluate vMotion performance on a VMware vSphere 5.5 virtual infrastructure that was stretched across two geographically dispersed datacenters using EMC VPLEX Metro.

Test Configuration

The VPLEX Metro test bed consisted of two identical VPLEX clusters, each with the following hardware configuration:

• Dell R610 host, 8 cores, 48GB memory, Broadcom BCM5709 1GbE NIC
• A single engine (two directors) VPLEX Metro IP appliance
• FC storage switch
• VNX array, FC connectivity, VMFS 5 volume on a 15-disk RAID-5 LUN


Figure 1. Logical layout of the VPLEX Metro deployment

Figure 1 illustrates the deployment of the VPLEX Metro system used for vMotion testing. The figure shows two data centers, each with a vSphere host connected to a VPLEX Metro appliance. The VPLEX virtual volumes presented to the vSphere hosts in each data center are synchronous, distributed volumes that mirror data between the two VPLEX clusters using write-through caching. As a result, vMotion views the underlying storage as shared storage, or exactly equivalent to a SAN that both source and destination hosts have access to. Hence, vMotion in a Metro VPLEX environment is as easy as traditional vMotion that live migrates only the memory and device state of a virtual machine.

The two VPLEX Metro appliances in our test configuration used IP-based connectivity. The vMotion network between the two ESXi hosts used a physical network link distinct from the VPLEX network. The Round Trip Time (RTT) latency on both VPLEX and vMotion networks was 10 milliseconds.

Measuring vMotion Performance

The following metrics were used to understand the performance implications of vMotion:

• Migration Time: Total time taken for migration to complete
• Switch-over Time: Time during which the VM is quiesced to enable switchover from source to the destination host
• Guest Penalty: Performance impact on the applications running inside the VM during and after the migration

Test Results


Figure 2. VPLEX Metro vMotion performance in vSphere 5.1 and vSphere 5.5

Figure 2 compares VPLEX Metro vMotion performance results in vSphere 5.1 and vSphere 5.5 environments. The test scenario used an idle VM configured with 2 VCPUs and 2GB memory. The figure shows a minor difference in the total migration time between the two vSphere environments and a significant improvement in vMotion switch-over time in the vSphere 5.5 environment. The switch-over time reduced from about 1.1 seconds to about 0.6 second (a nearly 2x improvement), thanks to a number of performance enhancements that are included in the vSphere 5.5 release.

We also investigated the impact of VPLEX Metro live migration on Microsoft SQL Server online transaction processing (OLTP) performance using the open-source DVD Store workload. The test scenario used a Windows Server 2008 VM configured with 4 VCPUs, 8GB memory, and a SQL Server database size of 50GB.


Figure 3. VPLEX Metro vMotion impact on SQL Server Performance

Figure 3 plots the performance of a SQL Server virtual machine in orders processed per second at a given time—before, during, and after VPLEX Metro vMotion. As shown in the figure, the impact on SQL Server throughput was very minimal during vMotion. The SQL Server throughput on the destination host was around 310 orders per second, compared to the throughput of 350 orders per second on the source host. This throughput drop after vMotion is due to the VPLEX inter-cluster cache coherency interactions and is expected. For some time after the vMotion, the destination VPLEX cluster continued to send cache page queries to the source VPLEX cluster and this has some impact on performance. After all the metadata is fully migrated to the destination cluster, we observed the SQL Server throughput increase to 350 orders per second, the same level of throughput seen prior to vMotion.

These performance test results show the following:

  • Remarkable improvements in vSphere 5.5 towards reducing vMotion switch-over time during metro migrations (for example, a nearly 2x improvement over vSphere 5.1)
  • VMware vMotion in vSphere 5.5 paired with EMC VPLEX Metro can provide workload federation over a metro distance by enabling administrators to dynamically distribute and balance the workloads seamlessly across data centers

To find out more about the test configuration, performance results, and best practices to follow, see our recently published performance study.

SEsparse Shows Significant Improvements over VMFSsparse

Limited amounts of physical resources can make large-scale virtual infrastructure deployments challenging. Provisioning dedicated storage space to hundreds of virtual machines can become particularly expensive. To address this VMware vSphere 5.5 provides two sparse storage techniques, namely VMFSparse and SEsparse. Running multiple VMs using sparse delta-disks with a common parent virtual disk brings down the required amount of physical storage making large-scale deployments manageable. SEsparse was introduced in VMware vSphere 5.1 and in vSphere 5.5 became the default virtual disk snapshotting technique for VMDKs greater than 2 TB. Various enhancements were made to SEsparse technology in the vSphere 5.5 release, which makes SEsparse perform mostly on par or better than VMFSsparse formats. In addition dynamic space reclamation confers on SEsparse a significant advantage over VMFSsparse virtual disk formats. This feature makes SEsparse the choice for VMware® Horizon View™ environments where space reclamation is critical due to the large number of tenants sharing the underlying storage.


A recently published paper reports the results from a series of performance studies of SEsparse and VMFsparse using thin virtual disks as baselines. The performance was evaluated using a comprehensive set of Iometer workloads along with workloads from two real world application domains: Big Data Analytics and Virtual Desktop Infrastructure (VDI). Overall, the performance of SEsparse is significantly better than the VMFSsparse format for random write workloads and mostly on par or better for the other analyzed workloads, depending on type.

Read the full performance study, “SEsparse in VMware vSphere 5.5.”

VDI Benchmarking Using View Planner on VMware Virtual SAN (VSAN)

VMware vSphere® 5.5 introduces the beta availability of VMware® Virtual SAN (VSAN). This feature allows a new software-defined storage tier, pools compute and direct-attached storage resources, and clusters server disks and flash to create resilient shared storage.

This blog showcases Virtual Desktop Infrastructure (VDI) performance on Virtual SAN using VMware View Planner, which is designed to simulate a large-scale deployment of virtualized desktop systems. This is achieved by generating a workload representative of many user-initiated operations that take place in a typical VDI environment. The results allow us to study the effects on an entire virtualized infrastructure including the storage subsystem. View Planner can be downloaded here.

In this blog, we evaluate the performance of VSAN using View Planner with different VSAN node configurations. In this experiment, we build a 3-node VSAN cluster and a 7-node VSAN cluster to determine the maximum number of VDI virtual machines (VMs) we can run while meeting the quality of service (QoS) criteria set for View Planner.  The maximum number of passing VMs is called the VDImark™ for a given system under test. This metric is used for VDI benchmarking and it encapsulates the number of VDI users that can be run on a given system with an application response time less than the set threshold. For response time characterization, View Planner operations are divided into three main groups: (1) Group A for interactive operations, (2) Group B for I/O operations, and (3) Group C for background operations. The score is determined separately for Group A user operations and Group B user operations by calculating the 95th percentile latency of all the operations in a group. The default thresholds are 1.0 second for Group A and 6.0 seconds for Group B. Please refer to the user guide, and the run and reporting guides for more details. Hence, the scoring is based on several factors such as the response time of the operations, compliance of the setup and configurations, and so on.

Experimental Setup

The host running the desktop VMs has 16 Intel Xeon E5-2690 cores running @ 2.9GHz. The host has 256GB physical RAM, which is more than sufficient to run 100 1GB Windows 7 VMs. For VSAN, each host has two disk groups where each disk group has one PCI-e solid-state drive (SSD) of 200GB and six 300GB 15k RPM SAS disks.

View Planner QoS (VDImark)

The View Planner benchmark was run to find the VDImark for both 3-node and 7-node VSAN clusters and the results are shown in the chart above. In the 3-node cluster, a VDImark of 286 was achieved and for 7-node cluster, a VDImark score of 677 was achieved. So, there is nice scaling in terms of maximum VMs supported as the numbers of nodes were increased in VSAN from 3 to 7.

To further illustrate the Group A and Group B response times, we show the average response time of individual operations for these runs for both Group A and Group B, as follows.

Group A Response Times

As seen in the figure above, the average response times of the most interactive operations are less than one second, which is needed to provide good end-user experience. If we look at the 3-node and 7-node run, we don’t see much variance in the response times, and they almost remain constant when scaling up. This clearly illustrates that, as we scale the number of VMs in larger nodes of a VSAN cluster, the user experience doesn’t degrade and scales nicely.

Group B Response Times

Since Group B is more sensitive to I/O and CPU usage, the above chart for Group B operations is more important to see how we scale. It is evident from the chart that there is not much difference in the response times as the number of VMs were increased from 286 VMs on a 3-node cluster to 677 VMs on a 7-node cluster. Hence, storage-sensitive VDI operations also scale well as we scale the VSAN nodes from 3 to 7.

To see other parts on the VDI/VSAN benchmarking blog series, check the links below:
VDI Benchmarking Using View Planner on VMware Virtual SAN – Part 1
VDI Benchmarking Using View Planner on VMware Virtual SAN – Part 2
VDI Benchmarking Using View Planner on VMware Virtual SAN – Part 3

vSphere Flash Read Cache Performance on vSphere 5.5

vSphere Flash Read Cache (vFRC) is a new solution to enhance storage I/O performance in vSphere 5.5.  vFRC lets you use flash devices (SSD or PCIe cards) as a read cache for VM I/Os and therefore improve performance. vFRC can improve performance for read-intensive workloads that have a high percentage of data locality. Such workloads are generated by database warehousing applications and enterprise server applications such as Web proxy servers and monitoring servers.

A recent technical white paper studies the performance of vFRC with respect to the following workloads:

  • A decision support system (DSS) database workload running with Oracle 11g R2
  • A DVD store workload running with Microsoft SQL Server 2008
  • Enterprise server-level I/O traces that are used extensively in storage research

The results are presented in the paper, along with performance best practices when using vFRC. The paper also gives an overview of the vFRC architecture. To learn more, read Performance of vSphere Flash Read Cache in VMware vSphere 5.5.

VMware vFabric Postgres 9.2 Performance and Best Practices

VMware vFabric Postgres (vPostgres) 9.2 improves vertical scalability over the previous version by 300% for pgbench SELECT-only (a common read-only OLTP workload) and by 100% for pgbench (a common read/write OLTP workload). vPostgres 9.2 on vSphere 5.1 achieves equal-to-native vertical scalability on a 32-core machine.

Using out-of-the-box settings for both vPostgres and vSphere, virtual machine (VM)-based database consolidation performs on par with alternative approaches (such as consolidated on one vPostgres server instance or consolidated on multiple vPostgres server instances but one operating system instance) in a baseline memory-undercommitted situation for a standard OLTP workload (using dbt2 benchmark, an open-source fair implementation of TPC-C); while performs increasingly more robust as memory overcommitment escalates (200% better than alternatives under a 55% memory-overcommitted situation).

By using an unconventionally larger database shared buffers (75% of memory size rather than the conventional 25%), vPostgres can attain both better performance (12% better) and more consistent performance (70% less temporal variation).

When using an unconventionally larger database shared buffers, the vPostgres database memory ballooning technique can enhance the robustness of VM-based database consolidation: under a 55% memory-overcommitted situation, using its help can advance the performance advantage of VM-based consolidation over alternatives from 60% to 140%.

For more details including experimentation methodology and references, please read the namesake whitepaper.

Performance Best Practices for vSphere 5.5 is Available

We are pleased to announce the availability of Performance Best Practices for vSphere 5.5. This is a book designed to help system administrators obtain the best performance from vSphere 5.5 deployments.

The book addresses many of the new features in vSphere 5.5 from a performance perspective. These include:

  • vSphere Flash Read Cache, a new feature in vSphere 5.5 allowing flash storage resources on the ESXi host to be used for read caching of virtual machine I/O requests.
  • VMware Virtual SAN (VSAN), a new feature (in beta for vSphere 5.5) allowing storage resources attached directly to ESXi hosts to be used for distributed storage and accessed by multiple ESXi hosts.
  • The VMware vFabric Postgres database (vPostgres).

We’ve also updated and expanded on many of the topics in the book. These include:

  • Running storage latency and network latency sensitive applications
  • NUMA and Virtual NUMA (vNUMA)
  • Memory overcommit techniques
  • Large memory pages
  • Receive-side scaling (RSS), both in guests and on 10 Gigabit Ethernet cards
  • VMware vMotion, Storage vMotion, and Cross-host Storage vMotion
  • VMware Distributed Resource Scheduler (DRS) and Distributed Power Management (DPM)
  • VMware Single Sign-On Server

The book can be found here.

Line-Rate Performance with 80GbE and vSphere 5.5

With the increasing number of physical cores in a system, the networking bandwidth requirement per server has also increased. We often find many networking-intensive applications are now being placed on a single server, which results in a single vSphere server requiring more than one 10 Gigabit Ethernet (GbE) adapter. Additional network interface cards (NICs) are also deployed to separate management traffic and the actual virtual machine traffic. It is important for these servers to service the connected NICs well and to drive line rate on all the physical adapters simultaneously.

vSphere 5.5 supports eight 10GbE NICs on a single host, and we demonstrate that a host running with vSphere 5.5 can not only drive line rate on all the physical NICs connected to the system, but can do it with a modest increase in overall CPU cost as we add more NICs.

We configured a single host with four dual-port Intel 10GbE adapters for the experiment and connected them back-to-back with an IXIA Application Network Processor Server with eight 10GbE ports to generate traffic. We then measured the send/receive throughput and the corresponding CPU usage of the vSphere host as we increased the number of NICs under test on the system.

Environment Configuration

  • System Under Test: Dell PowerEdge R820
  • CPUs: 4 x  Intel Xeon Processors E5-4650 @ 2.70GHz
  • Memory: 128GB
  • NICs:8 x Intel 82599EB 10GbE, SFP+ Network Connection
  • Client: Ixia Xcellon-Ultra XT80-V2, 2U Application Network Processor Server

Challenges in Getting 80Gbps Throughput

To drive near 80 gigabits of data per second from a single vSphere host, we used a server that has not only the required CPU and memory resources, but also the PCI bandwidth that can perform the necessary I/O operations. We used a Dell PowerEdge Server with an Intel E5-4650 processor because it belongs to the first generation of Intel processors that supports PCI Gen 3.0. PCI Gen 3.0 doubles the PCI bandwidth capabilities compared to PCI Gen 2.0. Each dual-port Intel 10GbE adapter needs at least a PCI Gen 2.0 x8 to reach line rate. Also, the processor has Intel Data Direct I/O Technology where the packets are placed directly in the processor cache rather than going to the memory. This reduces the memory bandwidth consumption and also helps reduce latency.

Experiment Overview

Each 10GbE port of the vSphere 5.5 server was configured with a separate vSwitch, and each vSwitch had two Red Hat 6.0 Linux virtual machines running an instance of Apache web server. The web server virtual machines were configured with 1 vCPU and 2GB of memory with VMXNET3 as the virtual NIC adapter.  The 10GbE ports were then connected to the Ixia Application Server port. Since the server had two x16 slots and five x8 slots, we used the x8 slots for the four 10GbE NICs so that each physical NIC had identical resources. For each physical connection, we then configured 200 web/HTTP connections, 100 for each web server, on an Ixia server that requested or posted the file. We used a high number of connections so that we had enough networking traffic to keep the physical NIC at 100% utilization.

Figure 1. System design of NICs, switches, and VMs

The Ixia Xcellon application server used an HTTP GET request to generate a send workload for the vSphere host. Each connection requested a 1MB file from the HTTP web server.

Figure 2 shows that we could consistently get the available[1] line rate for each physical NIC as we added more NICs to the test. Each physical NIC was transmitting 120K packets per second and the average TSO packet size was close to 10K. The NIC was also receiving 400K packets per second for acknowledgements on the receive side. The total number of packets processed per second was close to 500K for each physical connection.

Figure 2. vSphere 5.5 drives throughput at available line rates. TSO on the NIC resulted in lower packets per second for send.

Similar to the send case, we configured the application server to post a 1MB file using an HTTP POST request for generating receive traffic for the vSphere host. We used the same number of connections and observed similar behavior for the throughput. Since the NIC does not have support for hardware LRO, we were getting 800K packets per second for each NIC. With eight 10GbE NICs, the packet rate reached close to 6.4 million packets per second. VMware does Software LRO for Linux and as a result we see large packets in the guest. The guest packet rate is around 240K packets per second. There was also significant traffic for TCP acknowledgements and for each physical NIC. The host was transmitting close to 120K acknowledgement packets for each physical NIC, bringing the total packets processed close to 7.5 million packets per second for eight 10Gb ports.

Figure 3. Average vSphere 5.5 host CPU utilization for send and receive

We also measured the average CPU reported for each of the tests. Figure 3 shows that the vSphere host’s CPU usage increased linearly as we added more physical NICs to the test for both send and receive. This indicates that performance improves at an expected and acceptable rate.

Test results show that vSphere 5.5 is an excellent platform on which to deploy networking-intensive workloads. vSphere 5.5 makes use of all the physical bandwidth capacity available and does this without incurring additional CPU cost.

 


[1]A 10GbE NIC can achieve only 9.4 Gbps of throughput with standard MTU. For a 1500 byte packet, we have 40 bytes for the TCP /IP header and 38 bytes for the Ethernet frame format.

Deploying Extremely Latency-Sensitive Applications in VMware vSphere 5.5

VMware vSphere ensures that virtualization overhead is minimized so that it is not noticeable for a wide range of applications including most business critical applications such as database systems, Web applications, and messaging systems. vSphere also supports well applications with millisecond-level latency constraints, including VoIP services. However, performance demands of latency-sensitive applications with very low latency requirements such as distributed in-memory data management, stock trading, and high-performance computing have long been thought to be incompatible with virtualization.

vSphere 5.5 includes a new feature for setting latency sensitivity in order to support virtual machines with strict latency requirements. This per-VM feature allows virtual machines to exclusively own physical cores, thus avoiding overhead related to CPU scheduling and contention. A recent performance study shows that using this feature combined with pass-through mechanisms such as SR-IOV and DirectPath I/O helps to achieve near-native performance in terms of both response time and jitter.

The paper explains major sources of latency increase due to virtualization in vSphere and presents details of how the latency-sensitivity feature improves performance along with evaluation results of the feature. It also presents some best practices that were concluded from the performance evaluation.

For more information, please read the full paper: Deploying Extremely Latency-Sensitive Applications in VMware vSphere 5.5.

 

Simulating different VDI users with View Planner 3.0

VDI benchmarking is hard. What makes it challenging is getting a good representation or simulation of VDI users.  If we closely look at typical office users, we can get a spectrum of VDI users where at the one end of spectrum, the user may be using some simple Microsoft Office applications at a relatively moderate speed, whereas at the other end of spectrum, the user may be running some CPU-heavy multimedia applications and switching between many applications much faster. We classify the fast user as the power user or the “heavy” user, whereas we classify the user at the other end of the spectrum as the task worker or as the “light” user. In between the two categories, we define one more category which lies in between these two ends of the spectrum, which is the “medium” user.

To simulate these different categories of users and to make the job of VDI benchmarking much easier, we have made VMware View Planner 3.0, which simulates a workload representative of many user-initiated operations that take place in a typical VDI environment. The tool simulates typical Office user applications such as PowerPoint, Outlook, and Word; and Adobe Reader, Internet Explorer Web browser, multimedia applications, and so on. The tool can be downloaded from: http://www.vmware.com/products/desktop_virtualization/view-planner/overview.html.

If we look at the three categories of VDI users outlined above, one of the main differentiating factors across this gamut of VDI users is how fast they act and this is simulated using the concept of “think time” in the View Planner tool. The tool uses the thinktime parameter to randomly sleep before starting the next application operation. For the heavy user, the value of thinktime is kept very low at 2 seconds. This means that operations are happening very fast and users are switching across different applications or doing operations in an application every 2 seconds on average. The View Planner 3.0 benchmark defines a score, called “VDImark” which is based on this “heavy” user workload profile. For a medium user, the think time is set to 5 seconds, and for a light user, the think time is set to 10 seconds. The heavy VDI user also uses a bigger screen resolution compared to the medium or light user. The simulation of these category of users in the View Planner tool is summarized in the table below:

In order to show the capability of View Planner 3.0 to determine the sizing for VDI user VMs per host, we ran a flexible mode of View Planner 3.0, which allowed us to create medium and light user workloads (the heavy workload profile pre-exists), as well to understand the user density for different types of VDI users for a given system. The flexible mode will be available soon through Professional Services Organization (PSO) and to selected partners.

The experimental setup we used to compare these different user profiles is shown below:

In this test, we want to determine how many VMs can be run on the system while each VM is performing its heavy, medium, or light user profiles. In order to do this, we need to set a baseline of acceptable performance, which is defined by the quality of service (QoS) as defined in the View Planner user guide. The number of VMs that passed the QoS score is shown in the chart below.

The chart shows that we can run about 53 VMs for the heavy user (VDImark), 67 VMs for the medium user, and 91 VMs for light users. So, we could consolidate about 25% more desktops if we used this system to host users with medium workloads instead of heavy workloads. And we could consolidate 35% more desktops if we used this system to host users with light workloads instead of medium workloads. So, it is crucial to fully specify the user profile whenever we talk about the user density.

In this blog, we demonstrated how we used the View Planner 3.0 flexible mode to run different user profiles and to understand the user density for a system under test. If you have any questions and want to know more about View Planner, you can reach out to the team at viewplanner-info@vmware.com

IPv6 performance improvements in vSphere 5.5

Many of our customers use IPv6 networks in their datacenters for a variety of reasons. We expect that many more will transition from IPv4 to IPv6 to reap the large address range and other benefits that IPv6 provides. Keeping this in mind, we have worked on a number of performance enhancements for the way that vSphere 5.5 manages IPv6 network traffic. Some new features that we have implemented include:

• TCP Checksum Offload: For Network Interface Cards (NICs) that support this feature, the computation of the TCP checksum of the IPv6 packet is offloaded to the NIC.

• Software Large Receive Offload (LRO): LRO is a technique of aggregating multiple incoming packets from a single stream into a larger buffer before they are passed higher up the networking stack, thus reducing the number of packets that have to be processed and saving CPU. Many NICs do not support LRO for IPv6 packets in hardware. For such NICs, we implement LRO in the vSphere network stack.

• Zero-Copy Receive: This feature prevents an unnecessary copy from the packet frame to a memory space in the vSphere network stack. Instead, the frame is processed directly.

vSphere 5.1 offers the same features, but only for IPv4. So, in vSphere 5.1, services such as vMotion, NFS, and Fault Tolerance had lower bandwidth in IPv6 networks when compared to IPv4 networks. vSphere 5.5 solves that problem—it delivers similar performance over both IPv4 and IPv6 networks. A seamless transition from IPv4 to IPv6 is now possible.

Next, we demonstrate the performance of vMotion over a 40Gb/s network connecting two vSphere hosts. We also demonstrate the performance of networking traffic between two virtual machines created on the vSphere hosts.

System Configuration
We set up a test environment with the following specifications:

• Servers: 2 Dell PowerEdge R720 servers running vSphere 5.5.
• CPU: 2-socket, 12-core Intel Xeon E5-2667 @ 2.90 GHz.
• Memory: 64GB memory; 32GB spread across two NUMA nodes.
• Networking: 1 dual-port Intel 10GbE and 1 dual-port Broadcom 10GigE adapter placed on separate PCI Gen-2 x8 lanes in both machines. We thus had 40Gb/s of network connectivity between the two vSphere hosts.
• Virtual Machine for vMotion: 1 VM running Red Hat Enterprise Linux Server 6.3 assigned 2 virtual CPUs (vCPUs) and 48GB memory. We migrate this VM between the two vSphere hosts.
• Virtual Machines for networking tests: A pair of VMs running Red Hat Enterprise Linux server 6.3, assigned 4 vCPUs and 16GB memory, on each host. We use these VMs to test the performance of networking traffic between two VMs.

We configured each vSphere host with four vSwitches, each vSwitch having one 10GbE uplink port. We created one VMkernel adapter on each vSwitch. Each VMkernel adapter was configured on the same subnet. The MTU of the NICs was set to the default of 1500 bytes. We enabled each VMkernel adapter for vMotion, which allowed vMotion traffic to use the 40Gb/s network connectivity. We created four VMXNET3 virtual adapters on the pair of virtual machines used for networking tests.

Methodology
In order to demonstrate the performance for vMotion, we simulated a heavy memory usage footprint in the virtual machine. The memory-intensive program allocated 48GB memory in the virtual machine and touched one byte in each page in an infinite loop. We migrated this virtual machine between the two vSphere hosts over the 40Gb/s network. We used net-stats to monitor network throughput and CPU utilization on the sending and receiving systems. We also noted the bandwidth achieved in each pre-copy iteration of vMotion from VMkernel logs.

In order to demonstrate the performance of virtual machine networking traffic, we use Netperf 2.60 to simulate traffic from one virtual machine to the other. We create two connections for each virtual adapter. Each connection generates traffic for the TCP_STREAM workload, with 16KB message size and 256KB socket buffer size. As in the previous experiment, we used net-stats to monitor network throughput and CPU utilization.

Results
Figures 1 and 2 show, for IPv4 and IPv6 traffic, the network throughput and CPU utilization data that we collected over the 40-second duration of the migration. After the guest memory is staged for migration, vMotion begins iterations of pre-copying the memory contents from the source vSphere host to the destination vSphere host.

In the first iteration, the destination vSphere host needs to allocate pages for the virtual machine. Network throughput is below the available bandwidth in this stage as vMotion bandwidth usage is throttled by the memory allocation on the destination host. The average network bandwidth during this phase was 1897 megabytes per second (MB/s) for IPv4 and 1866MB/s for IPv6.

After the first iteration, the source vSphere host sends the delta of changed pages. During this phase, the average network bandwidth was 4301MB/s with IPv4 and 4091MB/s with IPv6.

The peak measured bandwidth in netstats was 34.5Gb/s for IPv4 and 32.9Gb/s for IPv6. The CPU utilization of both systems followed a similar trend for both IPv4 and IPv6. Please also note that vMotion is very CPU intensive on the receiving vSphere hosts, and high CPU clock speed is necessary to achieve high bandwidths. The results are summarized in Table 1. In all, migration of the virtual machine was complete in 40 seconds regardless of IPv4 or IPv6 connectivity.

vMotion over an IPv4 network
Figure 1. vMotion over an IPv4 network
vMotion over an IPv4 network
Figure 2. VMotion over an IPv6 network

vMotion-IPv4 vs IPv6
Table 1. vMotion results—IPv4 versus IPv6

The results for virtual machine networking traffic are in Table 2. While the throughput with IPv6 is about 2.5% lower, the CPU utilization is the same on both the sending as well as the receive sides.

Virtual Machine Performance - IPv4 vs IPv6
Table 2. Virtual machine networking results—IPv4 versus IPv6

Thanks to a number of IPv6 enhancements added to vSphere 5.5, migrations with vMotion occur over IPv6 networks at speeds within 5%, compared to those over IPv4 networks. For virtual machine networking performance, the throughput of IPv6 is within 2.5% of IPv4. In addition, testing shows that we can drive bandwidth close to 40Gb/s link speeds with both protocols. Combined, this functionality allows for a seamless transition from IPv4 to IPv6 with little performance impact.