
VMware pushes the envelope with vSphere 6.0 vMotion

vMotion in VMware vSphere 6.0 delivers breakthrough capabilities that give customers a new level of flexibility and performance in moving virtual machines across their virtual infrastructures. vSphere 6.0 vMotion includes features – long-distance migration, cross-vCenter migration, and a routed vMotion network – that enable seamless migrations across today's management and distance boundaries. For the first time, VMs can be migrated across vCenter Servers separated by cross-continental distances with minimal performance impact. vMotion is fully integrated with the latest vSphere 6 software-defined data center technologies, including Virtual SAN (VSAN) and Virtual Volumes (VVOL). Additionally, the newly re-architected vMotion in vSphere 6.0 enables extremely fast migrations at speeds exceeding 60 gigabits per second.

In this blog, we present the latest vSphere 6.0 vMotion features along with their performance results. We first evaluate vMotion performance across two geographically dispersed data centers connected by a network with 100ms round-trip time (RTT) latency. Following that, we demonstrate vMotion performance when migrating an extremely memory-hungry “Monster” VM.

Long Distance vMotion

vSphere 6.0 introduces a Long-distance vMotion feature that increases the round-trip latency limit for vMotion networks from 10 milliseconds to 150 milliseconds. Long-distance mobility offers a variety of compelling new use cases, including whole data center upgrades, disaster avoidance, government-mandated disaster preparedness testing, and large-scale distributed resource management. Below, we examine vMotion performance under varying network latencies up to 100ms.
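
Before attempting a long-distance migration, it is worth confirming that the measured round-trip latency on the vMotion network is within the supported limit. A minimal check from the ESXi shell might look like the following sketch; the vmknic name (vmk1) and the peer address are placeholders for your own environment.

  # Measure round-trip latency from the source host's vMotion vmknic
  # to the destination host's vMotion IP (vmk1 and the address are examples).
  vmkping -I vmk1 -c 20 192.0.2.50
  # The reported average RTT should stay well under the 150ms vMotion limit.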

Test Configuration

We set up a vSphere 6.0 test environment with the following specifications:

Hardware

  • Two HP ProLiant DL580 G7 servers (32-core Intel Xeon E7-8837 @ 2.67 GHz, 256 GB memory)
  • Storage: Two EMC VNX 5500 arrays, FC connectivity, VMFS 5 volume on a 15-disk RAID-5 LUN
  • Networking: Intel 10GbE 82599 NICs
  • Latency Injector: Maxwell-10GbE appliance to inject latency into the vMotion network

Software

  • VM config: 4 VCPUs, 8GB mem, 2 vmdks (30GB system disk, 20GB database disk)
  • Guest OS/Application: Windows Server 2012 / MS SQL Server 2012
  • Benchmark: DVDStore (DS2) using a 12GB database with 12,000,000 customers, driven by 3 drivers with no think time (a sample driver invocation is sketched below)
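
For reference, a DS2 SQL Server driver invocation along these lines generates the kind of load described above. Treat it as an illustrative sketch: the target address is a placeholder, and the exact flag names can vary between DVD Store 2 releases, so this is not necessarily the exact command line used in our tests.

  # Illustrative DS2 SQL Server driver invocation (flag names may differ by DS2 release):
  # 3 driver threads, no think time, against the 12GB database.
  ds2sqlserverdriver.exe --target=<sql-vm-ip> --n_threads=3 --think_time=0 --db_size=12GB --warmup_time=1 --run_time=60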

[Figure 1 and Figure 2: Logical deployment of the long distance vMotion test-bed, including the Maxwell-10GbE latency injection appliance]

Figure 1 illustrates the logical deployment of the test-bed used for long distance vMotion testing. Long distance vMotion is supported both without shared storage infrastructure and with shared storage solutions such as EMC VPLEX Geo, which enables shared data access across long distances. Our test-bed did not use shared storage, so the entire state of the VM was migrated, including its memory, storage, and CPU/device state. As shown in Figure 2, our test configuration deployed a Maxwell-10GbE network appliance to inject latency into the vMotion network.
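
If a dedicated latency-injection appliance such as the Maxwell-10GbE is not available, a comparable WAN delay can be emulated in a lab with Linux traffic control (netem) on a router or bridge placed in the vMotion path. This is only a stand-in for the appliance used in our tests; the interface names and delay values below are examples.

  # Emulate a 100ms RTT by adding 50ms of delay in each direction on the
  # Linux box bridging the two vMotion networks (eth0/eth1 are examples).
  tc qdisc add dev eth0 root netem delay 50ms
  tc qdisc add dev eth1 root netem delay 50ms

  # Remove the emulated delay when finished.
  tc qdisc del dev eth0 root netem
  tc qdisc del dev eth1 root netem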

Measuring vMotion Performance

The following metrics were used to understand the performance implications of vMotion:

  • Migration Time: Total time taken for the migration to complete
  • Switch-over Time: Time during which the VM is quiesced to enable the switch-over from the source to the destination host
  • Guest Penalty: Performance impact on the applications running inside the VM during and after the migration (a simple way to observe the last two metrics externally is sketched below)
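
Switch-over time and guest impact can also be observed roughly from outside the VM with a simple probe loop like the one below. This is a generic sketch (the VM address is a placeholder) rather than the instrumentation used for the results in this blog, and it relies on the Linux iputils ping with sub-second intervals and timestamps.

  # Probe the VM every 100ms and record a timestamp for each reply; the longest
  # gap between consecutive replies approximates the switch-over window as seen
  # by clients. Stop the probe with Ctrl-C after the migration completes.
  VM_IP=192.0.2.100          # placeholder for the test VM's address
  ping -i 0.1 -D "$VM_IP" | tee /tmp/vm-probe.log

  # Afterwards, compute the largest gap between reply timestamps:
  awk -F'[][]' '/bytes from/ { if (prev != "") { gap = $2 - prev; if (gap > max) max = gap }; prev = $2 }
                END { printf "longest gap: %.3f s\n", max }' /tmp/vm-probe.log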

Test Results

We investigated the impact of long distance vMotion on Microsoft SQL Server online transaction processing (OLTP) performance using the open-source DVD Store workload. The test scenario used a Windows Server 2012 VM configured with 4 VCPUs, 8GB memory, and a SQL Server database size of 12GB. Figure 3 shows the migration time and VM switch-over time when migrating an active SQL Server VM at different network round-trip latencies. In all test scenarios, we used a load of 3 DS2 users with no think time, which generated substantial load on the VM. The migration was initiated during the steady-state period of the benchmark, when the CPU utilization (esxtop %USED counter) of the VM was close to 120% and the average read and write IOPS were about 200 and 150, respectively.
[Figure 3: Migration time and switch-over time at different round-trip latencies]

Figure 3 shows that the impact of round-trip latency was minimal on both the duration of the migration and the switch-over time, thanks to the latency-aware optimizations in vSphere 6.0 vMotion. The difference in migration time among the test scenarios was in the noise range (<5%). The switch-over time increased marginally, from about 0.5 seconds in the 5ms test scenario to 0.78 seconds in the 100ms test scenario.

[Figure 4: SQL Server throughput before, during, and after vMotion over a 100ms RTT network]

Figure 4 plots the performance of the SQL Server virtual machine, in orders processed per second, before, during, and after vMotion on a 100ms round-trip latency network. In our tests, the DVD Store benchmark driver was configured to report performance data at a fine granularity of 1 second (default: 10 seconds). As shown in the figure, the impact on SQL Server throughput was minimal during vMotion. The only noticeable dip in performance was during the switch-over phase (0.78 seconds) from the source host to the destination host. It took less than 5 seconds for SQL Server to return to its normal level of performance.

Faster Migration

Why are we interested in extreme performance? Today's data centers feature modern servers with many processing cores (up to 80), terabytes of memory, and high network bandwidth (10 and 40GbE NICs). VMware supports larger “monster” virtual machines that can scale up to 128 virtual CPUs and 4TB of RAM. Using this higher network bandwidth to complete migrations of monster VMs faster enables a higher level of mobility in private cloud deployments. Reducing the time to move a virtual machine also reduces the total network and CPU overhead of the migration.

Test Configuration

  • Two Dell PowerEdge R920 servers (60-core Intel Xeon E7-4890 v2 @ 2.80GHz, 1TB memory)
  • Networking: Intel 10GbE 82599 NICs, Mellanox 40GbE MT27520 NIC
  • VM config: 12 VCPUs, 500GB mem
  • Guest OS: Red Hat Enterprise Linux Server 6.3

We configured each vSphere host with four Intel 10GbE ports and a single Mellanox 40GbE port, for a total of 80Gb/s of network connectivity between the two vSphere hosts. Each vSphere host was configured with five vSwitches: four vSwitches each had a unique 10GbE uplink port, and the fifth vSwitch had the 40GbE uplink port. The MTU of the NICs was set to the default of 1500 bytes. We created one VMkernel adapter on each of the four vSwitches with a 10GbE uplink port and four VMkernel adapters on the vSwitch with the 40GbE uplink port. All eight VMkernel adapters were configured on the same subnet. We also enabled each VMkernel adapter for vMotion, which allowed vMotion traffic to use the full 80Gb/s of network connectivity.
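
A VMkernel adapter layout like the one described above can be scripted from the ESXi shell. The sketch below shows the general shape for one adapter, assuming the vSwitches and vMotion port groups already exist; the vmknic name, port group name, and addresses are examples, and these are the standard esxcli/vim-cmd equivalents rather than the exact scripts used in our lab.

  # Create a VMkernel adapter on an existing vMotion port group and give it a
  # static address on the shared vMotion subnet (names/addresses are examples).
  esxcli network ip interface add --interface-name=vmk1 --portgroup-name=vMotion-PG1
  esxcli network ip interface ipv4 set --interface-name=vmk1 --ipv4=10.10.0.11 --netmask=255.255.255.0 --type=static

  # Enable the adapter for vMotion so its uplink can carry vMotion traffic,
  # then repeat for the remaining adapters (vmk2 through vmk8).
  vim-cmd hostsvc/vmotion/vnic_set vmk1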

Methodology

To demonstrate extreme vMotion throughput performance, we simulated a very heavy memory usage footprint inside the virtual machine. The memory-intensive program allocated 300GB of memory inside the guest and touched a random byte in each memory page in an infinite loop. We migrated this virtual machine between the two vSphere hosts under different test scenarios: vMotion over a 10Gb/s network, a 20Gb/s network, a 40Gb/s network, and an 80Gb/s network. We used esxtop to monitor network throughput and CPU utilization on the source and destination hosts.
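
If you want to generate a similar load without writing a custom program, an off-the-shelf stressor such as stress-ng can keep a large allocation continuously dirty inside the guest. This is a stand-in for illustration only; its access pattern is not identical to the random per-page writes described above.

  # Keep one worker continuously dirtying a 300GB allocation inside the guest
  # (stress-ng is a substitute for the custom memory-touching program we used).
  stress-ng --vm 1 --vm-bytes 300g --vm-keep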

Test Results

[Figure 5: Peak vMotion network bandwidth in vSphere 5.5 and vSphere 6.0 under different network deployment scenarios]

Figure 5 compares the peak network bandwidth observed in vSphere 5.5 and vSphere 6.0 under different network deployment scenarios. Let us first consider vSphere 5.5 vMotion throughput performance. Figure 5 shows that vSphere 5.5 vMotion reaches line rate in both the 10Gb/s and 20Gb/s network test scenarios. When we increased the available vMotion network bandwidth beyond 20Gb/s, peak vMotion usage was limited to 18Gb/s in vSphere 5.5. This is because in vSphere 5.5, each vMotion is assigned two helper threads by default, and these threads do the bulk of the vMotion processing. Since the vMotion helper threads were CPU saturated, there was no performance gain from adding network bandwidth. When we increased the number of vMotion helper threads from 2 to 4 in the 40Gb/s test scenario, thereby removing the CPU bottleneck, the peak network bandwidth usage of vMotion in vSphere 5.5 increased to 32Gb/s. Tuning the helper threads beyond four hurt vMotion performance in the 80Gb/s test scenario, because vSphere 5.5 vMotion has locking issues that limit the performance gains from adding more helper threads; these are VM-specific locks that protect the VM's memory.

The newly re-architected vMotion in vSphere 6.0 not only removes these lock contention issues but also obviates the need for any manual tuning. During the initial setup phase, vMotion dynamically creates the appropriate number of TCP/IP stream channels between the source and destination hosts based on the configured network ports and their bandwidth. It then instantiates one vMotion helper thread per stream channel, so no manual tuning is required. Figure 5 shows that vMotion reaches line rate in the 10Gb/s, 20Gb/s, and 40Gb/s scenarios, while utilizing a little over 64Gb/s of network throughput in the 80Gb/s scenario. This is more than a 3.5x improvement in performance compared to vSphere 5.5.

[Figure 6: vMotion network throughput and CPU utilization in the vSphere 6.0 80Gb/s test scenario]
Figure 6 shows the network throughput and CPU utilization data in the vSphere 6.0 80Gb/s test scenario. During vMotion, the memory of the VM is copied from the source host to the destination host in an iterative fashion. In the first iteration, vMotion bandwidth usage is throttled by memory allocation on the destination host; the peak vMotion network bandwidth usage was about 28Gb/s during this phase. Subsequent iterations copy only the memory pages that were modified during the previous iteration. The number of pages transferred in these iterations is determined by how actively the guest accesses and modifies its memory pages. The more modified pages there are, the longer it takes to transfer all pages to the destination host, but on the flip side it allows vMotion's advanced performance optimizations to kick in and fully leverage the additional network and compute resources. That is evident in the third pre-copy iteration, when the peak measured bandwidth was about 64Gb/s and the peak CPU utilization (esxtop ‘PCPU Util%’ counter) on the destination host was about 40%.
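
The network throughput and CPU utilization in Figure 6 were read from esxtop. To capture the same counters across an entire migration for offline analysis, esxtop's batch mode is convenient; the sampling interval and iteration count below are just examples.

  # On each host, record esxtop counters (including per-vmnic throughput and
  # 'PCPU Util%') every 2 seconds for 10 minutes, then analyze the CSV offline.
  esxtop -b -d 2 -n 300 > /tmp/esxtop-vmotion.csv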

Conclusions

The main results of this performance study are the following:

  • The dramatic 10x increase in supported round-trip time offered by long-distance vMotion now makes it possible to migrate workloads non-disruptively over long distances, such as New York to London
  • The re-architected vMotion in vSphere 6.0 delivers remarkable improvements in Monster VM migration performance (up to 3.5x over vSphere 5.5) for large-scale private cloud deployments


Comments


  1. This is very interesting information. Thank you! I’m curious how long-distance vMotion works (if at all?) where the network subnet is not also stretched between locations? I see the testing was done with latency injection, but wondering if there are IP address implications that would limit the usage of this.

    Thanks,
    Mark B-

    1. Thank you, Mark. We do expect a stretched Layer 2 for the VM network so the VM can retain the same IP address after the migration and maintain its existing network connections.

    1. Thanks, Karim. Let me clarify the vMotion requirements. Prior to vSphere 6, the ‘vMotion network’ required Layer 2 adjacency, meaning that the vMotion VMkernel NIC (vmknic) on both the source and destination hosts needed to be on the same subnet to perform vMotion. In vSphere 6, the vMotion network requires only Layer 3 access between the source and destination hosts, meaning that vMotion traffic can now traverse an L3 network if you configure a VMkernel gateway for the vMotion network. We do expect a stretched Layer 2 for the ‘VM network’ (the network over which all the VM traffic traverses) so the VM can retain the same IP address after the migration and maintain its existing network connections.

  2. OK, I understand the VM network needs to be stretched to maintain VM connectivity after the vMotion, but this doesn't really do much good and limits the usefulness of vMotion over L3 (if I can stretch my VM network, I would probably stretch my vMotion network as well)?

    Is there a use case where all of this works together perfectly? For example, with the use of NSX (network virtualization) there is no need for a stretched VM network (the VM vMotions over L3, and the VM network talks back to the old site via an NSX gateway, etc.). Has this been tested, and has anybody done it before?

    1. After vMotion, the VM needs to be able to communicate with all the services it was communicating with prior to vMotion. Some of these services may have an L2 dependency, so it is essential to have L2 stretching for the VM network. The vMotion network does not have these constraints. Hence, with vSphere 6.0, customers have a choice to use either an L3 vMotion network or an L2 stretched vMotion network. You will need an NSX-like solution to stretch the VM network. Many of our customers prefer the L3 support for vMotion. Long distance vMotion is being implemented by our customers, although, at this time, we are not able to use them as references yet.

  3. Thanks for the post.

    What is the use case for vMotioning a VM over a 150ms RTT network when its storage is still at Site A? Also, I would expect a large performance hit if storage is now many times 10ms away from compute. Did I miss something?

    What is the equivalent storage technology to go with long distance vMotion? I understand the VPLEX Metro max RTT is 10ms.

    thanks

  4. Long Distance vMotion is basically an enhanced vMotion (storage and compute) with a larger RTT threshold.

  5. Hello Stephane, one of the primary use cases of Long Distance vMotion is multi-site capacity utilization. Many customers have started to implement storage replication across long distances using third-party solutions such as EMC VPLEX Geo. vMotion leverages such replication solutions and thereby avoids the need to transfer VM storage to the new destination site.

  6. Is the graphic in Figure 3 incorrect? I would expect the total duration to increase as latency is increased, rather than your graphic showing a decrease in total time as latency is increased.

    Also, I'm not getting these results in the real world. I have 3 data centers, all connected with 10Gbps pipes. Data center A to B is 1ms latency; data center A to C is 21ms latency. Migrating a VM with 1 vCPU, 8GB RAM, and a 50GB disk takes 4 minutes from data center A to B and 28 minutes from data center A to C. This is a very consistent average over 20+ migrations. FYI: data center C is not in production yet, so the only thing on the wire was my migration. I can copy at near line rate from A to C at 9.8Gbps. The VM also had no workload running. Running 6.0 Update 2. I have a case open with VMware, but does anyone else have any ideas?

  7. Hello David, Figure 3 is showing that the vMotion duration is not impacted by the RTT latency. The runs do tend to have a little variation, mostly less than 5%. The latency-aware optimizations added in vSphere 6 size the socket buffers appropriately to hide the impact of RTT latency on duration. My feeling is that you may be running into some other issue in your environment, such as poor storage I/O throughput in data center C. I suggest that after you finish your migration, you check the vmkernel logs, in which vMotion reports very useful performance statistics. Note that each vMotion has a unique ID (a.k.a. migration ID) across both the source and destination hosts. You can run the following command at the end of the vMotion to see the performance stats logged by vMotion. You should provide this data for the case you already have open with VMware.
    # grep -i vmotion /var/log/vmkernel.log

  8. Hello

    Doesn't storage throughput affect overall vMotion performance? What kind of storage is used to reach approx. 60Gb/s of vMotion throughput?

    thanks

    1. Hello Stephane, vMotion throughput is affected by storage I/O throughput only when transferring the VM's disks. Our test scenario used traditional vMotion, that is, vMotion across hosts with shared storage. In that scenario, only the VM's memory is transferred from the source to the destination, so throughput is not affected by storage I/O.
