Tag Archives: vMotion

vMotion across hybrid cloud: performance and best practices

VMware Cloud on AWS is a hybrid cloud service that runs the VMware software-defined data center (SDDC) stack in the Amazon Web Services (AWS) public cloud. The service automatically provisions and deploys a vSphere environment on a bare-metal AWS infrastructure, and lets you run your applications in a hybrid IT environment across your on-premises data centers and AWS global infrastructure. A key benefit of VMware Cloud on AWS is the ability to vMotion workloads back and forth from your on-premises data center to the AWS public cloud as capacity and data privacy require.

In this blog post, we share the results of our vMotion performance tests across a hybrid cloud environment that consisted of a vSphere on-premises data center located in Wenatchee, Washington, and an SDDC hosted in the AWS cloud, in various scenarios including the hybrid migration of a database server. We also describe the best practices to follow when migrating virtual machines by vMotion across a hybrid cloud.

Test configuration

We set up the hybrid cloud environment with the following specifications:

VMware Cloud on AWS

  • 1-host SDDC instance with Amazon EC2 i3.metal (Intel Xeon E5-2686 @ 2.3 GHz, 36 cores, 512 GB memory)
  • SDDC version: vmc1.6 (M6 – Cycle 17)
  • Auto-provisioned with NSX networking and VSAN storage

On-premises host

  • Dell PowerEdge R730 (Intel Xeon E5-2699 v4 @ 2.2GHz, 22 cores, 1 TB memory)
  • ESXi and vCenter version: 6.7
  • Storage: Dell NVMe, VMFS 5 volume
  • Networking: Intel 1GbE NIC (shared 2 x 10GbE Direct Connect (DX) links between on-premises and AWS)

Figure 1: Logical layout of the hybrid cloud setup

Figure 1 illustrates the logical layout of our hybrid cloud environment. We deployed a single-host SDDC instance on AWS cloud. The SDDC was the latest M6 version and auto-configured with vSAN storage and NSX networking. Our on-premises data center, located in Washington state, featured hosts running ESXi 6.7.

AWS Direct Connect

We used high-speed AWS Direct Connect links for connectivity between the VMware on-premises data center and the AWS Oregon region. AWS Direct Connect provides a dedicated, leased-line-like connection between the on-premises data center and the AWS environment. VMware recommends this type of link because it guarantees sustained bandwidth during vMotion, which isn't possible with VPN connections over the internet. In our environment, the round-trip network latency was about 40 milliseconds.

L2 VPN tunnel

We set up a secure L2 VPN tunnel for the compute traffic that spanned the two vCenters. This connected the VMs on cloud and on-premises to the same address space (IP subnet). So, the VMs remained on the same subnet and kept their IP addresses even as we migrated them from on-premises to cloud and vice versa.

Figure 2: Extending VXLAN across on-premises and cloud using L2 VPN

As shown in figure 2, two NSX Edge VMs provided VPN capabilities and the bridge between the overlay world (VXLAN logical networks) and the physical infrastructure (IP networks). Each NSX Edge VM was equipped with two virtual interfaces (vNICs): one vNIC was used as an uplink to the physical network, and the second vNIC was used as the VXLAN trunk interface.

Hybrid linked mode

Figure 3: A single console to manage resources across on-premises and cloud environments

We created a hybrid linked mode between the cloud vCenter and the on-premises vCenter. This allowed us to use a single console to manage all our inventory across the hybrid cloud. As shown in Figure 3, the cloud inventory included a single Client-VM provisioned in the compute workload resource pool, and the on-premises inventory included three VMs: an NSX-Edge VM, a Client-VM, and a Server VM.

Measuring vMotion performance

The following metrics were used to understand the performance implications of vMotion:

  • Migration time: Total time taken for migration to complete
  • Switch-over time: Time during which the VM is quiesced to enable switchover from on-premises to cloud, and vice versa
  • Guest penalty: Performance impact on the applications running inside the VM during and after the migration

Benchmark methodology

We investigated the impact of hybrid vMotion on a Microsoft SQL Server database performance using the open-source DVD Store 3 (DS3) benchmark, which simulates many customers performing typical actions in an online DVD Store (logging in, browsing, buying, reviewing, and so on).

The test scenario used a Windows Server 2012 VM configured with 8 vCPUs, 8 GB memory, a 40 GB disk, and a SQL Server database size of 5 GB. As shown in figures 2 and 3, we used two concurrent DS3 clients, one running on-premises and the other running in the cloud. Each client used a load of five DS3 users with 0.02 seconds of think time. We started the migration during the steady-state period of the benchmark, when the CPU utilization (esxtop %USED counter) of the SQL Server VM was close to 275% and the average write IOPS was 80.

Test results

Figure 4: SQL Server throughput at given time: before, during, and after hybrid vMotions

Figure 4 plots the performance of the SQL Server VM, in total orders processed per second, during vMotion from on-premises to the cloud and vice versa. In our tests, both DS3 benchmark drivers were configured to report performance data at a fine granularity of 1 second (the default is 10 seconds). As shown in figure 4, the impact on SQL Server throughput was minimal during vMotion in both directions. The total throughput remained steady at around 75 orders per second throughout the test period. The vMotion durations from on-premises to the cloud and vice versa were 415 seconds and 382 seconds, respectively, with the network throughput ranging from 500 to 900 megabits per second (Mbps). The switch-over time was about 0.6 seconds in both vMotions. The few minor dips in throughput shown in the figure were due to variance in the available network bandwidth on the shared AWS Direct Connect link.

Figure 5: Breakdown of SQL Server throughput reported by the on-premises and cloud clients

Figure 5 illustrates the impact of network latency on the throughput. While the total SQL Server throughput remained steady during the entire test period, the throughput reported by the on-premises and cloud clients varied based on their proximity to the SQL Server VM. For instance, the throughput reported by the on-premises client dropped from 65 orders per second to 10 orders per second when the SQL Server VM was migrated to the cloud, and it climbed back to 65 orders per second after the SQL Server VM was migrated back to the on-premises environment.

The throughput variation seen by the two DS3 clients is not unique to our hybrid cloud environment and can be explained by Little’s Law.

Little’s Law

In queueing theory, Little's Law states that the long-run average number of customers (L) in a stable system is equal to the average arrival rate (λ) multiplied by the average time (W) that a customer spends in the system. Expressed algebraically: L = λ × W

Figure 6: Little’s Law applicability in hybrid cloud performance testing

Figure 6 shows how Little’s Law can be applied to our hybrid cloud environment to relate the DS3 users, SQL server throughput, SQL Server processing time, and the network latency. The formula derived in figure 6 explains the impact of the network latency on the throughput (orders per second) when the benchmark load (DS3 users) is fixed. It should be noted, however, that although the throughput reported by both the clients varied due to the network latency, the aggregate throughput remained a constant. This is because the throughput decrease seen by one client is offset by the throughput increase seen by the other client.
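
As a rough worked example, the sketch below applies Little's Law to estimate each client's order rate before and after migration. The DS3 user count, think time, and WAN round-trip time come from this test setup; the per-order SQL Server processing time and the number of client-server round trips per order are illustrative assumptions, not measured values.

# Little's Law: L = lambda x W, so lambda = L / W.
# For a closed-loop DS3 client, W is the time one user spends per order:
# think time + (round trips per order x network RTT) + server processing time.

def orders_per_second(users, think_time_s, rtt_s, round_trips, server_time_s):
    time_per_order = think_time_s + round_trips * rtt_s + server_time_s
    return users / time_per_order

USERS = 5            # DS3 users per client (from the test setup)
THINK_TIME = 0.02    # seconds (from the test setup)
RTT_LOCAL = 0.0005   # assumed LAN round-trip time, seconds
RTT_WAN = 0.040      # ~40 ms Direct Connect round-trip time
ROUND_TRIPS = 10     # assumed client-server round trips per order
SERVER_TIME = 0.05   # assumed per-order SQL Server processing time, seconds

print("on-prem client, VM on-prem:",
      round(orders_per_second(USERS, THINK_TIME, RTT_LOCAL, ROUND_TRIPS, SERVER_TIME)))
print("on-prem client, VM in cloud:",
      round(orders_per_second(USERS, THINK_TIME, RTT_WAN, ROUND_TRIPS, SERVER_TIME)))

With these assumed values, the estimates land near the observed 65 and 10 orders per second, and the formula makes it clear that the added 40 ms per round trip, not any change in server capacity, is what moves the number.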

This illustrates how important it is for you to monitor your application dependencies when you migrate workloads to and from the cloud. For example, if your database VM depends on a Java application server VM, you should consider migrating both VMs together; otherwise, the overall application throughput will suffer due to slow responses and timeouts.

One way to monitor your application dependencies is to use VMware vRealize Network Insight, which can mitigate business risk by mapping application dependencies in both private and hybrid cloud environments.

vMotion Stun During Page Send (SDPS)

We also tested vMotion performance after doubling the intensity of the DS3 workload on both the on-premises and cloud clients. Although vMotion succeeded, vmkernel logs indicated that vMotion SDPS kicked in during the test scenarios with the higher benchmark load. SDPS is an advanced feature of vMotion that ensures migration will not fail due to memory copy convergence issues. Whenever vMotion detects that the guest memory dirty rate is higher than the available network bandwidth, it injects microsecond-scale latencies into guest execution to throttle the page dirty rate, so the network transfer can catch up with the dirty rate. We therefore recommend you delay the vMotion of heavily loaded VMs in hybrid cloud environments with shared-bandwidth links, which prevents a slowdown in guest execution.
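
To make the mechanism concrete, here is a toy pre-copy model (a conceptual sketch only, not the vmkernel implementation): each iteration re-sends the pages dirtied during the previous copy pass, and if the dirty rate exceeds the transmit rate, an SDPS-style throttle slows the guest until the copy can converge. All rates and thresholds here are illustrative.

PAGE_BYTES = 4096
SWITCHOVER_BYTES = 16 * 1024 * 1024   # assume switch-over once the remainder is this small

def precopy_iterations(mem_pages, dirty_pages_per_sec, xmit_pages_per_sec,
                       max_iterations=20):
    """Return the number of pre-copy passes the toy model needs to converge (capped)."""
    to_send = mem_pages
    for i in range(1, max_iterations + 1):
        if dirty_pages_per_sec >= xmit_pages_per_sec:
            # SDPS-style throttle: inject latency into guest execution so the
            # effective dirty rate falls below the available transmit rate.
            dirty_pages_per_sec = 0.5 * xmit_pages_per_sec
        copy_seconds = to_send / xmit_pages_per_sec
        to_send = dirty_pages_per_sec * copy_seconds   # pages dirtied during this pass
        if to_send * PAGE_BYTES <= SWITCHOVER_BYTES:
            return i
    return max_iterations                               # did not converge within the cap

Without the throttle, a dirty rate above the transmit rate would leave more data to send after every pass and the copy would never converge; with it, convergence comes at the cost of slower guest execution, which is why deferring such migrations on shared-bandwidth links is worthwhile.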

To learn more about SDPS, see “VMware vSphere vMotion Architecture, Performance, and Best Practices.”

vMotion across multiple availability zones in the SDDC

Every AWS region has multiple availability zones (AZs), and Amazon's service level agreements do not extend beyond a single availability zone. For reasons such as failover support, VMware Cloud on AWS customers can therefore choose an SDDC deployment that spans multiple availability zones in a single AWS region.

There are certain vMotion performance implications with respect to the SDDC deployment configuration.

Figure 7.  vMotion peak network throughput in a single availability zone vs. multiple availability zones

As shown in figure 7, vMotion peak network throughput depends on the host placement in the SDDC.

This is because vMotion uses a single TCP stream in the VMware Cloud environment. If the vMotion source and destination hosts are within the same availability zone, vMotion peak throughput can reach as high as 10 gigabits per second (Gbps), limited only by the CPU core speed. If the source and destination hosts are in different availability zones, however, vMotion peak throughput is governed by the AWS rate limiter, which caps any single TCP or UDP stream crossing availability zones at 5 Gbps.
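
As a back-of-the-envelope illustration of why the per-stream limit matters, the sketch below estimates total pre-copy time for a single-stream vMotion at the in-AZ and cross-AZ rates. The 64 GB VM size and 1 Gbps dirty rate are assumed values, and the model ignores protocol overhead, SDPS throttling, and switch-over.

def precopy_seconds(vm_memory_gb, link_gbps, dirty_rate_gbps):
    """Closed form for a toy iterative pre-copy: with a constant dirty rate D and
    link rate B, the total transfer is the geometric series
    M + M*(D/B) + M*(D/B)^2 + ..., which takes M / (B - D) seconds."""
    if dirty_rate_gbps >= link_gbps:
        raise ValueError("dirty rate exceeds link rate; copy cannot converge")
    return vm_memory_gb * 8 / (link_gbps - dirty_rate_gbps)

for link_gbps in (10, 5):   # same-AZ vs. cross-AZ single-stream peak rates
    print(f"{link_gbps} Gbps link: ~{precopy_seconds(64, link_gbps, 1.0):.0f} seconds")

Under these assumptions, the same VM takes roughly twice as long to pre-copy across availability zones as it does within one, so host placement in the SDDC is worth considering when planning migrations.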

Conclusion

In summary, our performance test results show the following:

  • vMotion lets you migrate workloads seamlessly across traditional, on-premises data centers and software-defined data centers on AWS Cloud.
  • vMotion offers the same standard performance guarantees across hybrid cloud environments, including less than 1 second of vMotion execution switch-over time and minimal impact on guest performance.

vMotion performance with EMC VPLEX Metro in vSphere 5.5

Each vSphere release introduces new vMotion functionality, increased reliability and significant performance improvements. vSphere 5.5 continues this trend by offering new enhancements to vMotion to support EMC VPLEX Metro, which enables shared data access across metro distances.

In this blog, we evaluate vMotion performance on a VMware vSphere 5.5 virtual infrastructure that was stretched across two geographically dispersed datacenters using EMC VPLEX Metro.

Test Configuration

The VPLEX Metro test bed consisted of two identical VPLEX clusters, each with the following hardware configuration:

• Dell R610 host, 8 cores, 48GB memory, Broadcom BCM5709 1GbE NIC
• A single engine (two directors) VPLEX Metro IP appliance
• FC storage switch
• VNX array, FC connectivity, VMFS 5 volume on a 15-disk RAID-5 LUN


Figure 1. Logical layout of the VPLEX Metro deployment

Figure 1 illustrates the deployment of the VPLEX Metro system used for vMotion testing. The figure shows two data centers, each with a vSphere host connected to a VPLEX Metro appliance. The VPLEX virtual volumes presented to the vSphere hosts in each data center are synchronous, distributed volumes that mirror data between the two VPLEX clusters using write-through caching. As a result, vMotion sees the underlying storage as shared storage, exactly as if both the source and destination hosts had access to the same SAN. Hence, vMotion in a VPLEX Metro environment works just like traditional vMotion, live migrating only the memory and device state of the virtual machine.
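
A minimal sketch of the write-through idea, under the assumption of two mirror legs that must both persist a write before it is acknowledged (purely illustrative, not the actual VPLEX implementation):

from concurrent.futures import ThreadPoolExecutor

class WriteThroughMirror:
    """Toy synchronous mirror: a write completes only after every leg (think of
    the backing storage behind each VPLEX cluster) has persisted it, so both
    sites always expose the same committed data to their local hosts."""

    def __init__(self, legs):
        self.legs = list(legs)                      # callables: leg(lba, data) -> bool
        self._pool = ThreadPoolExecutor(max_workers=len(self.legs))

    def write(self, lba, data):
        futures = [self._pool.submit(leg, lba, data) for leg in self.legs]
        return all(f.result() for f in futures)     # ack only when every leg is done

Because reads on either site return the same committed data, the source and destination hosts can treat the distributed volume as ordinary shared storage, which is what lets vMotion skip any storage migration.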

The two VPLEX Metro appliances in our test configuration used IP-based connectivity. The vMotion network between the two ESXi hosts used a physical network link distinct from the VPLEX network. The Round Trip Time (RTT) latency on both VPLEX and vMotion networks was 10 milliseconds.

Measuring vMotion Performance

The following metrics were used to understand the performance implications of vMotion:

• Migration Time: Total time taken for migration to complete
• Switch-over Time: Time during which the VM is quiesced to enable switchover from source to the destination host
• Guest Penalty: Performance impact on the applications running inside the VM during and after the migration

Test Results


Figure 2. VPLEX Metro vMotion performance in vSphere 5.1 and vSphere 5.5

Figure 2 compares VPLEX Metro vMotion performance results in vSphere 5.1 and vSphere 5.5 environments. The test scenario used an idle VM configured with 2 vCPUs and 2GB memory. The figure shows a minor difference in the total migration time between the two vSphere environments and a significant improvement in vMotion switch-over time in the vSphere 5.5 environment. The switch-over time reduced from about 1.1 seconds to about 0.6 seconds (a nearly 2x improvement), thanks to a number of performance enhancements that are included in the vSphere 5.5 release.

We also investigated the impact of VPLEX Metro live migration on Microsoft SQL Server online transaction processing (OLTP) performance using the open-source DVD Store workload. The test scenario used a Windows Server 2008 VM configured with 4 vCPUs, 8GB memory, and a SQL Server database size of 50GB.


Figure 3. VPLEX Metro vMotion impact on SQL Server Performance

Figure 3 plots the performance of the SQL Server virtual machine in orders processed per second at a given time—before, during, and after VPLEX Metro vMotion. As shown in the figure, the impact on SQL Server throughput was minimal during vMotion. The SQL Server throughput on the destination host was around 310 orders per second, compared to a throughput of 350 orders per second on the source host. This throughput drop after vMotion is due to the VPLEX inter-cluster cache coherency interactions and is expected. For some time after the vMotion, the destination VPLEX cluster continued to send cache page queries to the source VPLEX cluster, and this had some impact on performance. After all the metadata was fully migrated to the destination cluster, we observed the SQL Server throughput increase to 350 orders per second, the same level of throughput seen prior to vMotion.

These performance test results show the following:

  • Remarkable improvements in vSphere 5.5 towards reducing vMotion switch-over time during metro migrations (for example, a nearly 2x improvement over vSphere 5.1)
  • VMware vMotion in vSphere 5.5 paired with EMC VPLEX Metro can provide workload federation over a metro distance by enabling administrators to dynamically distribute and balance the workloads seamlessly across data centers

To find out more about the test configuration, performance results, and best practices to follow, see our recently published performance study.

IPv6 performance improvements in vSphere 5.5

Many of our customers use IPv6 networks in their datacenters for a variety of reasons. We expect that many more will transition from IPv4 to IPv6 to reap the large address range and other benefits that IPv6 provides. Keeping this in mind, we have worked on a number of performance enhancements for the way that vSphere 5.5 manages IPv6 network traffic. Some new features that we have implemented include:

• TCP Checksum Offload: For Network Interface Cards (NICs) that support this feature, the computation of the TCP checksum of the IPv6 packet is offloaded to the NIC.

• Software Large Receive Offload (LRO): LRO is a technique of aggregating multiple incoming packets from a single stream into a larger buffer before they are passed higher up the networking stack, thus reducing the number of packets that have to be processed and saving CPU. Many NICs do not support LRO for IPv6 packets in hardware. For such NICs, we implement LRO in the vSphere network stack.

• Zero-Copy Receive: This feature prevents an unnecessary copy from the packet frame to a memory space in the vSphere network stack. Instead, the frame is processed directly.

vSphere 5.1 offers the same features, but only for IPv4. So, in vSphere 5.1, services such as vMotion, NFS, and Fault Tolerance had lower bandwidth in IPv6 networks when compared to IPv4 networks. vSphere 5.5 solves that problem—it delivers similar performance over both IPv4 and IPv6 networks. A seamless transition from IPv4 to IPv6 is now possible.
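
To illustrate the software LRO technique described above, here is a toy aggregation loop: consecutive packets belonging to the same stream are coalesced into one larger buffer before being handed up the stack, so the upper layers process fewer, bigger units. This is a conceptual sketch, not the vSphere network stack code, and the 64 KB flush threshold is an assumption.

def software_lro(packets, flush_bytes=64 * 1024):
    """Coalesce consecutive packets of the same stream into larger aggregates.
    `packets` is an iterable of (stream_id, payload) tuples; yields
    (stream_id, aggregated_payload) tuples to the layer above."""
    current_stream, buffer = None, bytearray()
    for stream_id, payload in packets:
        if stream_id != current_stream or len(buffer) + len(payload) > flush_bytes:
            if buffer:
                yield current_stream, bytes(buffer)   # one large buffer instead of many small packets
            current_stream, buffer = stream_id, bytearray()
        buffer.extend(payload)
    if buffer:
        yield current_stream, bytes(buffer)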

Next, we demonstrate the performance of vMotion over a 40Gb/s network connecting two vSphere hosts. We also demonstrate the performance of networking traffic between two virtual machines created on the vSphere hosts.

System Configuration
We set up a test environment with the following specifications:

• Servers: 2 Dell PowerEdge R720 servers running vSphere 5.5.
• CPU: 2-socket, 12-core Intel Xeon E5-2667 @ 2.90 GHz.
• Memory: 64GB memory, spread evenly across two NUMA nodes (32GB per node).
• Networking: 1 dual-port Intel 10GbE adapter and 1 dual-port Broadcom 10GbE adapter placed in separate PCI Gen-2 x8 slots in both machines. We thus had 40Gb/s of network connectivity between the two vSphere hosts.
• Virtual Machine for vMotion: 1 VM running Red Hat Enterprise Linux Server 6.3 assigned 2 virtual CPUs (vCPUs) and 48GB memory. We migrate this VM between the two vSphere hosts.
• Virtual Machines for networking tests: A pair of VMs running Red Hat Enterprise Linux server 6.3, assigned 4 vCPUs and 16GB memory, on each host. We use these VMs to test the performance of networking traffic between two VMs.

We configured each vSphere host with four vSwitches, each vSwitch having one 10GbE uplink port. We created one VMkernel adapter on each vSwitch. Each VMkernel adapter was configured on the same subnet. The MTU of the NICs was set to the default of 1500 bytes. We enabled each VMkernel adapter for vMotion, which allowed vMotion traffic to use the 40Gb/s network connectivity. We created four VMXNET3 virtual adapters on the pair of virtual machines used for networking tests.

Methodology
In order to demonstrate the performance for vMotion, we simulated a heavy memory usage footprint in the virtual machine. The memory-intensive program allocated 48GB memory in the virtual machine and touched one byte in each page in an infinite loop. We migrated this virtual machine between the two vSphere hosts over the 40Gb/s network. We used net-stats to monitor network throughput and CPU utilization on the sending and receiving systems. We also noted the bandwidth achieved in each pre-copy iteration of vMotion from VMkernel logs.
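
The memory-intensive program itself is not published with this post, but a minimal stand-in looks like the sketch below: allocate a large buffer and write one byte into every page in an endless loop so the whole working set stays dirty throughout the migration. The 4 KiB page size is an assumption, and the buffer size should be scaled down to whatever your machine can actually hold if you run it.

import itertools

PAGE_SIZE = 4096                 # assume 4 KiB pages
BUFFER_BYTES = 48 * 1024**3      # ~48 GB working set, matching the test VM

def dirty_memory_forever(buffer_bytes=BUFFER_BYTES, page_size=PAGE_SIZE):
    """Touch one byte in every page of a large allocation in an infinite loop."""
    buf = bytearray(buffer_bytes)             # allocate the working set
    for counter in itertools.count():         # never terminates by design
        value = counter & 0xFF
        for offset in range(0, buffer_bytes, page_size):
            buf[offset] = value               # one write per page keeps that page dirty

if __name__ == "__main__":
    dirty_memory_forever()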

In order to demonstrate the performance of virtual machine networking traffic, we used Netperf 2.60 to generate traffic from one virtual machine to the other. We created two connections for each virtual adapter. Each connection generated traffic for the TCP_STREAM workload, with a 16KB message size and a 256KB socket buffer size. As in the previous experiment, we used net-stats to monitor network throughput and CPU utilization.
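
If you want to reproduce a similar load without netperf, a minimal stand-in for one TCP_STREAM-style connection is sketched below. The receiver address and port are placeholders, and a simple sink (anything that reads and discards the data) must be listening on the other VM.

import socket
import time

MESSAGE = b"\0" * (16 * 1024)        # 16KB messages, as in the TCP_STREAM workload
SOCKET_BUFFER = 256 * 1024           # 256KB socket buffer size

def tcp_stream(host, port, seconds=60):
    """Push bulk TCP traffic at a receiver for `seconds`; run one instance per
    connection (the test used two connections per virtual adapter)."""
    with socket.create_connection((host, port)) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, SOCKET_BUFFER)
        deadline = time.monotonic() + seconds
        while time.monotonic() < deadline:
            sock.sendall(MESSAGE)

# Example with a placeholder receiver address:
# tcp_stream("192.168.0.2", 5001)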

Results
Figures 1 and 2 show, for IPv4 and IPv6 traffic, the network throughput and CPU utilization data that we collected over the 40-second duration of the migration. After the guest memory is staged for migration, vMotion begins iterations of pre-copying the memory contents from the source vSphere host to the destination vSphere host.

In the first iteration, the destination vSphere host needs to allocate pages for the virtual machine. Network throughput is below the available bandwidth in this stage as vMotion bandwidth usage is throttled by the memory allocation on the destination host. The average network bandwidth during this phase was 1897 megabytes per second (MB/s) for IPv4 and 1866MB/s for IPv6.

After the first iteration, the source vSphere host sends the delta of changed pages. During this phase, the average network bandwidth was 4301MB/s with IPv4 and 4091MB/s with IPv6.

The peak bandwidth measured in net-stats was 34.5Gb/s for IPv4 and 32.9Gb/s for IPv6. The CPU utilization of both systems followed a similar trend for both IPv4 and IPv6. Note that vMotion is very CPU intensive on the receiving vSphere host, so a high CPU clock speed is necessary to achieve high bandwidths. The results are summarized in Table 1. In all, migration of the virtual machine completed in 40 seconds regardless of whether IPv4 or IPv6 connectivity was used.

Figure 1. vMotion over an IPv4 network

Figure 2. vMotion over an IPv6 network

Table 1. vMotion results—IPv4 versus IPv6

The results for virtual machine networking traffic are in Table 2. While the throughput with IPv6 is about 2.5% lower, the CPU utilization is the same on both the sending and receiving sides.

Table 2. Virtual machine networking results—IPv4 versus IPv6

Thanks to a number of IPv6 enhancements added to vSphere 5.5, migrations with vMotion over IPv6 networks now run at speeds within 5% of those over IPv4 networks. For virtual machine networking, the throughput of IPv6 is within 2.5% of IPv4. In addition, testing shows that we can drive bandwidth close to the 40Gb/s link speed with both protocols. Combined, this functionality allows for a seamless transition from IPv4 to IPv6 with little performance impact.

Impact of Enhanced vMotion Compatibility on Application Performance

Enhanced vMotion Compatibility (EVC) is a technique that allows vMotion to proceed even when the destination cluster contains ESXi hosts with CPUs of different generations. EVC assigns a baseline to all ESXi hosts in the destination cluster so that all of them are compatible for vMotion. An example is assigning a Nehalem baseline to a cluster that contains ESXi hosts with a mix of Westmere and Nehalem processors. In this case, the features unique to Westmere would be hidden, because Westmere is a newer processor generation than Nehalem, and all ESXi hosts would "broadcast" that they have Nehalem features.
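
Conceptually, the baseline is the set of CPU features every host in the cluster can expose, that is, the intersection of the hosts' feature sets. The sketch below models that idea with a few illustrative feature names; real EVC operates on CPUID feature masks, and the feature lists here are simplified placeholders.

# Illustrative, simplified feature sets (real EVC uses CPUID feature masks).
NEHALEM_FEATURES = {"sse4.2", "popcnt"}
WESTMERE_FEATURES = NEHALEM_FEATURES | {"aes", "pclmulqdq"}

def evc_baseline(host_feature_sets):
    """The features every host can expose: the intersection across all hosts."""
    return set.intersection(*host_feature_sets)

cluster = [WESTMERE_FEATURES, NEHALEM_FEATURES, WESTMERE_FEATURES]
print(sorted(evc_baseline(cluster)))   # AES-NI and PCLMULQDQ are hidden from guests

This is also why EVC can matter for performance: a guest that would otherwise use a hidden instruction set (AES-NI for encryption, for example) falls back to slower code paths, which is exactly the kind of effect the tests below examine.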

Tests showed how utilizing EVC with different applications affected their performance. Several workloads were chosen to represent typical applications running in enterprise datacenters. The applications represented included database, Java, encryption, and multimedia. To see the results and learn some best practices for performance with EVC, read Impact of Enhanced vMotion Compatibility on Application Performance.

Performance Best Practices for VMware vSphere 5.0

A new version of Performance Best Practices for vSphere is now available.  This is a book designed to help system administrators obtain the best performance from vSphere deployments.

We've addressed many of the new features in vSphere 5.0 from a performance perspective.  These include:

  • Storage Distributed Resource Scheduler (Storage DRS), which performs automatic storage I/O load balancing
  • Virtual NUMA, allowing guests to make efficient use of hardware NUMA architecture
  • Memory compression, which can reduce the need for host-level swapping
  • Swap to host cache, which can dramatically reduce the impact of host-level swapping
  • SplitRx mode, which improves network performance for certain workloads
  • VMX swap, which reduces per-VM memory reservation
  • Multiple vMotion vmknics, allowing for more and faster vMotion operations

We've also significantly updated and expanded many of the topics we've covered in previous editions of the book.  These include:

  • Choosing hardware for a vSphere deployment
  • Power management
  • Configuring ESXi for best performance
  • Guest operating system performance
  • vCenter and vCenter database performance
  • vMotion and Storage vMotion performance
  • Distributed Resource Scheduler (DRS) and Distributed Power Management (DPM) performance
  • High Availability (HA), Fault Tolerance (FT), and VMware vCenter Update Manager performance

The book can be found at: Performance Best Practices for VMware vSphere 5.0.