A common question arises when customers migrate workloads between ESXi hosts, clusters, vCenter Servers or data centers: which network is used when a hot or cold migration is initiated? This blog post explains the difference between the various services and network stacks in ESXi, and which one is used for a specific type of migration.
How do we define a hot or a cold migration? A cold migration is the migration of a virtual machine that is powered off for the entire duration of the migration. A hot migration means that the workload and application remain available during the migration.
Both hot and cold migrations can be initiated through the vCenter Server UI or in an automated fashion using, for example, PowerCLI. To understand which network is used for a migration, we first need to understand the services and network stack options that are available in vSphere.
In vSphere, we define the following services that can be enabled on VMkernel interfaces:
- vMotion
- Provisioning
- Fault Tolerance logging
- Management
- vSphere Replication
- vSphere Replication NFC (Network File Copy)
When looking specifically at workload migrations, three services play an important role: vMotion, Provisioning and Management.
Enabling a service on a VMkernel interface means that the network of that interface can be used for that service. While the Management service is enabled by default on the first VMkernel interface, the other VMkernel interfaces and services are typically configured after ESXi is installed. If you want vMotion or Provisioning traffic to use a specific VMkernel interface, enable the corresponding service on that interface.
We also have the option to use a separate network stack, or TCP/IP stack. vSphere provides the following options:
- Default
- vMotion
- Provisioning
If no settings are changed, the Default stack is used for all VMkernel interfaces. The purpose of the other TCP/IP stacks is to segregate traffic further and isolate certain network flows. Configuring the Provisioning and vMotion stacks also allows you to use different gateway IP addresses, DNS servers and DHCP servers for these networks, rather than the default gateway that is defined in the Default TCP/IP stack.
Cold migrations are a mere re-registration and potential copy of a virtual machine and its data to another ESXi host and/or datastore.
Cold migrations are typically performed when you move virtual machines between ESXi hosts that are equipped with CPUs from different vendors, like Intel and AMD. vSphere vMotion cannot live-migrate between the two. Another reason for performing a cold migration could be that an application owner requires the virtual machine to be powered off during the migration, either to mitigate the risk of data loss or to avoid other availability challenges within the workload itself. Typically, vSphere vMotion does a really good job of live-migrating workloads without any concessions on application availability or risk of data corruption, but there might be scenarios where customers choose not to use hot (or ‘live’) migrations.
The biggest misconception about cold migrations and cold data is that the vMotion network is used to perform the migration. However, cold data, like powered-off virtual machines, clones and snapshots, is migrated over the Provisioning network if that service is enabled. It is not enabled by default. If it’s not configured, the Management-enabled network is used.
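The fallback described above can be written down as a small Python sketch. This is only an illustrative model, not a vSphere API: the `VMkernelInterface` class, the service names used as strings, and the `cold_data_interfaces` function are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field


@dataclass
class VMkernelInterface:
    """Hypothetical model of a VMkernel interface and its enabled services."""
    name: str
    services: set = field(default_factory=set)


def cold_data_interfaces(vmks):
    """Return the interfaces that would carry cold data on this host.

    Provisioning-enabled interfaces are preferred; if none exist,
    fall back to the Management-enabled interfaces.
    """
    provisioning = [v for v in vmks if "provisioning" in v.services]
    if provisioning:
        return provisioning
    return [v for v in vmks if "management" in v.services]


# A host with only Management and vMotion configured.
host = [
    VMkernelInterface("vmk0", {"management"}),
    VMkernelInterface("vmk1", {"vmotion"}),
]
print([v.name for v in cold_data_interfaces(host)])  # ['vmk0']

# Enabling Provisioning on a new interface moves cold data off Management.
host.append(VMkernelInterface("vmk2", {"provisioning"}))
print([v.name for v in cold_data_interfaces(host)])  # ['vmk2']
```

Note that the vMotion-enabled `vmk1` is never selected for cold data in this model, which is exactly the misconception the paragraph above addresses.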
A hot migration is also referred to as a live migration. It is a staged migration in which the virtual machine stays powered on during the initial full synchronization and the subsequent delta synchronization, using the vSphere vMotion feature. If you want to know more about the vMotion process, check out this blog and video series.
Please note that if you create a VMkernel interface on the vMotion TCP/IP stack, you can use only this stack for vMotion on this host. The VMkernel interfaces on the default TCP/IP stack are disabled for the vMotion service.
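To illustrate the rule in the note above, here is a minimal Python sketch. It is a simplified model under stated assumptions, not how ESXi implements the selection: the `Vmk` class, its `stack` and `vmotion_service` fields, and the `vmotion_interfaces` function are hypothetical names made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vmk:
    """Hypothetical VMkernel interface: its TCP/IP stack and vMotion setting."""
    name: str
    stack: str                     # "default", "vmotion" or "provisioning"
    vmotion_service: bool = False  # vMotion service enabled on a default-stack interface


def vmotion_interfaces(vmks):
    """Return the names of the interfaces usable for vMotion on this host.

    As soon as one interface lives on the vMotion TCP/IP stack, only
    interfaces on that stack are used; default-stack interfaces with the
    vMotion service enabled are no longer considered.
    """
    on_vmotion_stack = [v.name for v in vmks if v.stack == "vmotion"]
    if on_vmotion_stack:
        return on_vmotion_stack
    return [v.name for v in vmks if v.stack == "default" and v.vmotion_service]


host = [
    Vmk("vmk0", "default"),
    Vmk("vmk1", "default", vmotion_service=True),
]
print(vmotion_interfaces(host))  # ['vmk1']

# Adding an interface on the vMotion stack takes vmk1 out of the picture.
host.append(Vmk("vmk2", "vmotion"))
print(vmotion_interfaces(host))  # ['vmk2']
```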
To make it a little bit more complicated: even though the live migration itself uses the vMotion-enabled network, the cold data during a vMotion is still transferred over the Provisioning network if it is configured, or over the Management network if it is not. VM snapshots, non-child delta disks and vmx logs are referred to as cold data during a vMotion, as described in KB article 59323.
Bringing it Together
Use the following diagram to determine what network is used for a migration:
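The rules described in this post can also be summarized as a short Python sketch. It is a simplified decision helper under stated assumptions, not an official tool: `Network`, `migration_networks` and its two boolean parameters are hypothetical names, and the model only distinguishes whether the virtual machine is powered on and whether the Provisioning service is enabled.

```python
from enum import Enum


class Network(Enum):
    VMOTION = "vMotion"
    PROVISIONING = "Provisioning"
    MANAGEMENT = "Management"


def migration_networks(powered_on: bool, provisioning_enabled: bool) -> dict:
    """Map each kind of migration traffic to the network that carries it.

    Cold data uses Provisioning when enabled, otherwise Management.
    A powered-on (hot) migration additionally uses the vMotion network
    for the live migration itself.
    """
    cold = Network.PROVISIONING if provisioning_enabled else Network.MANAGEMENT
    if powered_on:
        return {"live migration": Network.VMOTION, "cold data": cold}
    return {"cold data": cold}


# Print all four combinations as a small decision table.
for powered_on in (True, False):
    for provisioning_enabled in (True, False):
        kind = "hot" if powered_on else "cold"
        result = migration_networks(powered_on, provisioning_enabled)
        summary = ", ".join(f"{k}: {v.value}" for k, v in result.items())
        print(f"{kind} migration, provisioning enabled={provisioning_enabled} -> {summary}")
```

The sketch deliberately leaves out the TCP/IP stack choice, which affects which VMkernel interfaces are eligible for vMotion but not which service carries which kind of data.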
Everything above applies to on-prem migrations. When you are migrating workloads to a public cloud, for example VMware Cloud on AWS, you could use VMware HCX. HCX is completely agnostic of the on-prem vMotion or Provisioning networks. When you define a compute profile for HCX, it lets you choose the appropriate network for the migration.
More Resources to Learn
Refer to the following online sources for more information on workload migrations:
- Place vMotion Traffic on the vMotion TCP/IP Stack of an ESXi Host
- Understand vMotion networking requirements
- The vMotion Process Under the Hood
- Demo with HOL: Multi-Cloud Mobility