
Expand a cluster across racks in VMware Cloud Foundation

Use the VCF API to expand a cluster across racks without extending Layer 2 in the physical network.

Overview

Expanding a cluster across racks serves two key purposes: increasing capacity and enhancing resiliency for workloads. Servers within a single rack often share power and network infrastructure, making them susceptible to simultaneous failure.

In VMware Cloud Foundation (VCF), adding hosts to a cluster is simple when the new hosts reside on the same subnet as the existing ones. However, modern data center designs discourage L2 extension between racks due to the broad impact of local failures in traditional L2 networks. Instead, small L2 broadcast domains—typically limited to individual racks—are preferred, meaning hosts in separate racks operate on distinct subnets.

While technologies like EVPN over VXLAN allow safe subnet extension across racks, they rely heavily on physical infrastructure configuration. This reliance can hinder automation and compromise the seamless, cloud-like experience expected in VCF environments.

This document aims to identify scenarios where cluster expansion is achievable without requiring L2 extension and to explain how to accomplish it—spoiler: it’s done via API 😉 

Types of VCF networks and clusters

In a VCF environment, an ESXi host is attached to several networks that are visible to VCF. They can be classified into two categories: VM networks and host infrastructure networks.

VM Networks

These networks connect virtual machines (VMs) critical to the VMware Cloud Foundation (VCF) infrastructure, such as the SDDC Manager, vCenter Server instances, and NSX Managers.

  • VM Management Network: provides management connectivity for infrastructure VMs.

On top of this VM management network, clusters hosting NSX Edge VMs need additional networks. An NSX Edge VM typically runs a Tier-0 gateway, which acts as a virtual router linking the NSX virtual network to the physical network. While this paper does not delve into the specifics of clusters with NSX Edge VMs, it is important to note that the Tier-0 gateway requires the following:

  • Edge TEP Network: the tunnel end points (TEPs) of the edges use this network to connect to NSX segments.
  • Edge Uplink Networks: provide the VLANs connecting the Tier-0 gateway to the physical routers.

All these networks must extend between racks to support VM mobility for high availability: VMs must retain their IP addresses when they move dynamically between racks. Because VCF edge clusters always include a VM management network (on top of their specific edge TEP/uplink networks), the common denominator between all the clusters requiring L2 extension between racks is the presence of a VM management network.

ESXi host infrastructure networks

These networks connect the vmkernel interfaces of ESXi hosts.

  • Management Network: Used for managing the ESXi host.
  • vMotion and vSAN Networks: Defined collectively within a network pool in VCF and used by the vMotion and vSAN features.
  • Host TEP Network: the TEPs of the ESXi hosts use this network for implementing NSX segments.

These networks do not technically require extension across racks since ESXi hosts themselves are physical devices with fixed IP addresses tied to their rack locations. However, VCF will require those networks to span racks if the cluster being expanded includes a VM management network. A notable exception is the vSAN stretched cluster, which is out of the scope of this document.

Determining the type of cluster expansion

Only clusters without a VM management network can be expanded across racks without requiring L2 extension in the physical infrastructure. To check if a cluster has a VM management network, navigate to Workload Domains → Clusters → Network in the SDDC Manager UI. The screenshot below highlights the VM management network in red for an edge cluster:

Expansion across racks of a cluster with no VM Management network

A quick word on UI expansion

The UI does not support selecting hosts from a different network pool or specifying a Host TEP network during expansion, making L2 extension in the physical network necessary.

The API method in the next section provides the only practical solution for cluster expansion across racks without relying on L2 extension.

API expansion

A cluster can be expanded to a different rack without requiring L2 extension in the physical infrastructure if no VM Management network is defined in VCF, and the expansion is executed via API.

The diagram below illustrates a cluster initially confined to Rack 1, expanded to include hosts in Rack 2. In this setup, the network pool, Host TEP network, and Host Management network differ between the two racks.

Notice also that workload VMs can still use subnets spanning racks (one such network is represented in green). This extension is provided by NSX and is independent of the physical network configuration, achieving the desired outcome.

Network pools are assigned to hosts during commissioning, and VCF does not manage the host management network, so neither is an obstacle. The main complexity lies in configuring the Host TEP network during cluster expansion across racks. The example below explains the process.
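
All the API calls in the example target the SDDC Manager API and require an access token. As a minimal sketch only (the SDDC Manager FQDN and the credentials below are placeholders, and the token endpoint shown is the one exposed by recent VCF releases), a token can be obtained as follows and passed as a bearer token in the subsequent calls:

# Request an API token from SDDC Manager (placeholder FQDN and credentials)
curl -k -X POST https://sddc-manager.corp.vmbeans.com/v1/tokens \
  -H "Content-Type: application/json" \
  -d '{"username": "administrator@vsphere.local", "password": "********"}'

# The response contains an accessToken; pass it to the subsequent calls as:
#   -H "Authorization: Bearer <accessToken>"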

API cluster expansion example

This section outlines the process of expanding “Lab-cluster” to include a host in a new rack.

  • Initial Setup:
    “Lab-cluster” initially consists of three hosts (new0.corp.vmbeans.com, new1.corp.vmbeans.com and new2.corp.vmbeans.com) in rack1. A fourth host, new3.corp.vmbeans.com in rack2, is being added via the API (a sketch for retrieving its host ID follows this list).
  • Infrastructure Networks:
    • Host Management: Rack1 and rack2 use different host management networks, which VCF does not manage or require to match.
    • Network Pools: Existing hosts use the “Network-pool-rack1,” while new3.corp.vmbeans.com is provisioned with “Network-pool-rack2,” including different vSAN and vMotion networks.
    • Host TEP Network: All current hosts in the cluster share the same host TEP network, managed using a Transport Node Profile (TNP) in NSX. Since the new host in rack2 cannot use the rack1-specific TEP configuration, VCF must create a sub-TNP with a host TEP network specific to rack2.
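
For reference, the host ID and network pool information used in this example can be retrieved from the SDDC Manager inventory. The sketch below simply lists hosts and network pools; the exact response fields depend on the VCF version:

# List all hosts known to SDDC Manager and locate new3.corp.vmbeans.com;
# its "id" is the value used later in the "hostSpecs" section of the expansion body
curl -k -X GET https://sddc-manager.corp.vmbeans.com/v1/hosts \
  -H "Authorization: Bearer <accessToken>"

# List the network pools to confirm that the new host was commissioned
# with "Network-pool-rack2"
curl -k -X GET https://sddc-manager.corp.vmbeans.com/v1/network-pools \
  -H "Authorization: Bearer <accessToken>"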

The diagram below provides a high-level representation of this lab scenario:

To add new3.corp.vmbeans.com to the existing “Lab-cluster,” follow these steps:

  1. Retrieve the Cluster ID
    Use the following API call to get the ID of “Lab-cluster”:
    GET /v1/clusters
  2. Validate the input specification for the API call
    POST /v1/clusters/{cluster ID}/validations
    - Use the cluster ID retrieved in the previous step.
    - The body of this call is the body shown at the bottom of this section, encapsulated in
    "clusterUpdateSpec": { <body> }
  3. Update the Cluster Configuration
    PATCH /v1/clusters/{cluster ID}
    The API request body for this operation is detailed below; a curl sketch chaining the three calls follows it.
{
  "clusterExpansionSpec": {
    "hostSpecs": [
      {
        "id": "2a1c3eec-2b16-41bc-9b10-825e1343f22e",
The id refers to the host new3.corp.vmbeans.com in rack2, being added to the cluster. The network pool was assigned during commissioning in VCF, so this step only configures the NSX-specific settings for its TEPs. While this example adds a single host, multiple hosts can be included in a single API call.
        "licenseKey": "xxx",
The license key for the host, which can also be applied directly on the host.
        "username": "root",
The username for logging into the host.
        "hostNetworkSpec": { 
          "vmNics": [
            {
              "id": "vmnic0",
              "vdsName": "cluster-vds",
              "uplink": "uplink1"
            },
Host vmnic definitions:
– Specify the vmnic name.
– Define the VDS it is attached to and the corresponding VDS uplink.

The new host uses the same VDS as the existing hosts in the cluster.
            {
              "id": "vmnic1",
              "vdsName": "cluster-vds",
              "uplink": "uplink2"
            }
          ],
 
          "networkProfileName": "alternate-network-profile"
        }
      }
    ],
The NSX network profile defines the TEP network configuration. For this expansion, a new network profile, alternate-network-profile, is created in this API call to use a different TEP network.
    "networkSpec": {
      "nsxClusterSpec": {
 
        "ipAddressPoolsSpec": [
          {
            "name": "alternate-address-pool",
            "description": "different IP pool for rack2",
Rack2 uses a different subnet for its infrastructure networks, requiring host TEPs to use a separate NSX IP pool. In this API call, a new IP pool, alternate-address-pool, is defined for this purpose.
            "subnets": [
              {
                "ipAddressPoolRanges": [
                  {
                    "start": "192.168.243.50",
                    "end": "192.168.243.100"
                  }
                ],
                "cidr": "192.168.243.0/24",
                "gateway": "192.168.243.1"
              }
            ]
          }
        ],
 
        "uplinkProfiles": [
          {
            "name": "alternate-uplink-profile",
We are defining an uplink profile for the new rack.
An NSX uplink profile defines the names of the uplinks used by NSX, their teaming policy and, finally, the transport VLAN (the VLAN ID used for TEP traffic).
There is no need to create different NSX uplink names or a different teaming policy for the host we’re adding in rack2. We’re creating a new uplink profile because we want to use a different VLAN ID for NSX traffic on rack2. If we wanted to keep the same VLAN ID in rack2 as in rack1 (note that the same VLAN ID does not mean the same VLAN), we could use the existing uplink profile of rack1.
            "teamings": [
              {
                "policy": "LOADBALANCE_SRCID",
                "activeUplinks": [
                  "nsx-uplink-1",
                  "nsx-uplink-2"
                ],
                "standByUplinks": []
              }
            ],
Definition of two uplinks for NSX traffic: nsx-uplink-1 and nsx-uplink-2. Those uplinks are just names (we could have used any string of characters here) used when defining the default teaming policy of type “load-balance source port ID”.
Eventually, the NSX uplinks will be mapped to VDS uplinks.
            "transportVlan": 243
          }
        ]
      },
The VLAN ID for TEP traffic differs in rack2 compared to rack1, as noted earlier.
      "networkProfiles": [
        {
          "name": "alternate-network-profile",
          "description": "different TEP network for track2",
          "nsxtHostSwitchConfigs": [
            {
              "vdsName": "cluster-vds",
              "uplinkProfileName": "alternate-uplink-profile",
              "ipAddressPoolName": "alternate-address-pool",
              "vdsUplinkToNsxUplink": [
                {
                  "vdsUplinkName": "uplink1",
                  "nsxUplinkName": "nsx-uplink-1"
                },
                {
                  "vdsUplinkName": "uplink2",
                  "nsxUplinkName": "nsx-uplink-2"
                }
              ]
            }
          ]
        }
      ]
    }
  }
}

 
 
The network profile referenced in the “hostSpecs” section defines all NSX-specific details required to configure NSX on a host. It includes:
– The uplink profile.
– The previously defined IP pool (note that DHCP is also supported; in that case, leave ipAddressPoolName empty).
– Associations between NSX uplinks and VDS uplinks.

In NSX, this information is represented as a Transport Node Profile (TNP), a template for configuring multiple hosts identically within a cluster. The new network profile created via this API call differs from the one used in rack1, preventing the use of a consistent TNP across the entire cluster. As a result, VCF creates a “sub-TNP” for the “alternate-network-profile” and applies it to a “sub-cluster” grouping the hosts in rack2. The sub-cluster is named after the new network profile by VCF.
Body for cluster expansion API request
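
For completeness, here is a sketch of how the three calls from the steps above could be chained from a shell. It assumes the expansion body above has been saved to a file named expansion-spec.json; the file name, the SDDC Manager FQDN and the token are placeholders:

# 1. Retrieve the ID of "Lab-cluster"
curl -k -X GET https://sddc-manager.corp.vmbeans.com/v1/clusters \
  -H "Authorization: Bearer <accessToken>"

# 2. Validate the expansion specification; the body is the expansion spec
#    wrapped in "clusterUpdateSpec", as described above
curl -k -X POST https://sddc-manager.corp.vmbeans.com/v1/clusters/<cluster ID>/validations \
  -H "Authorization: Bearer <accessToken>" \
  -H "Content-Type: application/json" \
  -d "{\"clusterUpdateSpec\": $(cat expansion-spec.json)}"

# 3. Trigger the expansion itself
curl -k -X PATCH https://sddc-manager.corp.vmbeans.com/v1/clusters/<cluster ID> \
  -H "Authorization: Bearer <accessToken>" \
  -H "Content-Type: application/json" \
  -d @expansion-spec.json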

When this API call is executed, new3.corp.vmbeans.com is added to the Lab-cluster. As the new host uses a different network pool than the original cluster, VCF automatically creates new distributed port groups (dvpgs) on the VDS for the vSAN and vMotion networks, as shown in the screenshot below.

The host TEP network details are not directly visible in VCF but can be accessed via the NSX Manager. The screenshot below shows the “alternate-uplink-profile” configured by VCF for the new host in rack2:

The screenshot below shows the “Lab-cluster” configuration in NSX. The cluster is assigned a TNP named “vcenter-vi1-paris-Lab-cluster,” while the newly added host (new3.corp.vmbeans.com) is assigned a sub-TNP called “alternate-network-profile.” This sub-TNP simply pushes the configuration defined in the network profile section of the API call.
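
If you prefer checking these objects programmatically rather than through the NSX UI, the NSX Manager API exposes them as well. The sketch below is only indicative; the NSX Manager FQDN and credentials are placeholders, and which objects appear exactly depends on the NSX version:

# List the uplink profiles and look for "alternate-uplink-profile"
curl -k -u admin https://nsx-manager.corp.vmbeans.com/api/v1/host-switch-profiles

# List the transport node profiles known to NSX to inspect the profiles
# applied to the cluster and its rack2 hosts
curl -k -u admin https://nsx-manager.corp.vmbeans.com/api/v1/transport-node-profiles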