VMware Cloud on AWS Cloud Migration

Selecting IP Subnets and Connectivity for your SDDC

When planning your SDDC deployment, there are a number of different network subnets that come into play. This article will identify them and explain the primary considerations for the selection of new networks to be used within the SDDC, as well as identifying existing networks that will interact with the SDDC. The red numbers in the diagram correspond to the description text below.

1 – SDDC Management CIDR

The Management CIDR is used for all of the internal management components within the SDDC, such as the ESXi hosts (management, vMotion, and other interfaces), vCenter, NSX Manager, and any other fully-managed add-on components deployed into the SDDC, for example, HCX and Site Recovery appliances.  It is not available for workload VMs and must be dedicated to the VMware managed components of the SDDC.

The management CIDR must be one of 3 available sizes: /16, /20 or /23. The primary factor in selecting the size is the anticipated scalability of the SDDC. In a single-AZ deployment, a /23 CIDR can support a maximum of 23 ESXi hosts, a /20 up to 228, and a /16 up to 4091 (currently capped by the SDDC maximum of 300 hosts).  When deploying a multi-AZ (stretched cluster) SDDC, the maximums are 18 hosts, 224 hosts, and the SDDC maximum for /23, /20 and /16 CIDRs respectively.  It is also important to note that some host addresses are reserved for maintenance operations, so the number of usable hosts is further reduced for every additional cluster. The numbers above assume the minimum number of clusters needed to reach the maximum host count. As an example, an SDDC using a /23 management CIDR and configured with 2 clusters will only be able to deploy 23 hosts; the remaining 4 host addresses are reserved for maintenance operations (upgrades, replacing a failed host, etc.). A multi-AZ SDDC using a /20 management CIDR with 18 clusters is limited to 218 hosts (109 in each AZ). In addition, certain features also consume address space and reduce the maximum available host count, including membership in an SDDC Group, using Multi-Edge (Traffic Groups), using the SDDC as a VCDR recovery SDDC, and, as described above, deploying additional clusters.

If there is any potential for growth beyond a handful of hosts, an expectation of deploying multiple clusters, or plans to use any of the features listed above, a /20 CIDR is recommended. In addition, if Hybrid Cloud Extension (HCX) and/or Site Recovery is being used, there is a limit of 40 total HCX and Site Recovery appliances with a /23 CIDR.  Although this is sufficient in almost all cases, it could be a factor if multiple on-premises sites are being paired with the SDDC. The limit is high enough not to be a factor for /20 and /16 CIDRs.

As the management CIDR cannot be changed after the SDDC has been deployed, it is best to use a /23 only for testing or for SDDCs with a specific purpose and limited lifetime that will not require much growth in capacity.

The management CIDR must be an RFC1918 network (i.e. within 10.0.0.0/8, 172.16.0.0/12 or 192.168.0.0/16). In addition, the networks 10.0.0.0/15 (10.0.0.0-10.1.255.255) and 172.31.0.0/16 are reserved; the management CIDR cannot overlap either of these ranges and should also be unique within the entire enterprise network.
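
As a pre-deployment sanity check, the rules above can be expressed in a few lines of Python using the standard ipaddress module. This is only an illustrative sketch (it does not, for example, verify uniqueness across your enterprise network), and the candidate CIDRs shown are hypothetical.

    # Quick sanity check for a candidate Management CIDR, based on the rules above.
    import ipaddress

    RFC1918 = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]
    RESERVED = [ipaddress.ip_network(n) for n in ("10.0.0.0/15", "172.31.0.0/16")]
    ALLOWED_PREFIXES = {16, 20, 23}  # only /16, /20 or /23 are supported

    def check_management_cidr(cidr: str) -> list:
        """Return a list of problems with the proposed Management CIDR (empty = looks OK)."""
        net = ipaddress.ip_network(cidr, strict=True)
        problems = []
        if net.prefixlen not in ALLOWED_PREFIXES:
            problems.append(f"{net} is a /{net.prefixlen}; only /16, /20 and /23 are supported")
        if not any(net.subnet_of(r) for r in RFC1918):
            problems.append(f"{net} is not within RFC1918 private space")
        if any(net.overlaps(r) for r in RESERVED):
            problems.append(f"{net} overlaps a reserved range (10.0.0.0/15 or 172.31.0.0/16)")
        return problems

    print(check_management_cidr("10.2.0.0/20"))   # [] -> acceptable
    print(check_management_cidr("10.1.0.0/23"))   # flagged: overlaps the reserved 10.0.0.0/15 range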

When connecting the SDDC back to an on-premises environment using a route-based VPN, this network is advertised into BGP as entered, but over a Direct Connect (DX) it is broken down into 3 smaller subnets when advertised into BGP.  The following table shows the breakdown for each CIDR size (all examples based on a management network of 10.2.0.0).

Management CIDR -> Subnets advertised over Direct Connect

  • 10.2.0.0/23 -> 10.2.0.0/24, 10.2.1.0/26, 10.2.1.128/25
  • 10.2.0.0/20 -> 10.2.0.0/21, 10.2.8.0/23, 10.2.12.0/22
  • 10.2.0.0/16 -> 10.2.0.0/17, 10.2.128.0/19, 10.2.192.0/18
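
If you want to confirm which prefixes to expect in your on-premises route tables, the breakdown above can be checked with the same ipaddress module. This sketch simply reuses the 10.2.0.0 example values from the table; substitute your own management CIDR as needed.

    # Illustrative check that the DX advertisements in the table above all fall
    # inside their parent Management CIDR (example values from the table).
    import ipaddress

    dx_advertisements = {
        "10.2.0.0/23": ["10.2.0.0/24", "10.2.1.0/26", "10.2.1.128/25"],
        "10.2.0.0/20": ["10.2.0.0/21", "10.2.8.0/23", "10.2.12.0/22"],
        "10.2.0.0/16": ["10.2.0.0/17", "10.2.128.0/19", "10.2.192.0/18"],
    }

    for mgmt, subnets in dx_advertisements.items():
        parent = ipaddress.ip_network(mgmt)
        for s in subnets:
            assert ipaddress.ip_network(s).subnet_of(parent), f"{s} is not inside {parent}"
        print(f"{parent}: advertised over DX as {', '.join(subnets)}")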

2 – AWS VPC & subnet

All production (3-node or larger) SDDCs must be connected to an AWS account, VPC, and subnet at deployment. The AWS subnet is only used for communications between workloads running in your SDDC and native AWS services in the connected VPC (such as EC2 and RDS instances) or S3 buckets over the SDDC’s cross-account ENI.  Note that the scope of communication between the SDDC and the connected AWS account is limited to services (or their endpoints) running in the connected VPC and using the main route table.  In addition, only the primary VPC CIDR is supported by VMware Cloud on AWS; any services in secondary VPC CIDRs will not be able to communicate with the SDDC over the ENI.

The VPC CIDR should be unique within the enterprise network and should not overlap with any other networks used in the SDDC, including the Management CIDR discussed in the previous section.  Only IPv4 addresses are currently supported with VMware Cloud on AWS, although the VPC’s CIDR can use either RFC1918 private IP space or any public IP space assigned to your organization.

The minimum size for the subnet the SDDC is linked to is /27, but to support the maximum capacity of the SDDC’s management cluster we recommend a /26 subnet; there is no advantage to using anything larger than a /26. Once the SDDC has been created using the selected subnet, do not delete or change that subnet.  In addition, any other subnets that need to communicate with compute networks in the SDDC must be associated with the connected VPC’s main route table.  This includes the subnet the SDDC is connected to, as VMware Cloud on AWS currently only updates the main route table with SDDC routes. Do not modify these routes manually, and as a best practice dedicate the selected subnet to the SDDC by deploying any native services in other subnets within the VPC.  For this reason, make sure to size the VPC large enough to accommodate current and future AWS native workloads that will interact with the SDDC (the default CIDR for a VPC is /16, which is also the maximum size permitted).
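
The sizing and overlap guidance above can also be checked before deployment. The sketch below is purely illustrative and uses hypothetical CIDR values; replace them with your own VPC, linked subnet, and management CIDR.

    # Illustrative pre-deployment check for the connected VPC subnet:
    # at least a /27, ideally a /26, inside the VPC CIDR, and not overlapping
    # the SDDC Management CIDR.
    import ipaddress

    vpc_cidr        = ipaddress.ip_network("172.20.0.0/16")   # connected VPC (hypothetical)
    linked_subnet   = ipaddress.ip_network("172.20.10.0/26")  # subnet the SDDC will link to
    management_cidr = ipaddress.ip_network("10.2.0.0/20")     # SDDC Management CIDR

    assert linked_subnet.subnet_of(vpc_cidr), "the linked subnet must live inside the connected VPC"
    assert linked_subnet.prefixlen <= 27, "a /27 is the minimum size for the linked subnet"
    assert not vpc_cidr.overlaps(management_cidr), "the VPC CIDR must not overlap the Management CIDR"

    if linked_subnet.prefixlen != 26:
        print(f"{linked_subnet} meets the minimum, but a /26 supports the full management cluster")
    else:
        print(f"{linked_subnet} matches the recommended /26 size")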

Prior to running the CloudFormation template that creates the link between your VMware Cloud Services Org and AWS account, ensure you have created a VPC with a subnet in the Availability Zone(s) (AZ) within the AWS Region(s) where the SDDC(s) will reside.  Since AWS AZ names are randomized for each AWS account (for example, us-east-1a for one account might correspond to us-east-1c for another), the recommendation is to define a subnet in every AZ within the Region (VMware Cloud on AWS does not necessarily have available facilities in every AZ within an active AWS Region).  You can then use the subnet itself as the reference for the AZ rather than the AZ name.  The AZ of the subnet selected determines which AZ the SDDC is physically deployed in. Select the same AZ as any existing AWS workloads or VMC SDDCs to minimize latency and bandwidth costs between them and VMs in the SDDC, or conversely, select a different AZ from AWS instances or VMC SDDCs to maximize availability (note traffic will be subject to cross-AZ charges if the source and destinations are in different AZs).
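
Because AZ names are randomized per account, it can help to enumerate the candidate subnets and their AZs before running the CloudFormation template. Below is a minimal boto3 sketch along those lines; the VPC ID is a placeholder, and it assumes your AWS credentials and region are already configured in the environment.

    # List the subnets in the intended VPC together with the account-specific AZ name
    # and the account-independent AZ ID, to help pick the subnet (and therefore the AZ)
    # the SDDC will be linked to.
    import boto3

    ec2 = boto3.client("ec2")
    resp = ec2.describe_subnets(
        Filters=[{"Name": "vpc-id", "Values": ["vpc-0123456789abcdef0"]}]  # placeholder VPC ID
    )

    for subnet in resp["Subnets"]:
        print(subnet["SubnetId"], subnet["CidrBlock"],
              subnet["AvailabilityZone"], subnet["AvailabilityZoneId"])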

3 – SDDC Compute Networks

SDDC Compute networks are assigned to Network Segments in the SDDC for workload VMs to connect to. These networks are defined after the SDDC is created, under the Networking & Security -> Segments page in the VMware Cloud on AWS console.  No networks are created by default.

Three types of Compute networks are available: Routed, Extended, or Disconnected.  Routed networks create the specified gateway in the SDDC, advertise the network over BGP for Direct Connect or route-based VPN, and make the network available as a local network for policy-based VPN.  Extended networks are for use with L2VPN; no gateway is created in the SDDC, so the network’s gateway must exist on-premises across the L2VPN.  Disconnected networks define a gateway, but the interface is left disconnected; this can be used to create an isolated network in the SDDC or one that uses a VM as its gateway, and is also used by HCX Layer 2 extension services. Extended networks can be toggled to routed networks, which connects the gateway specified when the segment was created, so the network is then advertised over BGP and routed by the SDDC.

VMs on routed networks can connect to the Internet through the Compute Gateway (CGW) when the appropriate firewall rules are defined on it for the Internet Interface or All Uplinks.  Note that any bandwidth charges incurred by traffic leaving the CGW will be billed by VMware.  Traffic exiting the CGW uses the public IP shown in the Compute Gateway section of the Networking & Security -> Overview page of the console as its source IP.  If the default route (0.0.0.0/0) is advertised over a DX private VIF or route-based VPN, or 0.0.0.0/0 is entered as a remote network on a policy-based VPN, then any traffic destined for the Internet will instead go over that connection, and the appropriate Compute Gateway firewall rules will need to be created and applied to the correct uplink.

The CGW firewall has 4 uplink interfaces, and rules can be applied to them in any combination.  The interfaces provide connectivity as follows:

  • Internet: This interface provides access to the Internet directly through the SDDC, using either the default NAT IP or using a custom NAT rule to a public IP which also allows DNAT (Inbound traffic).  Note the FW rule always references the private IP, not the public one.  This interface also handles private traffic over a policy-based IPSec VPN using the Public IP.
  • Intranet Interface: This interface provides access to any Direct Connect private VIFs attached to the SDDC.  Up to 4 VIFs can be connected, and traffic will use BGP to determine which link to use, just as with DX to native AWS.  Policy-based IPSec VPN using the private IP also uses this interface, as does connectivity over SDDC Groups (also called VMware Transit Connect, or vTGW).
  • VPN Tunnel Interface: This interface handles all route-based IPSec VPN traffic.
  • VPC Interface: This interface provides access to the connected VPC (linked AWS account).  Note that the default Security Group (SG) for the connected VPC protects the side of this interface in the AWS account.  Outbound traffic from the SDDC matches inbound traffic on the SG, and vice-versa.
  • All Uplinks: This is a shortcut for the Internet, Intranet & VPC Interfaces.  Note that it does NOT apply to the VPN Tunnel Interface.

Each network segment supports a single IPv4 subnet with a default gateway IP.  This network supports a maximum of 1000 MAC addresses, so using a /22 or smaller is recommended.
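
A quick way to sanity check a proposed segment against this guidance is shown below; the segment value is hypothetical, and the MAC limit is the approximate figure quoted above.

    # Illustrative check that a segment is /22 or smaller, per the ~1000 MAC address limit.
    import ipaddress

    segment = ipaddress.ip_network("192.168.40.0/22")  # hypothetical segment subnet

    usable_hosts = segment.num_addresses - 3  # network, broadcast, and the gateway IP
    print(f"{segment}: about {usable_hosts} usable addresses for VMs")

    if segment.prefixlen < 22:
        print("Warning: larger than /22 -- more addresses than the segment's MAC limit supports")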

There is a limit of 200 routed network segments and 100 L2VPN segments.  Connectivity between routed networks within the SDDC is handled by the SDDC’s Distributed Router (DR), so traffic stays within the SDDC; for extended/bridged networks, the default gateway is always on-premises, so traffic must cross the L2VPN or HCX Layer 2 extension to be routed.  You will need to use either a physical router or an NSX Edge as the on-premises gateway, as an NSX DLR can conflict with the SDDC’s DR. HCX proximity routing support is being added in HCX 1.8v2; when enabled, it detects whether the destination IP is in the same SDDC and, if so, routes traffic directly to that network, avoiding the hairpinning back and forth across the Layer 2 extension network.

When using a Direct Connect (DX) to access compute networks, there is a maximum of 16 segments that will be advertised over BGP, as well as the 3 networks for the management CIDR described earlier.  If more networks are required, please reach out to support via chat.

4 – On-premises networks

The on-premises networks represent the enterprise networks that will communicate with the SDDC.  There are 3 options for linking the SDDC with these networks: IPSec VPN, AWS Direct Connect (DX) or stretched Layer 2, via either HCX L2 Extension or L2VPN with NSX Standalone Edge.

Multiple connections, and/or multiple different connection types can be combined to provide redundancy in some cases.  For example, it is possible to use DX for the primary communication path, and failover to an IPSec VPN (requires a setting change on the Direct Connect page in the VMC console).  It’s also possible to use all different connection types simultaneously, for different network destinations.  However, networks connected with an L2 extension cannot be combined with any other connection types.

When creating an IPSec VPN, it is not necessary to create firewall rules to allow the IPSec endpoint traffic itself, but any traffic that you want to traverse the VPN tunnels must be explicitly permitted in the CGW policy on the appropriate interface (Internet for a public IP VPN, Intranet for a private IP VPN over Direct Connect).  One exception applies to route-based VPNs: if the tunnel interfaces use IP addresses outside of 169.254.0.0/16, you will need to create a rule on the CGW firewall (applied to the VPN Tunnel interface) to allow BGP between the tunnel IPs.
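
The exception above is easy to test for when planning tunnel addressing. This small helper is purely illustrative, and the tunnel addresses shown are hypothetical.

    # Decide whether a route-based VPN needs an explicit BGP rule on the VPN Tunnel
    # interface: required when either tunnel endpoint sits outside 169.254.0.0/16.
    import ipaddress

    APIPA = ipaddress.ip_network("169.254.0.0/16")

    def needs_bgp_rule(local_tunnel_ip: str, remote_tunnel_ip: str) -> bool:
        """True if either tunnel endpoint falls outside 169.254.0.0/16."""
        return any(ipaddress.ip_address(ip) not in APIPA
                   for ip in (local_tunnel_ip, remote_tunnel_ip))

    print(needs_bgp_rule("169.254.31.1", "169.254.31.2"))  # False -> no extra rule needed
    print(needs_bgp_rule("172.30.255.1", "172.30.255.2"))  # True  -> allow BGP on the VPN interface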

Also note that the CGW does not apply to traffic on an L2 connection (L2VPN or HCX L2E), as the traffic is tunneled through the CGW, and the tunnels themselves are permitted by default.

While the CGW and its uplinks protect traffic to your workloads running inside the SDDC, access to the management components in the SDDC, including vCenter, NSX Manager, and the HCX Manager, SRM, and vSphere Replication appliances if deployed, is controlled by the Management Gateway Firewall (MGW).  The MGW firewall works the same way as the CGW; however, any groups created for identifying networks are unique to the gateway they were created for.  The MGW has an additional restriction: either the source or the destination must be one of the system-defined groups.  This prevents creating arbitrary rules, as every rule must apply to a specific management component, and only the specific services available for the selected management component are allowed (for example, vCenter permits HTTPS, ICMP, and SSO traffic).

The CGW and MGW both have default deny rules in place that cannot be disabled.  The CGW has a second default deny rule that applies to the VPN Tunnel interface, which can be modified, but the one applied to All Uplinks cannot.

When using a DX or route-based IPSec VPN, routes are exchanged through BGP.  There are 2 separate BGP instances in an SDDC: The DX uses a BGP instance on AWS’s VGW, and so AWS supported configurations will work with your SDDC.  The SDDC will advertise the first 16 network segments, in addition to the 3 management routes described in section 1 to all DX private VIFs connected to the SDDC. If more than 16 logical segments need to be advertised on the DX, please contact support via chat to request an increase.  Note that there are some limitations, so it may not be possible to accommodate all such requests.

When connecting a Direct Connect Gateway (DXGW) through an SDDC Group, the BGP is terminated on the DXGW in the customer’s AWS account. The prefixes reachable through the DXGW (up to 20) are configured manually either in the SDDC Group DXGW attachment, or on the DXGW itself.

Route-based VPNs use a BGP instance on the SDDC’s Edge gateway.  Therefore, all route-based VPNs connecting to the SDDC will use the same ASN on the SDDC side, and routes learned through one VPN connection will be advertised out to any other route-based VPNs.  It is not possible to adjust, inject, or filter BGP routes or parameters other than the ASN in the SDDC, and all network segments, as well as the complete management CIDR, will be advertised to all route-based VPN connections.

For further details on the network connections, please refer to the VMware documentation site for VMware Cloud on AWS Networking & Security at https://docs.vmware.com/en/VMware-Cloud-on-AWS/services/com.vmware.vmc-aws.networking-security/GUID-0CD747E8-143D-476C-BE17-7DB991B32D37.html