
Rack Server in Example Deployment

After looking at the major components in the example deployment and the key virtual and physical switch parameters, let's take a look at the different types of servers that customers can have in their environment. Customers deploy ESXi hosts on either rack servers or blade servers. This section discusses deployments where the ESXi host runs on a rack server. Two types of rack server configuration are described in the following sections:

  • Rack Server with Eight 1 Gigabit Ethernet network adapters
  • Rack Server with Two 10 Gigabit Ethernet network adapters

For each of the above two configurations, the different VDS design approaches will be discussed.

Rack Server with Eight 1 Gigabit Ethernet network adapters

In a Rack Server deployment with eight 1 Gigabit Ethernet network adapters per host, customers can either use the traditional static design approach of allocating network adapters to each traffic type or make use of advanced VDS features such as Network I/O Control (NIOC) and Load Based Teaming (LBT). The NIOC and LBT features help provide a dynamic design that utilizes I/O resources efficiently. This section describes both the traditional and the new design approaches along with their pros and cons.

 

Design Option 1 – Static configuration

This design option follows the traditional approach of statically allocating network resources to the different virtual infrastructure traffic types. As shown in Figure 1, each host has eight Ethernet network adapters; four of them are connected to the first Access layer switch while the other four are connected to the second Access layer switch to avoid a single point of failure. Let's take a detailed look at how the VDS parameters are configured.


Figure 1 Rack Server with eight 1 Gigabit Ethernet network adapters

dvuplink configuration

To support the maximum of eight 1 Gigabit Ethernet network adapters per host, the dvuplink port group is configured with eight dvuplinks (dvuplink1 through dvuplink8). On each host, dvuplink1 is associated with vmnic0, dvuplink2 with vmnic1, and so on. It is a recommended practice to rename the dvuplinks to something meaningful and easy to track. For example, dvuplink1, which gets associated with a vmnic on the motherboard, can be renamed "LOM-uplink1".

If the hosts have some Ethernet network adapters as LAN On Motherboard (LOM) ports and some on expansion cards, then for better resiliency VMware recommends selecting one network adapter from the LOMs and one from an expansion card when configuring NIC teaming. To configure this teaming on a VDS, administrators have to pay attention to the dvuplink and vmnic association, along with the dvportgroup configuration where NIC teaming is enabled. In the NIC teaming configuration of a dvportgroup, administrators choose the different dvuplinks that are part of a team. If the dvuplinks are named appropriately according to the host vmnic association, administrators can simply select "LOM-uplink1" and "Expansion-uplink1" while configuring the teaming option for a dvportgroup.
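
To make the mapping concrete, here is a small Python sketch of the renaming scheme described above. The vmnic numbering and the "LOM-uplink"/"Expansion-uplink" names are illustrative assumptions, not output from any VMware tool:

```python
# Sketch of the dvuplink-to-vmnic mapping described above. We assume
# vmnic0-3 are LOM ports and vmnic4-7 sit on an expansion card.
dvuplink_map = {}
for i in range(8):
    vmnic = f"vmnic{i}"
    if i < 4:
        dvuplink_map[f"LOM-uplink{i + 1}"] = vmnic
    else:
        dvuplink_map[f"Expansion-uplink{i - 3}"] = vmnic

# Teaming one LOM uplink with one expansion-card uplink avoids losing
# both team members to a single LOM or card failure.
team = ["LOM-uplink1", "Expansion-uplink1"]
print({name: dvuplink_map[name] for name in team})
```

With meaningful names like these, picking one uplink from each hardware source in the teaming dialog becomes a matter of reading the name rather than cross-referencing vmnic numbers.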

dvportgroups configuration

As described in Table 1, five portgroups are configured for the five different traffic types. Customers can create up to 5,000 unique portgroups per VDS. In this example deployment, the decision to create different portgroups is based on the number of traffic types.

According to Table 1, dvportgroup PG-A is created for the management traffic type, and other dvportgroups are defined for the remaining traffic types. Following are the key configurations of dvportgroup PG-A:

  • Teaming Option: Explicit failover order provides a deterministic way of directing traffic to a particular uplink. By selecting dvuplink1 as the active uplink and dvuplink2 as the standby uplink, the management traffic is carried over dvuplink1 unless dvuplink1 fails. Note that all other dvuplinks are configured as unused. It is also recommended to set the failback option to "No" to avoid flapping of traffic between the two network adapters. The failback option determines how a physical adapter is returned to active duty after recovering from a failure. With failback set to "No", a failed adapter is left inactive after recovery until another currently active adapter fails and needs to be replaced.
  • VMware recommends isolating all traffic types from each other by defining separate VLAN for each dvportgroup.
  • There are several other parameters that are part of the dvportgroup configuration. Customers can choose to configure these parameters based on their environment needs. For example, Customers can configure PVLAN to provide isolation when there are limited VLANs available in the environment.
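
The explicit failover behavior with failback set to "No" can be modeled with a short sketch. This is a simplified illustration of the selection logic described in the first bullet, not actual ESXi code:

```python
# Minimal model of explicit-failover uplink selection with failback "No".
def select_uplink(current, active, standby, link_up):
    """Return the uplink that should carry traffic.

    current -- uplink currently in use (None if none yet)
    active  -- configured active uplink (e.g. "dvuplink1")
    standby -- configured standby uplink (e.g. "dvuplink2")
    link_up -- dict mapping uplink name -> bool link state
    """
    # With failback "No", keep using the current uplink as long as it is
    # healthy, even if the preferred active uplink has recovered.
    if current is not None and link_up.get(current):
        return current
    # Otherwise fail over in configured order: active first, then standby.
    for uplink in (active, standby):
        if link_up.get(uplink):
            return uplink
    return None  # no healthy uplink available

links = {"dvuplink1": True, "dvuplink2": True}
used = select_uplink(None, "dvuplink1", "dvuplink2", links)   # dvuplink1
links["dvuplink1"] = False
used = select_uplink(used, "dvuplink1", "dvuplink2", links)   # fails over
links["dvuplink1"] = True
used = select_uplink(used, "dvuplink1", "dvuplink2", links)   # no failback
print(used)
```

Note how the last call stays on the standby uplink even though dvuplink1 has recovered; that is exactly the flapping avoidance the failback "No" setting provides.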

As you follow the dvportgroup configuration in Table 1, you can see that each traffic type is carried over a specific dvuplink, except for the virtual machine traffic type, which has two active uplinks, dvuplink7 and dvuplink8. These two active links are utilized through the load based teaming (LBT) algorithm. As mentioned earlier, the LBT algorithm is much more efficient at utilizing link bandwidth than the standard hashing algorithms.

Table 1 Static Design configuration

| Traffic Type | Port Group | Teaming Option | Active Uplink | Standby Uplink | Unused Uplink |
|---|---|---|---|---|---|
| Management | PG-A | Explicit Failover | dvuplink1 | dvuplink2 | 3,4,5,6,7,8 |
| vMotion | PG-B | Explicit Failover | dvuplink3 | dvuplink4 | 1,2,5,6,7,8 |
| FT | PG-C | Explicit Failover | dvuplink4 | dvuplink3 | 1,2,5,6,7,8 |
| iSCSI | PG-D | Explicit Failover | dvuplink5 | dvuplink6 | 1,2,3,4,7,8 |
| Virtual Machine | PG-E | LBT | dvuplink7/dvuplink8 | None | 1,2,3,4,5,6 |

 

Physical switch configuration

The external physical switch, to which the rack servers' network adapters are connected, is configured as a trunk with all the appropriate VLANs enabled. As described in the physical network switch parameters section, the following switch configurations are performed based on the VDS setup described in Table 1:

  • Enable STP on the trunk ports facing ESXi hosts, along with "PortFast" mode and BPDU guard.
  • The teaming configuration on the VDS is static, so no link aggregation is configured on the physical switches.
  • Because of the mesh topology shown in Figure 1, the link-state tracking feature is not required on the physical switches.

In this design approach, resiliency for the infrastructure traffic is achieved through active-standby uplinks, and security is accomplished by providing separate physical paths for the different traffic types. However, with this design the I/O resources are underutilized, because the dvuplink2 and dvuplink6 standby links are never used to send or receive traffic. Also, there is no flexibility to allocate more bandwidth to a traffic type when it needs it.

There is another variation of the static design approach that addresses some customers' need to provide higher bandwidth to the storage and vMotion traffic types. In the static design described earlier, iSCSI and vMotion traffic is limited to 1 Gigabit. If customers want to support higher bandwidth for iSCSI, they can make use of an iSCSI multipathing solution. Also, with the release of vSphere 5, vMotion traffic can be carried over multiple Ethernet network adapters through the multi-NIC vMotion feature, thus providing higher bandwidth to the vMotion process.

For more details on how to set up iSCSI multipathing, please refer to the vSphere Storage guide at https://www.vmware.com/support/pubs/vsphere-esxi-vcenter-server-pubs.html. The configuration of multi-NIC vMotion is quite similar to the iSCSI multipathing setup: administrators create two separate vmkernel interfaces and bind each one to a separate dvportgroup. The two separate dvportgroup configurations provide connectivity to two different Ethernet network adapters or dvuplinks.
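
As a rough illustration of that binding pattern (plain data structures, not a real vSphere API call; the portgroup and vmkernel interface names follow the tables in this post), multi-NIC vMotion looks like this:

```python
# Two dvportgroups with mirrored active/standby uplinks, per Table 2.
vmotion_portgroups = [
    {"name": "PG-B1", "active": "dvuplink3", "standby": "dvuplink4"},
    {"name": "PG-B2", "active": "dvuplink4", "standby": "dvuplink3"},
]

# Each vmkernel interface is bound to exactly one dvportgroup, giving the
# vMotion process a path over two different physical adapters.
vmkernel_bindings = {
    "vmk1": "PG-B1",
    "vmk2": "PG-B2",
}

# Sanity check: the two portgroups must use different active uplinks, or
# both vmkernel interfaces would share one physical adapter.
assert vmotion_portgroups[0]["active"] != vmotion_portgroups[1]["active"]
```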

Table 2 Static design configuration with iSCSI multipathing and multi-NIC vMotion

| Traffic Type | Port Group | Teaming Option | Active Uplink | Standby Uplink | Unused Uplink |
|---|---|---|---|---|---|
| Management | PG-A | Explicit Failover | dvuplink1 | dvuplink2 | 3,4,5,6,7,8 |
| vMotion | PG-B1 | None | dvuplink3 | dvuplink4 | 1,2,5,6,7,8 |
| vMotion | PG-B2 | None | dvuplink4 | dvuplink3 | 1,2,5,6,7,8 |
| FT | PG-C | Explicit Failover | dvuplink2 | dvuplink1 | 3,4,5,6,7,8 |
| iSCSI | PG-D1 | None | dvuplink5 | None | 1,2,3,4,6,7,8 |
| iSCSI | PG-D2 | None | dvuplink6 | None | 1,2,3,4,5,7,8 |
| Virtual Machine | PG-E | LBT | dvuplink7/dvuplink8 | None | 1,2,3,4,5,6 |

 

As shown in Table 2, there are two entries each for the vMotion and iSCSI traffic types, listing the additional dvportgroup configurations required to support the multi-NIC vMotion and iSCSI multipathing processes. For multi-NIC vMotion, the dvportgroups PG-B1 and PG-B2 are configured with dvuplink3 and dvuplink4 as active links respectively. For iSCSI multipathing, the dvportgroups PG-D1 and PG-D2 are connected to dvuplink5 and dvuplink6 as active links respectively. Load balancing across the multiple dvuplinks is performed by the multipathing logic in the iSCSI process and by the ESXi platform in the vMotion process, so it is not required to configure teaming policies for these dvportgroups.

The dvportgroup configuration for the FT, Management, and Virtual Machine traffic types, as well as the physical switch configuration, remains the same for this design as described in design option 1 in the previous section.

This static design approach improves on the first design by using advanced capabilities such as iSCSI multipathing and multi-NIC vMotion. At the same time, it has the same challenges related to underutilized resources and the inability to allocate additional resources on the fly to different traffic types.

Design Option 2 – Dynamic configuration with NIOC and LBT

After looking at the traditional design approach with static uplink configurations, let’s take a look at the VMware recommended design option that takes advantage of the advanced VDS features such as NIOC and LBT.

In this design the connectivity to the physical network infrastructure remains the same as described in the static design option, but instead of allocating specific dvuplinks to individual traffic types, the ESXi platform utilizes the dvuplinks dynamically. To illustrate this dynamic design, each virtual infrastructure traffic type's bandwidth utilization is estimated. In a real deployment, customers should first monitor the virtual infrastructure traffic over a period of time to gauge bandwidth utilization, and then come up with bandwidth numbers for each traffic type.

Following are some bandwidth numbers estimated per traffic type:

  • Management Traffic (< 1 Gig)
  • vMotion (1 Gig)
  • FT (1 Gig)
  • iSCSI (1 Gig)
  • Virtual Machine (2 Gig)

Based on this bandwidth information, administrators can provision appropriate I/O resources to each traffic type using the NIOC feature of VDS. Let's take a look at the VDS parameter configurations for this design as well as the NIOC setup. The dvuplink portgroup configuration remains the same, with eight dvuplinks created for the eight 1 Gigabit Ethernet network adapters. The dvportgroup configuration is as follows.

dvportgroups configuration

In this design all dvuplinks are active; there are no standby or unused uplinks, as shown in Table 3. All dvuplinks are thus available for use by the teaming algorithm. Following are the key parameter configurations of dvportgroup PG-A:

  • Teaming Option: Load based teaming is selected as the teaming algorithm. With the LBT configuration, management traffic is initially scheduled based on the virtual port ID hash. Depending on the hash output, the management traffic is sent out over one of the dvuplinks. Other traffic types in the virtual infrastructure can also be scheduled on the same dvuplink initially. However, when the utilization of the dvuplink goes beyond the 75 percent threshold, the LBT algorithm is invoked and some of the traffic is moved to other, underutilized dvuplinks. It is therefore possible that management traffic will be moved to another dvuplink when such an LBT event occurs.
  • The failback option governs the return from a standby link to the active uplink after the active uplink comes back into operation following a failure. It applies only when there are active and standby dvuplink configurations, and in this design there are no standby dvuplinks. When an active uplink fails, the traffic flowing on that dvuplink is moved to another working dvuplink; if the failed dvuplink comes back, the LBT algorithm will schedule new traffic on it. This option is therefore left at its default.
  • VMware recommends isolating all traffic types from each other by defining separate VLAN for each dvportgroup.   
  • There are several other parameters that are part of the dvportgroup configuration. Customers can choose to configure these parameters based on their environment needs. For example, Customers can configure PVLAN to provide isolation when there are limited VLANs available in the environment.
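
The LBT behavior described in the first bullet can be sketched roughly as follows. The flows, link capacities, and the greedy choice of which flow to move are illustrative assumptions; the actual ESXi algorithm is more sophisticated:

```python
# Rough sketch of load based teaming: when a dvuplink's utilization
# crosses the 75% threshold, move a flow to the least-utilized uplink.
LBT_THRESHOLD = 0.75

def rebalance(uplink_load, capacity, flows):
    """uplink_load: uplink -> Mbps in use; flows: uplink -> [(flow, Mbps)]."""
    moves = []
    for uplink, mbps_used in list(uplink_load.items()):
        if mbps_used / capacity > LBT_THRESHOLD and flows[uplink]:
            flow, mbps = flows[uplink].pop()                 # pick a flow to move
            target = min(uplink_load, key=uplink_load.get)   # least-utilized uplink
            if target == uplink:                             # nowhere better to go
                flows[uplink].append((flow, mbps))
                continue
            uplink_load[uplink] -= mbps
            uplink_load[target] += mbps
            flows[target].append((flow, mbps))
            moves.append((flow, uplink, target))
    return moves

util = {"dvuplink1": 800.0, "dvuplink2": 100.0}   # Mbps on 1 Gbps links
flows = {"dvuplink1": [("mgmt", 100.0), ("vm-traffic", 300.0)],
         "dvuplink2": [("ft", 100.0)]}
moves = rebalance(util, 1000.0, flows)
print(moves)  # dvuplink1 is over 75%, so a flow moves to dvuplink2
```

In the example, dvuplink1 is at 80 percent utilization, so one of its flows is rescheduled onto the underutilized dvuplink2, bringing both links below the threshold.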

As you follow the dvportgroup configuration in Table 3, you can see that each traffic type has all dvuplinks active, and these links are utilized through the load based teaming (LBT) algorithm. Let's now look at the NIOC configuration described in the last two columns of Table 3.

The Network I/O Control (NIOC) configuration in this design helps provide appropriate I/O resources to the different traffic types. Based on the previously estimated bandwidth numbers per traffic type, the shares parameter is configured in the NIOC Shares column of Table 3. The share values specify the relative importance of a traffic type, and NIOC ensures that during contention scenarios on the dvuplinks each traffic type gets its allocated bandwidth. For example, a share configuration of 10 each for vMotion, iSCSI, and FT allocates equal bandwidth to these traffic types, while virtual machine traffic gets the highest bandwidth with 20 shares and management gets the lowest with 5 shares.

To illustrate how share values translate to bandwidth numbers, let's take the example of a 1 Gigabit dvuplink carrying all five traffic types. This is a worst-case scenario in which all traffic types are mapped to one dvuplink; it will not happen when customers enable the LBT feature, because LBT balances the traffic based on the utilization of the uplinks. This example shows how much bandwidth each traffic type is allowed on one dvuplink during a contention or oversubscription scenario when LBT is not enabled.

  • Management: 5 shares; (5/55) * 1 Gigabit = 90.91 Mbps
  • vMotion: 10 shares; (10/55) * 1 Gigabit = 181.82 Mbps
  • FT: 10 shares; (10/55) * 1 Gigabit = 181.82 Mbps
  • iSCSI: 10 shares; (10/55) * 1 Gigabit = 181.82 Mbps
  • Virtual Machine: 20 shares; (20/55) * 1 Gigabit = 363.64 Mbps
  • Total shares: 5 + 10 + 10 + 10 + 20 = 55

To calculate the bandwidth numbers during contention, first calculate the percentage of bandwidth for a traffic type by dividing its share value by the total number of shares (55). Then multiply the total bandwidth of the dvuplink (1 Gigabit) by that percentage. For example, the 5 shares allocated to management traffic translate to 90.91 Mbps of bandwidth for the management process on a fully utilized 1 Gigabit network adapter. This example uses a custom share configuration, but customers can also make use of the predefined High (100), Normal (50), and Low (25) share values when assigning shares to different traffic types.
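
The share-to-bandwidth arithmetic above can be checked with a few lines of Python; the share values come straight from the NIOC Shares column of Table 3:

```python
# Worst-case bandwidth per traffic type on one fully utilized dvuplink.
shares = {"Management": 5, "vMotion": 10, "FT": 10,
          "iSCSI": 10, "Virtual Machine": 20}
link_mbps = 1000  # one fully utilized 1 Gigabit dvuplink

total_shares = sum(shares.values())  # 55
bandwidth = {traffic: round(s / total_shares * link_mbps, 2)
             for traffic, s in shares.items()}

print(bandwidth["Management"])       # 90.91
print(bandwidth["Virtual Machine"])  # 363.64
```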

The vSphere platform takes these configured share values and applies them per uplink. The scheduler running at each uplink is responsible for making sure that bandwidth resources are allocated according to the shares. In the case of an eight 1 Gigabit Ethernet network adapter deployment, there are eight schedulers running. Depending on the number of traffic types scheduled on a particular uplink, the scheduler divides the bandwidth among those traffic types based on the share numbers. For example, if only FT (10 shares) and management (5 shares) traffic are flowing through dvuplink5, then based on the share values FT traffic will get double the bandwidth of management traffic. And when no management traffic is flowing, all of the bandwidth can be utilized by the FT process. This flexibility in allocating I/O resources is the key benefit of the NIOC feature.
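
A minimal sketch of this per-uplink behavior, assuming a simple proportional-share split (the real ESXi scheduler is more involved):

```python
# Shares are applied only among the traffic types actually present on
# a given uplink, not across the whole VDS.
def split_bandwidth(link_mbps, active_shares):
    """Divide one uplink's bandwidth among the traffic types flowing on it."""
    total = sum(active_shares.values())
    return {t: s / total * link_mbps for t, s in active_shares.items()}

# Only FT (10 shares) and management (5 shares) are flowing on dvuplink5:
alloc = split_bandwidth(1000, {"FT": 10, "Management": 5})
# FT receives double the management bandwidth.
assert abs(alloc["FT"] - 2 * alloc["Management"]) < 1e-9

# With no management traffic at all, FT can use the full link.
assert split_bandwidth(1000, {"FT": 10})["FT"] == 1000
```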

The NIOC Limits parameter in Table 3 is not configured in this design. The limit value specifies an absolute maximum on egress traffic for a traffic type, in Mbps. This configuration imposes a hard cap on a traffic type even when I/O resources are available. It is not recommended to configure limits unless you specifically want to restrict a traffic type even though additional resources are available.

There is no change in physical switch configuration in this design approach even with the choice of the new LBT algorithm. The LBT teaming algorithm doesn’t require any special configuration on physical switches. Please refer to the physical switch settings described in design option 1.

Table 3 Dynamic design configuration with NIOC and LBT

| Traffic Type | Port Group | Teaming Option | Active Uplink | Standby Uplink | NIOC Shares | NIOC Limits |
|---|---|---|---|---|---|---|
| Management | PG-A | LBT | 1,2,3,4,5,6,7,8 | None | 5 | None |
| vMotion | PG-B | LBT | 1,2,3,4,5,6,7,8 | None | 10 | None |
| FT | PG-C | LBT | 1,2,3,4,5,6,7,8 | None | 10 | None |
| iSCSI | PG-D | LBT | 1,2,3,4,5,6,7,8 | None | 10 | None |
| Virtual Machine | PG-E | LBT | 1,2,3,4,5,6,7,8 | None | 20 | None |

 

One thing to note about this design is that it doesn't provide more than 1 Gigabit of bandwidth to the vMotion and iSCSI traffic types, as was possible in the static design using multi-NIC vMotion and iSCSI multipathing. The load based teaming algorithm cannot split one infrastructure traffic type across multiple dvuplink ports to utilize all the links. So even though the vMotion dvportgroup PG-B has all eight 1 Gigabit Ethernet network adapters as active uplinks, vMotion traffic is carried over only one of the eight uplinks. The main advantage of this design appears in scenarios where the vMotion process is not using the uplink bandwidth and other traffic types need additional resources: in these situations NIOC makes sure that the unused bandwidth is allocated to the other traffic types that need it.

This dynamic design option is the recommended approach because it takes advantage of the advanced VDS features and utilizes I/O resources efficiently. It also provides active-active resiliency, with no uplinks in standby mode. In this design approach, customers allow the vSphere platform to make the optimal decisions on scheduling traffic across multiple uplinks.

Some customers who have restrictions in their physical infrastructure, in terms of bandwidth capacity across different paths and limited availability of the layer 2 domain, might not be able to take advantage of this dynamic design option. While deploying this dynamic option, it is important to consider all the different paths a traffic type can take and to make sure that the physical switch infrastructure can support the specific characteristics required for each traffic type. VMware recommends that vSphere and network administrators work together to understand the impact of this dynamic traffic scheduling on the physical network infrastructure before deploying this approach.

Every customer environment is different, and the requirements for the traffic types differ as well. Depending on the needs of the environment, customers can modify these design options to fit their specific requirements. For example, customers can choose a combination of the static and dynamic design options when they need higher bandwidth for iSCSI and vMotion activities. In this hybrid design, four uplinks are statically allocated to the iSCSI and vMotion traffic types while the remaining four uplinks are used dynamically for the remaining traffic types. Table 4 below shows the traffic types and associated port group configuration for the hybrid design.

Table 4 Hybrid design configuration

| Traffic Type | Port Group | Teaming Option | Active Uplink | Standby Uplink | NIOC Shares | NIOC Limits |
|---|---|---|---|---|---|---|
| Management | PG-A | LBT | 1,2,3,4 | None | 5 | None |
| vMotion | PG-B1 | None | dvuplink5 | dvuplink6 | None | None |
| vMotion | PG-B2 | None | dvuplink6 | dvuplink5 | None | None |
| FT | PG-C | LBT | 1,2,3,4 | None | 10 | None |
| iSCSI | PG-D1 | None | dvuplink7 | None | None | None |
| iSCSI | PG-D2 | None | dvuplink8 | None | None | None |
| Virtual Machine | PG-E | LBT | 1,2,3,4 | None | 20 | None |

 

In the next blog entry I will discuss the Rack server deployment with two 10 Gigabit network adapters.

 

About the Author

Vyenkatesh Deshpande

Vyenkatesh (Venky) Deshpande is a Sr. Technical Marketing Manager at VMware, focused on the networking aspects of the vSphere platform and the vCloud Networking and Security product. Follow Venky on Twitter @VMWNetworking.