
Tag Archives: nioc

vSphere 5.1 – Network I/O Control (NIOC) Architecture – Old and New

Recently there has been some discussion around the egress traffic management feature of vSphere Distributed Switch (VDS), also called Network I/O Control (NIOC). Thanks to my colleague Frank Denneman for providing more details about this feature on his blog site and bringing to my attention an architectural change in the vSphere 5.1 release. This change impacts how the Limit parameters are applied at the host level. In this post, I will first describe the old architecture of NIOC and then discuss the change. I will also talk about the impact of this change and what users need to keep in mind while configuring the Limit parameter.

Let’s first take a look at the NIOC components and architecture in the previous releases of vSphere. The diagram below shows a vSphere host with two 10 gig NICs, VDS components, NIOC configuration table, and different traffic types running on the host.

NIOC – Old Architecture


Using both Storage I/O Control & Network I/O Control for NFS

Many of these blog articles arise from conversations I have with folks both internally at VMware & externally in the community. This post is another such example. What I really like about this job is that it gets me thinking about a lot of stuff that I normally take for granted. The question this time was around using both Storage I/O Control (SIOC) & Network I/O Control (NIOC) for NFS traffic & Virtual Machines residing on NFS datastores, and whether they could possibly step on each other's toes, so to speak.

The answer is no, the technologies are complementary. Let me try to explain how.

First off, let's have a brief overview of what the technologies do.

Intro to Storage I/O Control (SIOC)

SIOC was covered in a previous blog post. Details can be found here – http://blogs.vmware.com/vsphere/2011/09/storage-io-control-enhancements.html. In a nutshell, if SIOC detects that a pre-defined latency threshold for a particular datastore has been exceeded, it will throttle the amount of I/O a VM can queue to that datastore based on a 'shares' mechanism. When the contention is alleviated, SIOC will stop and VMs can then begin to use the datastore without any throttling. This avoids the 'noisy neighbor' problems when one VM can hog all the bandwidth to a shared datastore. The point to note here is that SIOC is working on a per VM basis, and deals with datastore objects.

SIOC was first introduced in vSphere 4.1, but only for block storage devices (FC, iSCSI, FCoE). In vSphere 5.0, we introduced SIOC support for NFS datastores.

Intro to Network I/O Control (NIOC)

There is a nice overview of NIOC on the networking blog here – http://blogs.vmware.com/networking/2010/07/got-network-io-control.html. Again, in a nutshell, NIOC allows you to define a guaranteed bandwidth for different vSphere network traffic types.

NIOC uses a software approach to partitioning physical network bandwidth among the different types of network traffic flows. For example, you can guarantee a minimum NFS bandwidth/latency when a vMotion operation is initiated on the same network & prevent the vMotion operation from having an impact on the NFS traffic flow. The point to note here is that NIOC is working on a network traffic stream, e.g. NFS, and deals with NIC ports.

SIOC & NIOC Together

Let's take a scenario where there are multiple VMs spread across multiple ESXi hosts, all sharing the same NFS datastore.

i) SIOC Use Case

For quite a while, we have been able to give bandwidth fairness to VMs running on the same host via the SFQ, the start-time fair queueing scheduler. This scheduler ensures share-based allocation of I/O resources between VMs on a per host basis. It is when we have VMs accessing the same datastore from different hosts that we've had to implement a distributed I/O scheduler. This is called PARDA, the Proportional Allocation of Resources for Distributed Storage Access. PARDA carves out the array queue amongst all the Virtual Machines which are sending I/O to the datastore on the array & adjusts the per host per datastore queue size depending on the sum of the per Virtual Machine shares on the host.

If SIOC is enabled on the datastore, and the latency threshold on the datastore is surpassed because of the amount of disk I/O that the VMs are generating on the datastore, the I/O bandwidth allocated to the VMs sharing the datastores will be adjusted according to the share values assigned to the VMs.
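To make the share-based idea a little more concrete, here is a minimal Python sketch. It is not the actual SIOC or PARDA implementation. It assumes a simple model in which throttling only starts once the observed datastore latency exceeds the configured threshold, and each host's slice of the array queue is then proportional to the sum of the shares of its VMs. The function names, host names, share values and queue depth are all hypothetical.

```python
from typing import Dict


def sioc_should_throttle(observed_latency_ms: float, threshold_ms: float) -> bool:
    """Throttling is only considered once the latency threshold is exceeded."""
    return observed_latency_ms > threshold_ms


def split_array_queue(
    vm_shares_by_host: Dict[str, Dict[str, int]],
    array_queue_depth: int,
) -> Dict[str, float]:
    """Divide the array queue between hosts in proportion to the sum of
    the per-VM shares on each host (simplified, illustrative model)."""
    host_totals = {
        host: sum(vm_shares.values())
        for host, vm_shares in vm_shares_by_host.items()
    }
    all_shares = sum(host_totals.values())
    return {
        host: array_queue_depth * total / all_shares
        for host, total in host_totals.items()
    }


# Hypothetical layout: two hosts sharing one NFS datastore.
hosts = {
    "esx1": {"vm1": 1000, "vm2": 1000},
    "esx2": {"vm3": 2000, "vm4": 4000},
}

if sioc_should_throttle(observed_latency_ms=35.0, threshold_ms=30.0):
    depths = split_array_queue(hosts, array_queue_depth=64)
    for host, depth in depths.items():
        print(f"{host}: queue depth {depth:.0f}")
```

In this sketch the host running the VMs with more shares ends up with the larger slice of the queue, which is the behaviour described above.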

ii) NIOC Use Case

But what if something impacts the NFS traffic flow? In this case, VM performance may be impacted not because of an over-committed datastore, but due to there not being enough network bandwidth for the ESXi host to communicate with the NFS server. For instance, as mentioned earlier in the post, what if a vMotion operation was initiated (an operation which could consume up to 8Gbps of the network bandwidth), and impacted the other traffic on the same pipe, such as NFS? Yes, I know a best practice from VMware is to dedicate a NIC for vMotion traffic to avoid this exact situation, but this isn't always practical on 10Gb networks. In the case where vMotion, NFS and other traffic types are sharing the same uplink, NIOC allows us to guarantee a minimum bandwidth on a per-traffic-type basis. The really cool thing is that when there is no congestion, network traffic can use *all* the available bandwidth of the uplink. And just for clarification, the uplink is actually a dvuplink since NIOC can only be enabled on distributed switches. The feature is not available on stand-alone vSwitches.
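The "guaranteed minimum under contention, everything when idle" behaviour can be sketched in a few lines of Python. This is only an illustrative model of share-based, work-conserving allocation, not the NIOC scheduler itself. The function name, the share values and the demand figures are assumptions made up for the example.

```python
from typing import Dict


def allocate_bandwidth(
    capacity_gbps: float,
    shares: Dict[str, int],
    demand_gbps: Dict[str, float],
) -> Dict[str, float]:
    """Share-proportional allocation that never gives a traffic type more
    than it asks for and hands unused bandwidth to the remaining types."""
    allocation = {traffic: 0.0 for traffic in shares}
    active = {t for t in shares if demand_gbps.get(t, 0.0) > 0}
    remaining = capacity_gbps

    while active and remaining > 1e-9:
        total_shares = sum(shares[t] for t in active)
        # Traffic types whose whole demand fits inside their fair slice.
        satisfied = {
            t for t in active
            if demand_gbps[t] <= remaining * shares[t] / total_shares
        }
        if not satisfied:
            # Real contention: split what is left strictly by shares.
            for t in active:
                allocation[t] = remaining * shares[t] / total_shares
            break
        for t in satisfied:
            allocation[t] = demand_gbps[t]
            remaining -= demand_gbps[t]
        active -= satisfied

    return allocation


# Hypothetical equal shares for NFS and vMotion on one 10Gb dvuplink.
shares = {"NFS": 50, "vMotion": 50}

# No contention: vMotion alone can take what it needs.
print(allocate_bandwidth(10.0, shares, {"NFS": 0.0, "vMotion": 8.0}))

# Contention: both want more than half, so each gets its share-based slice.
print(allocate_bandwidth(10.0, shares, {"NFS": 6.0, "vMotion": 8.0}))
```

The point of the sketch is that shares only matter when the uplink is oversubscribed; otherwise each traffic type simply gets what it asks for.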

Another important point to note which sometimes causes confusion: NFS traffic on the ESXi host caused by a VM's disk I/O does not count towards that VM's portgroup bandwidth allocation should NIOC kick in. These are two distinct and separate network traffic streams/types, the former being NFS and the latter being VM I/O.

Conclusion

There is no reason in my opinion not to use both SIOC and NIOC together. The technologies are complementary.


Get notifications of these blog postings and more VMware Storage information by following me on Twitter: @VMwareStorage

VDS Best Practices – Operational aspects (Part 6 of 6)

Operational best practices

After customers successfully design the virtual network infrastructure, the next challenge is how to deploy the design and how to keep the network operational. VMware provides various tools, APIs, and procedures to help customers deploy and manage their network infrastructure effectively. Following are some key tools that are available in the vSphere platform:

  • CLI
  • vCenter API
  • Virtual Network Monitoring and Troubleshooting
    • NetFlow
    • Port Mirroring

In the following section, we will briefly discuss how vSphere and network administrators can utilize these tools to manage their virtual network. For more details on these tools please refer to the vSphere documentation.

Command Line Interface

vSphere administrators have several ways to access vSphere components through vSphere interface options that include the vSphere Client, vSphere Web Client, and vSphere Command-Line Interface. The vSphere CLI command set allows you to perform configuration tasks using a vCLI package installed on supported platforms, or using vMA. Please refer to the Getting Started with vSphere CLI document for more details on the commands at the following link: http://www.vmware.com/support/developer/vcli. The entire networking configuration can be performed through the CLI, which helps administrators automate the deployment process.

 

vCenter API

The networking setup in the virtualized data center involves configuration of virtual and physical switches. To automate this configuration process, VMware has provided APIs that allow network switch vendors to get information about the virtual infrastructure. This information helps network switch vendors automate the configuration of the physical switches. For example, vCenter can trigger an event after the vMotion of a virtual machine is performed. After receiving this event trigger and related information, the network vendors can reconfigure the physical switch port policies such that when the VM moves to another host the VLAN and Access Control List (ACL) configurations are also migrated along with the VM. Multiple networking vendors have provided this automation between physical and virtual infrastructure configuration through integration with the vCenter APIs. Customers should check with their networking vendors to find out whether such automation tools exist that will bridge the gap between physical and virtual networking and simplify the operational challenges.

 

Virtual Network Monitoring and Troubleshooting

Monitoring and Troubleshooting network traffic in a virtual environment requires similar tools that are available in the physical switch environment. With the release of vSphere 5, VMware provides network administrators the ability to monitor and troubleshoot the virtual infrastructure through the features such as NetFlow and Port Mirroring.

NetFlow capability on a Distributed Switch along with a NetFlow collector tool helps monitor application flows and measures flow performance over time. It also helps in capacity planning and ensuring that I/O resources are utilized properly by different applications, based on their needs.

The port mirroring capability on a Distributed Switch is a valuable tool that helps network administrators in debugging network issues in a virtual infrastructure. The granular control over monitoring ingress, egress or all traffic of a port helps administrators fine-tune what traffic is sent for analysis.

Conclusion

vSphere Distributed Switch provides customers the right amount of features, capabilities and operational simplicity for deploying the virtual network infrastructure. As customers move on to build private or public cloud, VDS provides the scalability numbers for such deployments. The advanced capabilities such as NIOC and LBT are key for achieving better utilization of I/O resources and for providing better SLAs for the virtualized business critical applications and multi-tenant deployments. The support for standard networking visibility and monitoring features such as Port mirror and NetFlow help administrators manage and troubleshoot virtual infrastructure through familiar tools. VDS also is an extensible platform that allows integration with other networking vendor products through the open vCenter APIs.

This is the final entry in the series of VDS best practices blog posts. I would love to get your input on all the discussed VDS design options. As I mentioned earlier, customers are not limited to the discussed design options. Depending on the needs and available infrastructure, customers can either tweak these design options or come up with a new design for their deployments. Thanks for reading through these long posts.

 

VDS Best Practices – Blade Server Deployments (Part 5 of 6)

Blade Server in Example Deployment

Blade servers are server platforms that provide higher server consolidation per rack unit along with benefits of lower power and cooling costs. Blade chassis that hosts the blade servers have proprietary architectures and each vendor has its own way of managing resources in the blade chassis. It is difficult to talk about all different blade chassis available in the market and explain their deployments in this document. In this section, we will focus on some generic parameters that customers should consider while deploying VDS in a blade chassis environment.

From the networking point of view all blade chassis provide the following two options:

  • Integrated switches: In this option, the blade chassis allows built-in switches to control traffic flow between blade servers within the chassis and external network.
  • Pass – Through Technology: This is an alternative method of network connectivity that allows the individual blade servers to communicate directly with external network.

In this design, the integrated switch option is described where the blade chassis has a built-in Ethernet switch. This Ethernet switch acts as an Access layer switch as shown in Figure 1.

This section discusses the deployment where the ESXi host is running on a blade server. Two types of blade server configuration will be described in the following section:

  • Blade Server with Two 10 Gigabit Ethernet network adapters
  • Blade Server with hardware assisted multiple logical network adapters

For each of the above two configurations, the different VDS design approaches will be discussed.

 

Blade Server with Two 10 Gigabit Ethernet network adapters

This deployment is quite similar to the Rack Server with two 10 Gigabit Ethernet network adapters deployment where each ESXi host was presented with two 10 Gigabit network adapters. As shown in Figure 1, the ESXi host running on a blade server in the blade chassis is also presented with two 10 Gigabit Ethernet network adapters.


Figure 1 Blade Server with 2- 10 Gig NICs

In this section two design options are described; one is a traditional static approach and the other is a VMware recommended dynamic configuration with the NIOC and LBT features enabled. These two approaches are identical to the deployments described under the Rack Server with two 10 Gigabit network adapters section. Only blade chassis specific design decisions will be discussed as part of this section. For all other VDS and switch related configuration, readers are encouraged to refer to the Rack server with two 10 Gigabit network adapters section of this document.

Design Option 1 – Static Configuration

The configuration of this design approach is exactly the same as described in the Design Option 1 section under Rack server with two 10 Gigabit network adapters. Please refer to Table 1 below for the dvportgroup configuration details. Let’s take a look at the blade server specific parameters that need attention during the design.

Table 1 Static design configuration

| Traffic Type | Port Group | Teaming Option | Active Uplink | Standby Uplink | Unused Uplink |
|---|---|---|---|---|---|
| Management | PG-A | Explicit Failover | dvuplink1 | dvuplink2 | None |
| vMotion | PG-B | Explicit Failover | dvuplink2 | dvuplink1 | None |
| FT | PG-C | Explicit Failover | dvuplink2 | dvuplink1 | None |
| iSCSI | PG-D | Explicit Failover | dvuplink1 | dvuplink2 | None |
| Virtual Machine | PG-E | LBT | dvuplink1/dvuplink2 | None | None |

 

The network and hardware reliability considerations should be incorporated during the blade server design as well. In these blade server designs, customers have to focus on the following two areas:

  • High availability of blade switches in the blade chassis
  • Connectivity of the blade server network adapters to internal blade switches.

High availability of blade switches can be achieved by having two Ethernet switching modules in the blade chassis. And the connectivity of two network adapters on the blade server should be such that one network adapter is connected to first Ethernet switch module and other network adapter is hooked to the second switch module in the blade chassis.

Another aspect that needs attention in the blade server deployment is the network bandwidth availability across the midplane of the blade chassis and between the blade switches and the aggregation layer. If there is an oversubscription scenario in the deployment, customers have to think about utilizing the traffic shaping and prioritization (802.1p tagging) features available in the vSphere platform. The prioritization feature allows customers to tag the important traffic coming out of the vSphere platform. These high-priority tagged packets are then treated according to priority by the external switch infrastructure. During congestion scenarios, the switch will drop lower-priority packets first and avoid dropping the important high-priority packets.
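The effect of priority tagging during congestion can be illustrated with a deliberately simplified Python model. It assumes that a larger tag value means more important traffic and that the switch can forward only a fixed number of packets in a given interval. Real switches map priority tags to hardware queues and schedule them in vendor-specific ways, so the function below, its name and the sample tag values are assumptions for illustration only.

```python
from typing import List, Tuple

Packet = Tuple[str, int]  # (traffic type, priority tag)


def forward_under_congestion(
    packets: List[Packet], capacity: int
) -> Tuple[List[Packet], List[Packet]]:
    """Forward the highest-priority packets that fit and drop the rest.

    sorted() is stable, so packets with equal priority keep arrival order.
    """
    ranked = sorted(packets, key=lambda packet: packet[1], reverse=True)
    return ranked[:capacity], ranked[capacity:]


# Hypothetical tags: management and iSCSI are marked as more important.
arrivals = [("vm", 0), ("iscsi", 4), ("vmotion", 2), ("mgmt", 6)]
forwarded, dropped = forward_under_congestion(arrivals, capacity=2)
print("forwarded:", forwarded)
print("dropped:", dropped)
```

With room for only two packets, the two highest-tagged packets get through and the lower-priority ones are dropped first.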

This static design option gives customers the flexibility of choosing different network adapters for different traffic types. However, when allocating traffic across only two 10 Gigabit network adapters, administrators end up scheduling multiple traffic types on a single adapter. As multiple traffic types flow through one adapter, the chances of one traffic type dominating the others go up. To avoid the performance impact of noisy neighbors (dominating traffic types), customers have to utilize the traffic management tools provided in the vSphere platform. One of the traffic management features is NIOC, and that feature is utilized in design option 2 described below.

 

Design Option 2 – Dynamic Configuration with NIOC and LBT

This dynamic configuration approach is exactly the same as described in the Design Option 2 section under Rack server with two 10 Gigabit Ethernet network adapters. Please refer to Table 2 below for the dvportgroup configuration details and NIOC settings. The physical switch related configuration in the blade chassis deployment is the same as described in the rack server deployment. For the blade chassis specific recommendations on reliability and traffic management, please refer to the previous section.

Table 2 Dynamic design configuration

| Traffic Type | Port Group | Teaming Option | Active Uplink | Standby Uplink | NIOC Shares | NIOC Limits |
|---|---|---|---|---|---|---|
| Management | PG-A | LBT | dvuplink1, 2 | None | 5 | - |
| vMotion | PG-B | LBT | dvuplink1, 2 | None | 20 | - |
| FT | PG-C | LBT | dvuplink1, 2 | None | 10 | - |
| iSCSI | PG-D | LBT | dvuplink1, 2 | None | 20 | - |
| Virtual Machine | PG-E | LBT | dvuplink1, 2 | None | 20 | - |

 

VMware recommends this design option that utilizes the advanced VDS features and provides customer with a dynamic and flexible design approach. In this design, I/O resources are utilized effectively and Service Level Agreements are met based on the shares allocation.

Blade Server with Hardware assisted Logical network adapters (HP-Flex10 like deployment)

Some of the new blade chassis support traffic management capabilities that allow customers to carve up I/O resources. This is achieved by presenting logical network adapters to the ESXi hosts. Instead of two 10 Gigabit Ethernet network adapters, the ESXi host now sees multiple physical network adapters that operate at different configurable speeds. As shown in Figure 2, each ESXi host is presented with eight Ethernet network adapters that are carved out of two 10 Gigabit Ethernet network adapters.

Figure 2 Multiple Logical network adapters

This deployment is quite similar to the Rack server with eight 1 Gigabit Ethernet network adapters deployment. However, instead of fixed 1 Gigabit network adapters, the capacity of each network adapter is configured at the blade chassis level. In the blade chassis, customers can carve out network adapters of different capacities based on the needs of each traffic type. For example, if iSCSI traffic needs 2.5 Gigabit of bandwidth, a logical network adapter with that amount of I/O resources can be created on the blade chassis and presented to the blade server.
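When planning such a carve-up, it can help to sanity-check the numbers before touching the chassis. The following Python sketch is a planning aid only and does not talk to any blade chassis. It assumes that the logical adapters carved from one physical port cannot add up to more than that port's 10 Gigabit capacity. The adapter names, port names and bandwidth values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

LogicalNic = Tuple[str, str, float]  # (name, physical port, bandwidth in Gbps)


def unallocated_per_port(
    logical_nics: List[LogicalNic], port_capacity_gbps: float = 10.0
) -> Dict[str, float]:
    """Return the bandwidth left on each physical port, or raise
    ValueError if a port is oversubscribed by the plan."""
    used: Dict[str, float] = defaultdict(float)
    for name, port, gbps in logical_nics:
        if gbps <= 0:
            raise ValueError(f"{name}: bandwidth must be positive")
        used[port] += gbps
    for port, total in used.items():
        if total > port_capacity_gbps:
            raise ValueError(
                f"{port}: {total} Gbps requested, only {port_capacity_gbps} available"
            )
    return {port: port_capacity_gbps - total for port, total in used.items()}


# Hypothetical plan: eight logical adapters carved from two 10 Gigabit ports.
plan = [
    ("mgmt-a", "port1", 0.5), ("vmotion-a", "port1", 3.0),
    ("iscsi-a", "port1", 2.5), ("vm-a", "port1", 4.0),
    ("mgmt-b", "port2", 0.5), ("vmotion-b", "port2", 3.0),
    ("iscsi-b", "port2", 2.5), ("vm-b", "port2", 4.0),
]
print(unallocated_per_port(plan))
```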

As far as the configuration of the VDS and the blade chassis switch infrastructure goes, the configuration described in design option 1 under the Rack server with eight 1 Gigabit network adapters section is more relevant for this deployment. The static configuration option described in that design can be applied as is in this blade server environment. Please refer to Table 3 for dvportgroup configuration details and to the switch configurations described in that section for physical switch configuration details.

Table 3 Static Design configuration

| Traffic Type | Port Group | Teaming Option | Active Uplink | Standby Uplink | Unused Uplink |
|---|---|---|---|---|---|
| Management | PG-A | Explicit Failover | dvuplink1 | dvuplink2 | 3,4,5,6,7,8 |
| vMotion | PG-B | Explicit Failover | dvuplink3 | dvuplink4 | 1,2,5,6,7,8 |
| FT | PG-C | Explicit Failover | dvuplink4 | dvuplink3 | 1,2,5,6,7,8 |
| iSCSI | PG-D | Explicit Failover | dvuplink5 | dvuplink6 | 1,2,3,4,7,8 |
| Virtual Machine | PG-E | LBT | dvuplink7/dvuplink8 | None | 1,2,3,4,5,6 |

 

Now the question is whether NIOC capability adds any value in this specific blade server deployment. NIOC is a traffic management feature that helps in scenarios where multiple traffic types flow through one uplink or network adapter. If in this particular deployment only one traffic type is assigned to a specific Ethernet network adapter then the NIOC feature will not add any value. However, if multiple traffic types are scheduled over one network adapter then customers can make use of NIOC to assign appropriate shares to different traffic types. This NIOC configuration will make sure that the bandwidth resources are allocated to the traffic types and SLA is met.

To illustrate this through an example, let’s consider a scenario where vMotion and iSCSI traffic is carried over one 3 Gigabit logical uplink. To protect the iSCSI traffic from network-intensive vMotion traffic, administrators can configure NIOC and allocate shares to each traffic type. If both traffic types are equally important, administrators can configure shares with equal values (10 each). With this configuration, when there is a contention scenario, NIOC will make sure that the iSCSI process gets half of the 3 Gigabit uplink bandwidth (1.5 Gigabit) and is protected from the impact of the vMotion process.

VMware recommends that the Network and Server administrators work closely together while deploying the traffic management features of the VDS and Blade Chassis. A lot of co-ordination is required during the configuration of the traffic management features to achieve the best end-to-end QoS result.

This concludes the different design options for the Rack and Blade server deployments with different network adapter configurations. I would love to get your feedback on these design options and design guidelines. In the next blog entry I will talk about some operational aspects of VDS. Please stay tuned.

VDS Best Practices – Rack Server Deployment with Two 10 Gigabit adapters (Part 4 of 6)

Rack Server with Two 10 Gigabit Ethernet network adapters

The two 10 Gigabit Ethernet network adapters deployment model is becoming very common because of the benefits it provides through I/O consolidation. The key benefits include better utilization of I/O resources, simplified management, and reduced CAPEX and OPEX. While this deployment provides these benefits, there are some challenges when it comes to the traffic management aspects. Especially in highly consolidated virtualized environments, where more traffic types are carried over fewer 10 Gigabit Ethernet network adapters, it becomes critical to prioritize the traffic types that are important and provide the required SLA guarantees. The NIOC feature available on the VDS helps in this traffic management activity. In the following sections you will see how to utilize this feature in the different designs.

As shown in Figure 1, the rack servers with two 10 Gigabit Ethernet network adapters are connected to the two access layer switches to avoid any single point of failure. Similar to the Rack server with eight 1 Gigabit Ethernet network adapters section, the different VDS and Physical switch parameter configurations are taken into account during this design. On the physical switch side, the new 10 Gigabit switches might have support for FCoE that allows convergence for SAN and LAN traffic. This document only covers the standard 10 Gigabit deployments that support IP storage traffic (iSCSI/NFS) and not FCoE.

In this section two design options are described; one is a traditional approach and other one is a VMware recommended approach.

Figure 1 Rack server with 2 – 10 Gig NICs

Design Option 1 – Static Configuration

The static configuration approach for rack server deployment with 10 Gigabit Ethernet network adapters is similar to the one described in design option 1 of the rack server deployment with eight 1 Gigabit Ethernet adapters. There are a few differences in the configuration: the number of dvuplinks is changed from eight to two, and the dvportgroup parameters are different. Let’s take a look at the configuration details on the VDS front.

dvuplink configuration

To support the maximum two Ethernet network adapters per host, the dvuplink port group is configured with 2 dvuplinks (dvuplink1, dvuplink2). On the hosts the dvuplink1 is associated with vmnic0 and dvuplink2 is associated with vmnic1.

dvportgroups configuration

As described in Table 1, there are five different dvportgroups that are configured for the five different traffic types. For example, dvportgroup PG-A is created for the management traffic type. Following are the other key configurations of dvportgroup PG-A:

  • Teaming Option: Explicit Failover order provides a deterministic way of directing traffic to a particular uplink. By selecting dvuplink1 as an Active uplink and dvuplink2 as standby uplink the management traffic will be carried over dvuplink1 unless there is a failure of dvuplink1. It is also recommended to configure the failback option to “No” to avoid the flapping of traffic between two NICs. The failback option determines how a physical adapter is returned to active duty after recovering from a failure. If failback is set to No, a failed adapter is left inactive even after recovery until another currently active adapter fails, requiring its replacement.
  • VMware recommends isolating all traffic types from each other by defining separate VLAN for each dvportgroup.
  • There are various other parameters that are part of the dvportgroup configuration. Customers can choose to configure these parameters based on their environment needs.
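The explicit failover and failback behaviour described in the Teaming Option bullet can be sketched as a small Python function. This is an illustrative model rather than the VDS teaming code; the function name and parameters are hypothetical.

```python
from typing import Dict, List, Optional


def select_active_uplink(
    current: Optional[str],
    failover_order: List[str],
    link_up: Dict[str, bool],
    failback: bool,
) -> Optional[str]:
    """Pick the uplink that should carry the traffic.

    failover_order lists the active uplink first, then the standby uplinks.
    With failback disabled, traffic stays on the uplink it failed over to
    until that uplink itself fails.
    """
    healthy = [uplink for uplink in failover_order if link_up.get(uplink, False)]
    if not healthy:
        return None  # no usable uplink at all
    if failback or current is None or not link_up.get(current, False):
        return healthy[0]
    return current


order = ["dvuplink1", "dvuplink2"]  # PG-A: active first, standby second

# Normal operation, then dvuplink1 fails, then dvuplink1 recovers.
uplink = select_active_uplink(None, order, {"dvuplink1": True, "dvuplink2": True}, failback=False)
uplink = select_active_uplink(uplink, order, {"dvuplink1": False, "dvuplink2": True}, failback=False)
uplink = select_active_uplink(uplink, order, {"dvuplink1": True, "dvuplink2": True}, failback=False)
print(uplink)  # management traffic stays on the standby uplink
```

Setting failback to True in the last call would move the traffic back to dvuplink1 as soon as it recovers, which is the flapping risk the recommendation is meant to avoid.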

Table 1 below provides the configuration details for all the dvportgroups. According to the configuration, dvuplink1 carries Management, iSCSI, and Virtual Machine traffic while dvuplink2 handles the vMotion, FT, and Virtual Machine traffic. As you can see, Virtual machine traffic type makes use of two uplinks, and these uplinks are utilized through the load based teaming (LBT) algorithm.

In this deterministic teaming policy, customers can decide to map different traffic types to the available uplink ports depending on the environment needs. For example, if iSCSI traffic needs higher bandwidth and other traffic types have relatively low bandwidth requirements, then customers can decide to keep only iSCSI traffic on dvuplink1 and move all other traffic to dvuplink2. When deciding on these traffic paths, customers should understand the physical network connectivity and the bandwidth capacity of each path.

Physical switch configuration

The external physical switch, to which the rack servers’ network adapters are connected, is configured as a trunk with all the appropriate VLANs enabled. As described in the physical network switch parameters section, the following switch configurations are performed based on the VDS setup described in Table 1.

  • Enable STP on the trunk ports facing ESXi hosts, along with PortFast mode and BPDU guard.
  • The teaming configuration on VDS is static and thus no link aggregation is configured on the physical switches.
  • Because of the mesh topology deployment as shown in Figure 1, the link state-tracking feature is not required on the physical switches.

Table 1 Static design configuration

| Traffic Type | Port Group | Teaming Option | Active Uplink | Standby Uplink | Unused Uplink |
|---|---|---|---|---|---|
| Management | PG-A | Explicit Failover | dvuplink1 | dvuplink2 | None |
| vMotion | PG-B | Explicit Failover | dvuplink2 | dvuplink1 | None |
| FT | PG-C | Explicit Failover | dvuplink2 | dvuplink1 | None |
| iSCSI | PG-D | Explicit Failover | dvuplink1 | dvuplink2 | None |
| Virtual Machine | PG-E | LBT | dvuplink1/dvuplink2 | None | None |

 

This static design option provides flexibility in the traffic path configuration, but it cannot protect against one traffic type dominating others. For example, there is a possibility that a network-intensive vMotion process can take away most of the network bandwidth and impact virtual machine traffic. Bi-directional traffic shaping parameters at the portgroup and port level can provide some help in managing different traffic rates. However, using this approach for traffic management requires customers to limit the traffic on the respective dvportgroups. Limiting traffic through this method puts a hard limit on the traffic types even when bandwidth is available to utilize. This underutilization of I/O resources because of hard limits is overcome through the NIOC feature, which provides flexible traffic management based on the shares parameter. Design option 2 described below is based on the NIOC feature.

 

Design Option 2 – Dynamic Configuration with NIOC and LBT

This dynamic design option is the VMware recommended approach that takes advantage of the NIOC and LBT features of the VDS.

The connectivity to physical network infrastructure remains same as described in the design option 1. However, instead of allocating specific dvuplinks to individual traffic types, the ESXi platform utilizes those dvuplinks dynamically. To illustrate this dynamic design, each virtual infrastructure traffic type’s bandwidth utilization is estimated. In a real deployment, customers should first monitor the virtual infrastructure traffic over a period of time to gauge the bandwidth utilization, and then come up with bandwidth numbers.

Following are some bandwidth numbers estimated per traffic type:

  • Management Traffic (< 1 Gig)
  • vMotion (2 Gig)
  • FT (1 Gig)
  • iSCSI (2 Gig)
  • Virtual Machine (2 Gig)

These bandwidth estimates are different from the one considered with rack server deployment with eight 1 Gig network adapters. Let’s take a look at the VDS parameter configurations for this design. The dvuplink portgroup configuration remains same with two dvuplinks created for the two 10 Gigabit Ethernet network adapters. The dvportgroup configuration is as follows.

dvportgroups configuration

In this design all dvuplinks are active and there are no standby and unused uplinks as shown in Table 2.  All dvuplinks are thus available for use by the teaming algorithm. Following are the key configurations of dvportgroup PG-A:

  • Teaming Option: Load based teaming is selected as the teaming algorithm. With the LBT configuration, the Management traffic will initially be scheduled based on the virtual port ID hash. Based on the hash output, the management traffic will be sent out over one of the dvuplinks. Other traffic types in the virtual infrastructure can also be scheduled on the same dvuplink with the LBT configuration. Subsequently, if the utilization of the uplink goes beyond the 75% threshold, the LBT algorithm will be invoked and some of the traffic will be moved to other underutilized dvuplinks. It is possible that Management traffic will get moved to other dvuplinks when such an event occurs.
  • There are no standby dvuplinks in this configuration so the failback setting is not applicable for this design approach. The default setting for this failback option is “Yes”.
  • VMware recommends isolating all traffic types from each other by defining separate VLAN for each dvportgroup.
  • There are several other parameters that are part of the dvportgroup configuration. Customers can choose to configure these parameters based on their environment needs.
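The LBT behaviour described in the Teaming Option bullet, where traffic moves once an uplink crosses the 75% threshold, can be sketched roughly in Python. This is only a simplified model. The rule used here for picking which port to move (the smallest one that fits on the least utilized uplink) is an assumption made for the example, as are the function name, the port names and the load figures.

```python
from typing import Dict, List


def rebalance(
    assignment: Dict[str, str],
    load_gbps: Dict[str, float],
    uplinks: List[str],
    capacity_gbps: float = 10.0,
    threshold: float = 0.75,
) -> Dict[str, str]:
    """Move ports off any uplink whose utilization exceeds the threshold."""
    new_assignment = dict(assignment)  # leave the caller's mapping untouched
    limit = threshold * capacity_gbps

    def used(uplink: str) -> float:
        return sum(load_gbps[p] for p, u in new_assignment.items() if u == uplink)

    for uplink in uplinks:
        while used(uplink) > limit:
            target = min(uplinks, key=used)  # least utilized uplink
            if target == uplink:
                break  # nowhere better to go
            candidates = sorted(
                (p for p, u in new_assignment.items() if u == uplink),
                key=lambda p: load_gbps[p],
            )
            movable = [p for p in candidates if used(target) + load_gbps[p] <= limit]
            if not movable:
                break  # moving anything would just overload the target
            new_assignment[movable[0]] = target
    return new_assignment


# Hypothetical starting point: three traffic types hashed onto dvuplink1.
start = {"mgmt": "dvuplink1", "vmotion": "dvuplink1", "iscsi": "dvuplink1", "vm": "dvuplink2"}
loads = {"mgmt": 0.5, "vmotion": 6.0, "iscsi": 2.0, "vm": 1.0}
print(rebalance(start, loads, ["dvuplink1", "dvuplink2"]))
```

In this run dvuplink1 starts at 8.5 Gbps, above the 7.5 Gbps threshold, so the sketch moves the management and iSCSI ports to dvuplink2.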

As you follow the dvportgroups configuration in Table 2, you can see that each traffic type has all the dvuplinks as active and these uplinks are utilized through the load based teaming (LBT) algorithm. Let’s take a look at the NIOC configuration.

The Network I/O Control (NIOC) configuration in this design not only helps provide the appropriate I/O resources to the different traffic types but also provides SLA guarantees by protecting from one traffic type dominating others.

Based on the bandwidth assumptions made for different traffic types, the shares parameters are configured in the NIOC Shares column in Table 2. To illustrate how share values translate to bandwidth numbers in this deployment, let’s take an example of a 10 Gigabit capacity dvuplink carrying all five traffic types. This is a worst-case scenario where all traffic types are mapped to one dvuplink. This will never happen when customers enable the LBT feature, because LBT will move traffic types based on the uplink utilization. This example shows how much bandwidth each traffic type will be allowed on one dvuplink during a contention or oversubscription scenario when LBT is not enabled:

  • Management: 5 shares; (5/75) * 10 Gigabit = 667 Mbps
  • vMotion: 20 shares; (20/75) * 10 Gigabit = 2.67 Gbps
  • FT: 10 shares; (10/75) * 10 Gigabit = 1.33 Gbps
  • iSCSI: 20 shares; (20/75) * 10 Gigabit = 2.67 Gbps
  • Virtual Machine: 20 shares; (20/75) * 10 Gigabit = 2.67 Gbps
  • Total shares: 5 + 20 + 10 + 20 + 20 = 75

As you can see, for each traffic type the percentage of bandwidth is first calculated by dividing the share value by the total number of shares (75), and that percentage is then multiplied by the total bandwidth of the dvuplink (10 Gigabit) to obtain the bandwidth for the traffic type. For example, 20 shares allocated to vMotion traffic translate to 2.67 Gbps of bandwidth for the vMotion process on a fully utilized 10 Gigabit network adapter.
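The two-step calculation above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not anything the vSphere platform exposes: the function name `share_bandwidth_mbps` is made up for this example, and 10 Gigabit is assumed to mean 10,000 Mbps.

```python
def share_bandwidth_mbps(shares, link_mbps):
    """Split link capacity in proportion to each traffic type's shares."""
    total = sum(shares.values())
    return {name: value / total * link_mbps for name, value in shares.items()}


# Share values from Table 2; 10 Gigabit is assumed to be 10,000 Mbps.
shares_10g = {
    "Management": 5,
    "vMotion": 20,
    "FT": 10,
    "iSCSI": 20,
    "Virtual Machine": 20,
}
allocation_10g = share_bandwidth_mbps(shares_10g, 10_000)

for name, mbps in allocation_10g.items():
    print(f"{name}: {mbps:.0f} Mbps")
```

Changing a single share value changes every result, because the total in the denominator changes with it.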

In this 10 Gigabit Ethernet deployment, customers can provide bigger pipes to individual traffic types without the use of trunking or multipathing technologies. This was not the case with the eight 1 Gigabit Ethernet deployment.

There is no change in physical switch configuration in this design approach. So please refer to the physical switch settings described in design option 1 in previous section.

Table 2 Dynamic design configuration

| Traffic Type | Port Group | Teaming Option | Active Uplink | Standby Uplink | NIOC Shares | NIOC Limits |
| --- | --- | --- | --- | --- | --- | --- |
| Management | PG-A | LBT | dvuplink1, 2 | None | 5 | - |
| vMotion | PG-B | LBT | dvuplink1, 2 | None | 20 | - |
| FT | PG-C | LBT | dvuplink1, 2 | None | 10 | - |
| iSCSI | PG-D | LBT | dvuplink1, 2 | None | 20 | - |
| Virtual Machine | PG-E | LBT | dvuplink1, 2 | None | 20 | - |

 

This design option utilizes the advanced VDS features and provides customers with a dynamic and flexible design approach. In this design, I/O resources are utilized effectively and Service Level Agreements are met based on the shares allocation.

In the next blog entry I will talk about the Blade center deployments.

 

VDS Best Practices – Rack Server Deployment with Eight 1 Gigabit adapters (Part 3 of 6)

Rack Server in Example Deployment

After looking at the major components in the example deployment and key virtual and physical switch parameters, let’s take a look at the different types of servers that customers can have in their environment. Customers deploy ESXi hosts either on a Rack Server or a Blade Server. This section discusses the deployment where the ESXi host is running on a Rack Server. Two types of Rack Server configuration are described in the following section:

  • Rack Server with Eight 1 Gigabit Ethernet network adapters
  • Rack Server with Two 10 Gigabit Ethernet network adapters

For each of the above two configurations, the different VDS design approaches will be discussed.

Rack Server with Eight 1 Gigabit Ethernet network adapters

In a Rack Server deployment with eight 1 Gigabit Ethernet network adapters per host, customers can either use the traditional static design approach of allocating network adapters to each traffic type or make use of advanced features of VDS such as Network I/O Control (NIOC) and Load Based Teaming (LBT). The NIOC and LBT features help provide a dynamic design that utilizes I/O resources efficiently. In this section both the traditional and new design approaches are described along with their pros and cons.

 

Design Option 1 – Static configuration

This design option follows the traditional approach of statically allocating network resources to the different virtual infrastructure traffic types. As shown in Figure 1, each host has eight Ethernet network adapters. Four of those network adapters are connected to the first Access layer switch while the other four are connected to the second Access layer switch, to avoid a single point of failure. Let’s take a detailed look at how the VDS parameters are configured.

Figure 1 Rack Server with eight 1 Gigabit Ethernet network adapters

dvuplink configuration

To support the maximum of eight 1 Gigabit Ethernet network adapters per host, the dvuplink port group is configured with eight dvuplinks (dvuplink1 through dvuplink8). On the hosts, dvuplink1 is associated with vmnic0, dvuplink2 with vmnic1, and so on. It is a recommended practice to change the names of the dvuplinks to something meaningful and easy to track. For example, a dvuplink1 that gets associated with a vmnic on the motherboard can be renamed as “LOM-uplink1”.

If the hosts have some Ethernet network adapters as LAN On Motherboard (LOM) and some on expansion cards, then for better resiliency VMware recommends selecting one network adapter from the LOM and one from an expansion card while configuring NIC teaming. To configure this teaming on a VDS, administrators have to pay attention to the dvuplink and vmnic association along with the dvportgroup configuration where NIC teaming is enabled. In the NIC teaming configuration on a dvportgroup, administrators have to choose the different dvuplinks that are part of a team. If the dvuplinks are named appropriately according to the host vmnic association, administrators can select “LOM-uplink1” and “Expansion-uplink1” while configuring the teaming option for a dvportgroup.

 dvportgroups configuration

As described in Table 1, there are five different portgroups that are configured for the five different traffic types. Customers can create up to 5000 unique portgroups per VDS. In this example deployment, the decision to create different portgroups is based on the number of traffic types.

According to Table 1, dvportgroup PG-A is created for the management traffic type. There are other dvportgroups defined for the other traffic types. Following are the key configurations of dvportgroup PG-A:

  • Teaming Option: Explicit Failover order provides a deterministic way of directing traffic to a particular uplink. By selecting dvuplink1 as an Active uplink and dvuplink2 as standby uplink, the management traffic will be carried over dvuplink1 unless there is a failure on dvuplink1. Note that all other dvuplinks are configured as unused. It is also recommended to configure the failback option to “No” to avoid the flapping of traffic between two network adapters. The failback option determines how a physical adapter is returned to active duty after recovering from a failure. If failback is set to No, a failed adapter is left inactive even after recovery until another currently active adapter fails, requiring its replacement.
  • VMware recommends isolating all traffic types from each other by defining separate VLAN for each dvportgroup.
  • There are several other parameters that are part of the dvportgroup configuration. Customers can choose to configure these parameters based on their environment needs. For example, Customers can configure PVLAN to provide isolation when there are limited VLANs available in the environment.
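To make the failback behavior described for PG-A concrete, here is a small toy model in Python. It is only a sketch of the behavior as described in this post, not VMware code. The class and method names are invented for illustration, and it models exactly one active and one standby dvuplink.

```python
class ExplicitFailoverTeam:
    """Toy model of Explicit Failover order with one active and one standby uplink."""

    def __init__(self, active, standby, failback=False):
        self.preferred = active           # uplink configured as Active
        self.order = [active, standby]    # failover order
        self.failback = failback          # True models "Yes", False models "No"
        self.up = {active: True, standby: True}
        self.current = active             # uplink carrying traffic right now

    def link_down(self, uplink):
        self.up[uplink] = False
        if uplink == self.current:
            # Move to the first healthy uplink in failover order, if any.
            healthy = [u for u in self.order if self.up[u]]
            self.current = healthy[0] if healthy else None

    def link_up(self, uplink):
        self.up[uplink] = True
        if self.current is None:
            self.current = uplink
        elif self.failback and uplink == self.preferred:
            # With failback enabled, traffic returns to the recovered active uplink.
            self.current = uplink


team = ExplicitFailoverTeam("dvuplink1", "dvuplink2", failback=False)
team.link_down("dvuplink1")
after_failure = team.current      # traffic now on the standby uplink
team.link_up("dvuplink1")
after_recovery = team.current     # failback "No": traffic stays where it is
team.link_down("dvuplink2")
after_second_failure = team.current
```

With `failback=False` the recovered dvuplink1 only carries traffic again once dvuplink2 fails, which is what avoids the flapping mentioned above.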

As you follow the dvportgroups configuration in Table 1, you can see that each traffic type is carried over a specific dvuplink, except the virtual machine traffic type, which has two active uplinks, dvuplink7 and dvuplink8. These two links are utilized through the load based teaming (LBT) algorithm. As mentioned earlier, the LBT algorithm is much more efficient in utilizing link bandwidth than the standard hashing algorithm.

Table 1 Static Design configuration

| Traffic Type | Port Group | Teaming Option | Active Uplink | Standby Uplink | Unused Uplink |
| --- | --- | --- | --- | --- | --- |
| Management | PG-A | Explicit Failover | dvuplink1 | dvuplink2 | 3,4,5,6,7,8 |
| vMotion | PG-B | Explicit Failover | dvuplink3 | dvuplink4 | 1,2,5,6,7,8 |
| FT | PG-C | Explicit Failover | dvuplink4 | dvuplink3 | 1,2,5,6,7,8 |
| iSCSI | PG-D | Explicit Failover | dvuplink5 | dvuplink6 | 1,2,3,4,7,8 |
| Virtual Machine | PG-E | LBT | dvuplink7/dvuplink8 | None | 1,2,3,4,5,6 |

 

Physical switch configuration

The external physical switch to which the rack servers’ network adapters are connected is configured as a trunk with all the appropriate VLANs enabled. As described in the physical network switch parameters section, the following switch configurations are performed based on the VDS setup described in Table 1.

  • Enable STP on the trunk ports facing ESXi hosts, along with “portfast” mode and BPDU guard.
  • The teaming configuration on VDS is static and thus no link aggregation is configured on the physical switches.
  • Because of the mesh topology deployment as shown in Figure 1, the link state-tracking feature is not required on the physical switches.

In this design approach, resiliency for the infrastructure traffic is achieved through Active-Standby uplinks, and security is accomplished by providing separate physical paths for the different traffic types. However, with this design the I/O resources are underutilized because standby links such as dvuplink2 and dvuplink6 are not used to send or receive traffic. Also, there is no flexibility to allocate more bandwidth to a traffic type when it needs it.

There is another variation of the static design approach that addresses some customers’ need to provide higher bandwidth to the storage and vMotion traffic types. In the static design described earlier, iSCSI and vMotion traffic is limited to 1 Gig. If customers want to support higher bandwidth for iSCSI, they can make use of the iSCSI multipathing solution. Also, with the release of vSphere 5, vMotion traffic can be carried over multiple Ethernet network adapters through the support of multi-NIC vMotion, thus providing higher bandwidth to the vMotion process.

For more details on how to set up iSCSI multipathing, please refer to the vSphere Storage guide at the following website: https://www.vmware.com/support/pubs/vsphere-esxi-vcenter-server-pubs.html. The configuration of multi-NIC vMotion is quite similar to the iSCSI multipath setup, where administrators have to create two separate vmkernel interfaces and bind each one to a separate dvportgroup. The two separate dvportgroups provide the connectivity to two different Ethernet network adapters or dvuplinks.

Table 2 Static design configuration with iSCSI multipathing and multi-NIC vMotion

| Traffic Type | Port Group | Teaming Option | Active Uplink | Standby Uplink | Unused Uplink |
| --- | --- | --- | --- | --- | --- |
| Management | PG-A | Explicit Failover | dvuplink1 | dvuplink2 | 3,4,5,6,7,8 |
| vMotion | PG-B1 | None | dvuplink3 | dvuplink4 | 1,2,5,6,7,8 |
| vMotion | PG-B2 | None | dvuplink4 | dvuplink3 | 1,2,5,6,7,8 |
| FT | PG-C | Explicit Failover | dvuplink2 | dvuplink1 | 1,2,5,6,7,8 |
| iSCSI | PG-D1 | None | dvuplink5 | None | 1,2,3,4,6,7,8 |
| iSCSI | PG-D2 | None | dvuplink6 | None | 1,2,3,4,5,7,8 |
| Virtual Machine | PG-E | LBT | dvuplink7/dvuplink8 | None | 1,2,3,4,5,6 |

 

As shown in Table 2, there are two entries each for the vMotion and iSCSI traffic types, listing the additional dvportgroup configuration required to support the multi-NIC vMotion and iSCSI multipathing processes. For multi-NIC vMotion, the dvportgroups PG-B1 and PG-B2 are configured with dvuplink3 and dvuplink4 as active links respectively. For iSCSI multipathing, the dvportgroups PG-D1 and PG-D2 are connected to dvuplink5 and dvuplink6 as active links respectively. The load balancing across the multiple dvuplinks is performed by the multipathing logic in the iSCSI process and by the ESXi platform in the vMotion process. It is not required to configure the teaming policies for these dvportgroups.

The dvportgroup configuration for the FT, Management, and Virtual Machine traffic types and the physical switch configuration for this design remain the same as described in design option 1 in the previous section.

This static design approach improves on the first design by using the advanced capabilities such as iSCSI multipathing and multi-NIC vMotion. But at the same time this option has the same challenges related to underutilized resources and inflexibility in allocating additional resources on the fly to different traffic types.

Design Option 2 – Dynamic configuration with NIOC and LBT

After looking at the traditional design approach with static uplink configurations, let’s take a look at the VMware recommended design option that takes advantage of the advanced VDS features such as NIOC and LBT.

In this design the connectivity to the physical network infrastructure remains the same as described in the static design option, but instead of allocating specific dvuplinks to individual traffic types, the ESXi platform utilizes those dvuplinks dynamically. To illustrate this dynamic design, each virtual infrastructure traffic type’s bandwidth utilization is estimated. In a real deployment, customers should first monitor the virtual infrastructure traffic over a period of time to gauge the bandwidth utilization, and then come up with bandwidth numbers for each traffic type.

Following are some bandwidth numbers estimated per traffic type:

  • Management Traffic (< 1 Gig)
  • vMotion (1 Gig)
  • FT (1 Gig)
  • iSCSI (1 Gig)
  • Virtual Machine (2 Gig)

Based on this bandwidth information, administrators can provision appropriate I/O resources to each traffic type using the NIOC feature of VDS. Let’s take a look at the VDS parameter configurations for this design as well as the NIOC setup. The dvuplink portgroup configuration remains the same, where eight dvuplinks are created for the eight 1 Gigabit Ethernet network adapters. The dvportgroup configuration is as follows.

dvportgroups configuration

In this design all dvuplinks are active and there are no standby and unused uplinks as shown in Table 3.  All dvuplinks are thus available for use by the teaming algorithm. Following are the key parameter configurations of dvportgroup PG-A:

  • Teaming Option: Load based teaming is selected as the teaming algorithm. With the LBT configuration, the Management traffic is initially scheduled based on the virtual port ID hash. Depending on the hash output, the management traffic is sent out over one of the dvuplinks. Other traffic types in the virtual infrastructure can also be scheduled on the same dvuplink initially. However, when the utilization of the dvuplink goes beyond the 75% threshold, the LBT algorithm is invoked and some of the traffic is moved to other underutilized dvuplinks. It is possible that Management traffic will be moved to other dvuplinks when such an LBT event occurs.
  • The Failback option means going from using the standby link back to using the active uplink after the active uplink comes back into operation after a failure. This Failback option works when there are Active and Standby dvuplink configurations. In this design there are no Standby dvuplinks. So, when an active uplink fails, the traffic flowing on that dvuplink is moved to another working dvuplink. If the failed dvuplink comes back, the LBT algorithm will schedule new traffic on that dvuplink. This option is left at its default.
  • VMware recommends isolating all traffic types from each other by defining a separate VLAN for each dvportgroup.
  • There are several other parameters that are part of the dvportgroup configuration. Customers can choose to configure these parameters based on their environment needs. For example, Customers can configure PVLAN to provide isolation when there are limited VLANs available in the environment.
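The LBT behavior described in the first bullet above can be sketched roughly as follows. This is a simplified model, not the actual ESXi algorithm. The modulo operation stands in for the virtual port ID hash, the choice to move the lightest port to the least utilized uplink is an assumption made for illustration, and the port IDs and load figures are hypothetical.

```python
THRESHOLD = 0.75  # utilization threshold quoted in the text


def initial_uplink(port_id, uplinks):
    """Initial placement: a stand-in for the virtual port ID hash."""
    return uplinks[port_id % len(uplinks)]


def rebalance(placement, load_mbps, uplinks, capacity_mbps):
    """Move one port off each uplink whose utilization exceeds the threshold."""
    def used(uplink):
        return sum(load_mbps[p] for p, u in placement.items() if u == uplink)

    for uplink in uplinks:
        if used(uplink) / capacity_mbps <= THRESHOLD:
            continue
        target = min(uplinks, key=used)   # least utilized uplink
        if target == uplink:
            continue
        # Assumption: move the lightest port on the saturated uplink.
        port = min((p for p, u in placement.items() if u == uplink),
                   key=lambda p: load_mbps[p])
        placement[port] = target
    return placement


uplinks = ["dvuplink1", "dvuplink2"]
# Hypothetical ports: 0 = Management, 2 = vMotion, 1 = a virtual machine.
load_mbps = {0: 100, 2: 700, 1: 100}
placement = {p: initial_uplink(p, uplinks) for p in load_mbps}
placement = rebalance(placement, load_mbps, uplinks, capacity_mbps=1000)
```

In this run Management and vMotion both start on dvuplink1 (80% utilized), so the sketch moves Management to dvuplink2, mirroring the remark that Management traffic may be moved when an LBT event occurs.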

As you follow the dvportgroups configuration in Table 3, you can see that each traffic type has all dvuplinks active and these links are utilized through the load based teaming (LBT) algorithm. Let’s now look at the NIOC configuration described in the last two columns of Table 3.

The Network I/O Control (NIOC) configuration in this design helps provide the appropriate I/O resources to the different traffic types. Based on the previously estimated bandwidth numbers per traffic type, the shares parameter is configured in the NIOC Shares column in Table 3. The shares values specify the relative importance of a specific traffic type, and NIOC ensures that during contention scenarios on the dvuplinks each traffic type gets its allocated bandwidth. For example, a shares configuration of 10 for vMotion, iSCSI, and FT allocates equal bandwidth to these traffic types, while the Virtual Machines get the highest bandwidth with 20 shares and Management gets lower bandwidth with 5 shares.

To illustrate how share values translate to bandwidth numbers, let’s take an example of 1 Gigabit capacity dvuplink carrying all five traffic types. This is a worst-case scenario where all traffic types are mapped to one dvuplink. This will never happen when customers enable the LBT feature, because LBT will balance the traffic based on the utilization of uplinks. This example shows how much bandwidth each traffic type will be allowed on one dvuplink during a contention or oversubscription scenario and when LBT is not enabled.

  • Management: 5 shares;        (5/55) * 1 Gigabit = 90.91 Mbps
  • vMotion: 10 shares;               (10/55) * 1 Gigabit = 181.82 Mbps
  • FT: 10 shares;                          (10/55) * 1 Gigabit = 181.82 Mbps
  • iSCSI: 10 shares;                      (10/55) * 1 Gigabit = 181.82 Mbps
  • Virtual Machine: 20 shares; (20/55) * 1 Gigabit = 363.64 Mbps
  • Total shares: 5 + 10 + 10 + 10 + 20 = 55

To calculate the bandwidth numbers during contention, first calculate the percentage of bandwidth for a traffic type by dividing its share value by the total number of shares (55). In the second step, multiply the total bandwidth of the dvuplink (1 Gigabit) by the percentage calculated in the first step. For example, 5 shares allocated to management traffic translate to 90.91 Mbps of bandwidth for the management process on a fully utilized 1 Gigabit network adapter. This example uses a custom share configuration, but customers can also make use of the predefined High (100), Normal (50), and Low (25) shares while assigning them to different traffic types.

The vSphere platform takes these configured share values and applies them per uplink. The schedulers running at each uplink are responsible for making sure that the bandwidth resources are allocated according to the shares. In the case of the eight 1 Gigabit Ethernet network adapter deployment, there are eight schedulers running. Depending on the number of traffic types scheduled on a particular uplink, the scheduler divides the bandwidth among the traffic types based on the share numbers. For example, if only FT (10 shares) and Management (5 shares) traffic are flowing through dvuplink5, then based on the shares values, FT traffic will get double the bandwidth of management traffic. Also, when there is no management traffic flowing, all bandwidth can be utilized by the FT process. This flexibility in allocating I/O resources is the key benefit of the NIOC feature.
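The per-uplink behavior can be illustrated with a short Python sketch in which only the traffic types that are actually sending on an uplink take part in the division. The function name `uplink_allocation` is hypothetical, and 1 Gigabit is assumed to be 1,000 Mbps.

```python
# Share values from Table 3.
SHARES = {"Management": 5, "vMotion": 10, "FT": 10, "iSCSI": 10, "Virtual Machine": 20}


def uplink_allocation(active_types, link_mbps=1000):
    """Divide one uplink among only the traffic types currently sending on it."""
    total = sum(SHARES[t] for t in active_types)
    return {t: SHARES[t] / total * link_mbps for t in active_types}


# dvuplink5 carrying only FT and Management traffic.
both = uplink_allocation(["FT", "Management"])
# No management traffic flowing: FT may use the whole uplink.
ft_only = uplink_allocation(["FT"])

print(both)
print(ft_only)
```

Because the denominator only counts the traffic types present on that uplink, the same share values give different bandwidth numbers on different uplinks.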

The NIOC limits parameter of Table 3 is not configured in this design. The Limit value specifies an absolute maximum limit on egress traffic for a traffic type. Limits are specified in Mbps. This configuration provides a hard limit on any traffic even if I/O resources are available to use. It is not recommended to use limit configuration unless you really want to control the traffic even though additional resources are available.
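As a rough illustration of how a limit differs from shares, the sketch below caps a traffic type’s egress rate even when the uplink has spare capacity. The function and its parameters are hypothetical and only model the behavior described above.

```python
def egress_rate_mbps(offered_mbps, available_mbps, limit_mbps=None):
    """Rate a traffic type can send at, given free capacity and an optional limit."""
    rate = min(offered_mbps, available_mbps)
    if limit_mbps is not None:
        # The limit applies even if more bandwidth is available.
        rate = min(rate, limit_mbps)
    return rate


unlimited = egress_rate_mbps(900, 1000)                # no limit configured
capped = egress_rate_mbps(900, 1000, limit_mbps=500)   # limit of 500 Mbps

print(unlimited, capped)
```

In the second call the uplink could carry the full 900 Mbps, yet the traffic type is held at 500 Mbps, which is why limits are best left unset unless that is really the intent.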

There is no change in physical switch configuration in this design approach even with the choice of the new LBT algorithm. The LBT teaming algorithm doesn’t require any special configuration on physical switches. Please refer to the physical switch settings described in design option 1.

Table 3 Dynamic design configuration with NIOC and LBT

| Traffic Type | Port Group | Teaming Option | Active Uplink | Standby Uplink | NIOC Shares | NIOC Limits |
| --- | --- | --- | --- | --- | --- | --- |
| Management | PG-A | LBT | 1,2,3,4,5,6,7,8 | None | 5 | - |
| vMotion | PG-B | LBT | 1,2,3,4,5,6,7,8 | None | 10 | - |
| FT | PG-C | LBT | 1,2,3,4,5,6,7,8 | None | 10 | - |
| iSCSI | PG-D | LBT | 1,2,3,4,5,6,7,8 | None | 10 | - |
| Virtual Machine | PG-E | LBT | 1,2,3,4,5,6,7,8 | None | 20 | - |

 

One thing to note about this design is that it doesn’t provide higher than 1 Gigabit bandwidth to the vMotion and iSCSI traffic types, as is the case in the static design that uses multi-NIC vMotion and iSCSI multipathing. The Load based teaming algorithm cannot split the infrastructure traffic across multiple dvuplink ports and utilize all the links. So, even if the vMotion dvportgroup PG-B has all eight 1 Gigabit Ethernet network adapters as active uplinks, vMotion traffic will be carried over only one of the eight uplinks. The main advantage of this design is in the scenarios where the vMotion process is not using the uplink bandwidth and other traffic types are in need of the additional resources. In these situations NIOC makes sure that the unused bandwidth is allocated to the other traffic types that need it.

This dynamic design option is the recommended approach because it takes advantage of the advanced VDS features and utilizes I/O resources efficiently. This option also provides Active-Active resiliency where no uplinks are in standby mode. In this design approach, customers allow the vSphere platform to make the optimal decisions on scheduling traffic across multiple uplinks.

Some customers who have restrictions in the physical infrastructure in terms of bandwidth capacity across different paths and limited availability of the layer 2 domain might not be able to take advantage of this dynamic design option. While deploying this dynamic option, it is important to consider all different traffic paths that a traffic type can take and make sure that the physical switch infrastructure can support the specific characteristics required for each traffic type. VMware recommends that vSphere and Network administrators should work together to understand the impact of this dynamic traffic scheduling over physical network infrastructure before deploying this approach.

Every customer environment is different, and the requirements for the traffic types are also different. Depending on the needs of the environment, customers can modify these design options to fit their specific requirements. For example, customers can choose to use a combination of the static and dynamic design options when they need higher bandwidth for iSCSI and vMotion activities. In this hybrid design, four uplinks can be statically allocated to the iSCSI and vMotion traffic types while the remaining four uplinks can be used dynamically for the remaining traffic types. Table 4 below shows the traffic types and associated port group configuration for the hybrid design.

Table 4 Hybrid design configuration

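| Traffic Type | Port Group | Teaming Option | Active Uplink | Standby Uplink | NIOC Shares | NIOC Limits |
| --- | --- | --- | --- | --- | --- | --- |
| Management | PG-A | LBT | 1,2,3,4 | None | 5 | - |
| vMotion | PG-B1 | None | 5 | 6 | - | - |
| vMotion | PG-B2 | None | 6 | 5 | - | - |
| FT | PG-C | LBT | 1,2,3,4 | None | 10 | - |
| iSCSI | PG-D1 | None | 7 | None | - | - |
| iSCSI | PG-D2 | None | 8 | None | - | - |
| Virtual Machine | PG-E | LBT | 1,2,3,4 | None | 20 | - |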

 

In the next blog entry I will discuss the Rack server deployment with two 10 Gigabit network adapters.

 

VDS Best Practices – Virtual and Physical Switch Parameters (Part 2 of 6)

Important virtual and physical switch parameters

Before diving into the different design options around the example deployment, let’s take a look at the VDS (virtual) and physical network switch parameters that should be considered in all these design options. These are some key parameters that vSphere and network administrators have to take into account while designing VMware virtual networking. As the configuration of virtual networking goes hand in hand with physical network configuration, this section will cover both the VDS and Physical switch parameters.

VDS parameters

VDS simplifies the challenges of the configuration process by providing one single pane of glass to perform virtual network management tasks. As opposed to configuring vSphere standard switches (VSS) on individual hosts, administrators can configure and manage one single vSphere distributed switch. All centrally configured network policies on VDS get pushed down to the host automatically when the host gets added to the distributed switch. In this section an overview of key VDS parameters is provided.

Host Uplink Connections (vmnics) and dvuplink parameter

VDS has a new abstraction for the physical Ethernet network adapters (vmnics) on each host. This new abstraction is called dvuplinks, and it is defined during the creation of the VDS. All the properties including NIC teaming, load balancing, and failover policies on VDS and dvportgroups are applied to dvuplinks and not to vmnics on individual hosts. When a host gets added to the VDS, each vmnic on the host is mapped to a dvuplink. This provides the advantage of consistently applying the teaming and failover configurations to all the hosts irrespective of how the dvuplink and vmnic assignments are made.

Figure 1 below shows two ESXi hosts with four Ethernet network adapters each. When these hosts are added to the VDS with four dvuplinks configured on a dvuplink portgroup, administrators have to assign the network adapters (vmnics) of the hosts to dvuplinks. To illustrate the mapping of the dvuplinks to vmnics, Figure 1 shows one type of mapping where the ESXi hosts’ vmnic0 is mapped to dvuplink1, vmnic1 to dvuplink2, and so on. Customers can choose a different mapping if required, where vmnic0 is mapped to a dvuplink other than dvuplink1. VMware recommends having a consistent mapping across different hosts because it reduces complexity in the environment.

Figure 1 dvuplink to vmnic mapping
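The recommendation to keep the vmnic to dvuplink mapping consistent across hosts can be checked with a few lines of Python. This is only a sketch over plain dictionaries. The host names are hypothetical, and a third host with a deliberately different mapping is added to show what the check reports.

```python
def inconsistent_hosts(mappings):
    """Return hosts whose vmnic-to-dvuplink mapping differs from the first host's."""
    hosts = list(mappings)
    reference = mappings[hosts[0]]
    return [h for h in hosts[1:] if mappings[h] != reference]


standard = {"vmnic0": "dvuplink1", "vmnic1": "dvuplink2",
            "vmnic2": "dvuplink3", "vmnic3": "dvuplink4"}

mappings = {
    "esxi-host1": dict(standard),
    "esxi-host2": dict(standard),
    # Hypothetical host where vmnic0 and vmnic1 were swapped.
    "esxi-host3": {"vmnic0": "dvuplink2", "vmnic1": "dvuplink1",
                   "vmnic2": "dvuplink3", "vmnic3": "dvuplink4"},
}

print(inconsistent_hosts(mappings))
```

A swapped mapping like the one on the third host still works, since policies are applied to dvuplinks, but it makes it harder to reason about which physical adapter a team actually uses.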

As a best practice, customers should also try to deploy hosts with the same number of physical Ethernet network adapters and with similar port speeds. Also, as the number of dvuplinks configured on the VDS depends on the maximum number of physical Ethernet network adapters on a host, administrators should take that into account during dvuplink portgroup configuration. Customers always have the option to modify this dvuplink configuration based on new hardware capabilities.

Traffic Types and dvportgroup parameters

Similar to portgroups on standard switches, dvportgroups define how the connection is made through the VDS to the network. The VLAN ID, traffic shaping, port security, teaming and load balancing parameters are configured on these dvportgroups. The virtual ports (dvports) connected to a dvportgroup share the same properties configured on a dvportgroup. When customers want a group of virtual machines to share the security and teaming policies, they have to make sure the virtual machines are part of one dvportgroup. Customers can choose to define different dvportgroups based on the different traffic types they have in their environment or based on the different tenants or applications they support in the environment.

In this example deployment, the dvportgroup classification is based on the traffic types running in the virtual infrastructure. Once administrators understand the different traffic types in the virtual infrastructure and identify specific security, reliability and performance requirements for individual traffic types, the next step is to create unique dvportgroups associated with each traffic type. As mentioned earlier, the dvportgroup configuration defined at the VDS level is automatically pushed down to every host that is added to the VDS. For example, in Figure 1, you can see that the two dvportgroups PG-A (Yellow) and PG-B (Green) defined at the distributed switch level are available on each ESXi host that is part of that VDS.

 

dvportgroup specific configuration

Once customers decide on the number of unique dvportgroups they want to create in their environment, they can start configuring those dvportgroups. The configuration options/parameters are similar to those available with port groups on vSphere standard switches. There are some additional options available on VDS dvportgroup that are related to teaming setup. These new options are not available on vSphere standard switches. Customers can configure the following key parameters for each dvportgroup.

  • Number of virtual ports (dvports)
  • Port binding (static, dynamic, ephemeral)
  • VLAN Trunking/Private VLANs
  • Teaming and Load Balancing along with Active and Standby Links
  • Bi-directional traffic shaping parameters
  • Port Security

As part of the teaming algorithm support, VDS provides a unique approach to load balancing traffic across the teamed network adapters. This approach is called Load Based Teaming (LBT), and it distributes the traffic across the network adapters based on the percentage utilization of traffic on those adapters. The LBT algorithm works in both the ingress and egress directions of the network adapter traffic, as opposed to the hashing algorithms that work only in the egress direction (traffic flowing out of the network adapter). Also, LBT prevents the worst-case scenario that could happen with hashing algorithms, where all traffic hashes to one network adapter of the team and the other network adapters are not used to carry any traffic. To improve the utilization of all the links/network adapters, VMware recommends the use of this advanced feature (LBT) of VDS. The LBT approach is recommended over EtherChannel on physical switches and the route based on IP hash configuration on the virtual switch.

Port security policies at the port group level protect customers from certain behaviors that could compromise security. For example, a hacker could impersonate a virtual machine and gain unauthorized access by spoofing the virtual machine’s MAC address. VMware recommends setting MAC Address Changes and Forged Transmits to “Reject” to help protect against attacks launched by a rogue guest operating system. Set Promiscuous Mode to “Reject” unless customers want to monitor the traffic for network troubleshooting or intrusion detection purposes.

NIOC

Network I/O Control (NIOC) is the traffic management capability available on VDS. The NIOC concept revolves around resource pools that are similar in many ways to the ones existing for CPU and Memory. vSphere and network administrators can now allocate I/O shares to different traffic types, similar to allocating CPU and Memory resources to a VM. The share parameter specifies the relative importance of a traffic type over other traffic types, and provides a guaranteed minimum when the different traffic types compete for a particular network adapter. The shares are specified in abstract units numbered 1 to 100. Customers can provision shares to different traffic types based on the amount of resources each traffic type requires.

This capability of provisioning I/O resources is very useful in situations where there are multiple traffic types competing for resources. For example, in a deployment where vMotion and VM traffic types are flowing through one network adapter, it is possible that vMotion activity can impact the virtual machine traffic performance. In this situation, shares configured in NIOC provide the required isolation between the vMotion and VM traffic types and prevent one flow (traffic type) from dominating the other. NIOC configuration provides one more parameter that customers can utilize if they want to put any limits on a particular traffic type. This parameter is called the Limit. The Limit configuration specifies the absolute maximum bandwidth for a traffic type on a host. The limit parameter is specified in Mbps. NIOC limits and shares parameters only work on the outbound traffic, i.e., traffic that is flowing out of the ESXi host.

VMware recommends that customers utilize this traffic management feature whenever they have multiple traffic types flowing through one network adapter. This situation of multiple traffic types flowing through a network adapter is more prominent with 10 Gigabit Ethernet network deployments but can happen in 1 Gigabit Ethernet network deployments as well. The common use case for NIOC in a 1 Gigabit network adapter deployment is when the traffic from different workloads or different customer VMs is carried over the same network adapter. As traffic from multiple workloads flows through a network adapter, it becomes important to provide I/O resources based on the needs of each workload. With the release of vSphere 5, customers can now make use of the new user defined network resource pools capability and allocate I/O resources to the different workloads or different customer VMs depending on their needs. This user defined network resource pool feature provides granular control in allocating I/O resources and meeting the SLA requirements for virtualized tier 1 workloads.

 Bi-directional traffic shaping

Apart from NIOC, there is one more traffic-shaping feature available in the vSphere platform. This traffic-shaping feature can be configured at the dvportgroup or dvport level. Customers can shape both inbound and outbound traffic using three parameters: average bandwidth, peak bandwidth, and burst size. Customers who want more granular traffic-shaping controls to manage their traffic types can use this capability of VDS along with the NIOC feature. It is recommended to involve the network administrators in your organization while configuring these granular traffic parameters. These controls only make sense when there are oversubscription scenarios that are causing network performance issues. Such oversubscription could be caused by either the physical switch infrastructure or the virtual infrastructure. So it is very important to understand the physical and virtual network environment before making any bi-directional traffic-shaping configurations.
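The way the three shaping parameters relate to each other can be sketched with a generic token-bucket style model. This is a simplification written for illustration, not the VDS implementation: it works in one-second intervals, and the function name `shape` and the numbers in the example are assumptions.

```python
from typing import List


def shape(demand_kbps: List[float], average_kbps: float,
          peak_kbps: float, burst_kbit: float) -> List[float]:
    """Return the rate granted in each one-second interval.

    Allowance that goes unused while traffic stays below the average
    accumulates as a bonus, up to burst_kbit. The bonus can later be
    spent to send above the average, but never above the peak.
    """
    bonus = 0.0  # accumulated unused allowance, in kilobits
    granted = []
    for demand in demand_kbps:
        rate = min(demand, peak_kbps, average_kbps + bonus)
        bonus = min(burst_kbit, bonus + average_kbps - rate)
        granted.append(rate)
    return granted


# Two idle seconds followed by three seconds of heavy demand.
print(shape([0, 0, 3000, 3000, 3000],
            average_kbps=1000, peak_kbps=2000, burst_kbit=1500))
```

In this model the port bursts to the peak rate only while bonus remains, then falls back to the average rate, which is the behavior the three parameters are meant to describe.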

 

Physical Network switch parameters

The configuration of the VDS and the physical network switch should go hand in hand to provide resilient, secure and scalable connectivity to the virtual infrastructure. The following are some key switch configuration parameters customers should pay attention to.

VLAN

If VLANs are used to provide logical isolation between different traffic types, it is important to make sure that those VLANs are carried over to the physical switch infrastructure. To do so, enable VST (Virtual Switch Tagging) on the virtual switch, and trunk all VLANs to the physical switch ports.

Spanning Tree Protocol (STP)

Spanning Tree protocol is not supported on virtual switches and thus no configuration is required on VDS. But it is important to enable this protocol on the physical switches. STP makes sure that there are no loops in the network. As a best practice, customers should configure the following.

  • Use “portfast” on ESXi host facing physical switch ports. With this setting, network convergence on these switch ports happens quickly after a failure because the port enters the Spanning Tree forwarding state immediately, bypassing the listening and learning states.
  • Use “BPDU guard” to enforce the STP boundary. This configuration protects against any invalid device connection on the ESXi host facing access switch ports. As mentioned earlier, VDS doesn’t support Spanning Tree protocol and thus doesn’t send any Bridge Protocol Data Unit (BPDU) frames to the switch port. However, if any BPDU is seen on these ESXi host facing access switch ports, the BPDU guard feature puts that particular switch port in the error-disabled state. The switch port is completely shut down, which prevents the connected device from affecting the Spanning Tree topology.

The recommendation of enabling “portfast” and “BPDU guard” on the switch ports is valid only when customers connect non-switching/bridging devices to these ports. The switching/bridging devices can be hardware based physical boxes or servers running software based switching/bridging function. Customers should make sure that there is no switching/bridging function enabled on the ESXi hosts that are connected to the physical switch ports.

In the scenario where the ESXi host has a guest VM that is configured to perform a bridging function, the VM will generate BPDU frames and send them out to the VDS. The VDS then forwards the BPDU frames through the network adapter to the physical switch port. When the switch port configured with “BPDU guard” receives the BPDU frame, the switch disables the port and the VM loses connectivity. To avoid this network failure scenario while running a software-bridging function on an ESXi host, customers should disable the “portfast” and “BPDU guard” configuration on the port and run the spanning tree protocol.
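The failure scenario above can be modeled in a few lines of Python. This is a toy state model of a single switch port, written only to show the sequence of events; the class name `AccessPort` and the state strings are assumptions, not a switch vendor's API.

```python
from dataclasses import dataclass, field


@dataclass
class AccessPort:
    portfast: bool = True
    bpdu_guard: bool = True
    state: str = field(init=False)

    def __post_init__(self) -> None:
        # With portfast the port skips listening/learning and forwards at once.
        self.state = "forwarding" if self.portfast else "listening"

    def receive(self, frame_type: str) -> str:
        """Process one incoming frame and return the resulting port state."""
        if frame_type == "bpdu" and self.bpdu_guard:
            # BPDU guard shuts the port down; it stays down for later frames.
            self.state = "err-disabled"
        return self.state


port = AccessPort()
print(port.receive("data"))  # ordinary VM traffic
print(port.receive("bpdu"))  # BPDU from a bridging VM, forwarded by the VDS
print(port.receive("data"))  # traffic after the port has been disabled
```

Once the single BPDU arrives, every later frame on that port is lost, which is why one bridging VM can take down connectivity for the whole uplink.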

In case customers are concerned about the security hacks that can generate BPDU frames, they should make use of the VMware vShield App security product that can block the frames and protect the virtual infrastructures from such layer 2 attacks. Please refer to vShield product documentation for more details on how to secure your vSphere virtual infrastructure. http://www.vmware.com/products/vshield/overview.html

Link Aggregation setup

Link Aggregation is used to increase throughput and improve resiliency by combining multiple network connections. There are various proprietary solutions in the market along with the vendor-independent IEEE 802.3ad (LACP) standard-based implementation. All solutions establish a logical channel between the two end points using multiple physical links. In the vSphere virtual infrastructure, the two ends of the logical channel are the virtual switch (VDS) and the physical switch. These two switches have to be configured with link aggregation parameters before the logical channel is established. Currently, VDS supports static link aggregation configuration and does not provide support for dynamic LACP. When customers want to enable link aggregation on a physical switch, they should configure static link aggregation on the physical switch and select IP hash as the NIC teaming policy on the VDS.
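As a rough illustration of how an IP hash policy spreads traffic across the links of a channel, the sketch below picks an uplink from the source and destination IP addresses. It is a simplified model and an assumption on my part: the exact hash computed by ESXi may differ, and the function name `select_uplink` is hypothetical.

```python
import ipaddress


def select_uplink(src_ip: str, dst_ip: str, uplink_count: int) -> int:
    """Return the index of the uplink chosen for a source/destination pair.

    Simplified model: XOR the two IPv4 addresses and take the result
    modulo the number of uplinks in the team.
    """
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return (src ^ dst) % uplink_count


# One VM talking to two different destinations over a two-uplink team.
print(select_uplink("10.0.0.10", "10.0.0.20", 2))
print(select_uplink("10.0.0.10", "10.0.0.21", 2))
```

The point to take away is that a given source and destination pair always maps to the same uplink, so a single flow never exceeds the bandwidth of one physical link, while traffic to many destinations can spread across all links in the channel.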

When establishing the logical channel with multiple physical links, customers should make sure that the Ethernet network adapter connections from the host are terminated on a single physical switch. However, if customers have deployed clustered physical switch technology, then the Ethernet network adapter connections can be terminated on two different physical switches. Networking vendors refer to clustered physical switch technology by different names. For example, Cisco calls its switch clustering solution VSS (Virtual Switching System) while Brocade calls it VCS (Virtual Cluster Switching). Please refer to the networking vendor guidelines and configuration details while deploying switch-clustering technology.

Link State Tracking

Link state tracking is a feature available on Cisco switches to manage the link state of downstream ports (ports connected to servers) based on the status of upstream ports (ports connected to aggregation/core switches). When there is a failure on the upstream links connected to aggregation or core switches, the associated downstream link status goes down. The server connected on the downstream link is then able to detect the failure and re-route the traffic on other working links. This feature thus provides protection from network failures caused by down upstream ports in non-mesh topologies. Unfortunately, this feature is not available on all vendors’ switches, and even if it is available, it might not be referred to as link state tracking. Customers should talk to their switch vendors to find out if a similar feature is supported on their switches.

Figure 2 below shows the resilient mesh topology on the left and a simple loop-free topology on the right. VMware highly recommends deploying the mesh topology shown on the left, which provides a highly reliable redundant design and doesn’t need the link state tracking feature. Customers who don’t have high-end networking expertise and are also limited in the number of switch ports might prefer the deployment shown on the right. In this deployment customers don’t have to run the Spanning Tree Protocol because there are no loops in the network design. The downside of this simple design shows up when there is a failure on the link between the access and aggregation switch. In that failure scenario, the server will continue to send traffic on the same network adapter even when the access layer switch is dropping the traffic at the upstream interface. To avoid this black holing of server traffic, customers can enable link state tracking on the virtual and physical switches and indicate any failure between the access and aggregation switch layers to the server through link state information.

Figure 2 Resilient loop and no-loop topologies

VDS has its default network failover detection configuration set to “Link status only”. Customers should keep this configuration if they are enabling the link state tracking feature on physical switches. If link state tracking capability is not available on physical switches, and there are no redundant paths available in the design, then customers can make use of the Beacon Probing feature available on VDS. Beacon probing is a software solution available on virtual switches for detecting link failures upstream from the access layer physical switch to the aggregation/core switches. Beacon probing is most useful with three or more uplinks in a team.

Maximum Transmission Unit (MTU)

Make sure that the Maximum Transmission Unit (MTU) configuration matches across the virtual and physical network switch infrastructure.
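A simple way to think about this check is to list the MTU configured at every hop and flag anything that differs from the first one. The sketch below does just that; the hop names and MTU values are hypothetical, and gathering the real values from the VDS and the physical switches is left out.

```python
from typing import List, Tuple


def find_mtu_mismatches(path: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Return the hops whose MTU differs from the first hop in the path."""
    expected = path[0][1]
    return [(name, mtu) for name, mtu in path[1:] if mtu != expected]


# Hypothetical path from a VMkernel port to an aggregation switch.
path = [
    ("vmkernel-port", 9000),
    ("vds", 9000),
    ("access-switch-port", 1500),
    ("aggregation-switch-port", 9000),
]
print(find_mtu_mismatches(path))
```

Here the access switch port would be reported, since a single hop left at 1500 bytes is enough to cause drops or fragmentation for jumbo frames.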

 

After covering the important virtual and physical switch parameters and some recommended guidelines for each, we will take a look at the rack server deployments with multiple 1 Gigabit network adapters as well as two 10 Gigabit network adapters in the next blog entry.

 

vSphere 5 New Networking Features – Enhanced NIOC

Network I/O Control Enhancements

Consolidated I/O, or I/O virtualization, delivers benefits similar to those of x86 virtualization in terms of better utilization and consolidation of resources. However, as multiple traffic types flow through a single physical network interface, it becomes important to manage the traffic effectively so that critical application flows don’t suffer because of a burst of low-priority traffic. Network traffic management provides the required control and guarantees to different traffic types in the consolidated I/O environment. In the VMware vSphere 5 platform, NIOC supports traffic management capabilities for the following traffic types, also called network resource pools:

• Virtual machine traffic

• Management traffic

• iSCSI traffic

• NFS traffic

• Fault-tolerant traffic

• VMware vMotion traffic

• User-defined traffic

• vSphere replication traffic

Similar to CPU and memory resource allocation in the vSphere platform, a network administrator through NIOC can allocate I/O shares and limits to different traffic types, based on their requirements. In this new release of vSphere, NIOC capabilities are enhanced such that administrators can now create user-defined traffic types and allocate shares and limits to them. Also, administrators can provide I/O resources to the vSphere replication process by assigning shares to vSphere replication traffic types. Let’s look at some details on User defined and vSphere replication traffic types.

 

User-Defined Network Resource Pools

User-defined network resource pools in vSphere 5 provide an ability to add new traffic types beyond the standard system traffic types that are used for I/O scheduling.

The figure below shows an example of user-defined resource pools, with the shares, limits and IEEE 802.1p tag parameters described in a table. In this example, Tenant 1 and Tenant 2 are two user-defined resource pools with virtual machines connected to their respective independent port groups. Tenant 1, with three virtual machines, has five I/O shares. Tenant 2, with one virtual machine, has 15 I/O shares. This indicates that during contention scenarios, Tenant 2 virtual machines will have a higher guaranteed share than Tenant 1 virtual machines.
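To put numbers on the Tenant 1 and Tenant 2 example, the short sketch below converts shares into fractions of the adapter. It assumes that only these two pools are contending for the link, which will rarely be exactly true on a real host; the function name `share_fractions` is hypothetical.

```python
from fractions import Fraction
from typing import Dict


def share_fractions(shares: Dict[str, int]) -> Dict[str, Fraction]:
    """Convert per-pool shares into each pool's fraction of the link."""
    total = sum(shares.values())
    return {name: Fraction(value, total) for name, value in shares.items()}


fractions = share_fractions({"Tenant 1": 5, "Tenant 2": 15})
for name, fraction in fractions.items():
    print(name, fraction, float(fraction * 10_000), "Mbps of a 10 Gigabit link")
```

With 5 and 15 shares, Tenant 1 is entitled to one quarter of the link and Tenant 2 to three quarters during contention. Tenant 1's quarter is shared by its three virtual machines, while the single Tenant 2 virtual machine has the other three quarters to itself.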

NIOC_user_defined

Usage 

When customers are deploying critical applications on virtual infrastructure, they can utilize this advanced feature to reserve I/O resources for the important, business-critical application traffic and provide SLA guarantees.

Service providers who are deploying public clouds and serving multiple tenants can now define and provision I/O resources per tenant, based on each tenant’s need.

Configuration

The new resource pools can be defined at the Distributed Switch level by selecting the resource allocation tab and clicking on new network resource pools. After a new network resource pool is defined with shares and limits parameters, that resource pool can be associated with a port group. This association of a network resource pool with a port group enables customers to allocate I/O resources to a group of virtual machines or workloads. The figure below shows the new Tenant 1 and Tenant 2 resource pools created under user-defined network resource pools.

Nioc_resource_allocation

vSphere Replication Traffic

 vSphere replication is a new system traffic type that carries replication traffic from one host to another. NIOC now supports this new traffic type along with other system and user-defined traffic types.

 Usage

Customers implementing a disaster recovery (DR) solution with VMware vCenter Site Recovery Manager (Site Recovery Manager) and vSphere replication can use this vSphere replication traffic type to provide required network resources to the replication process.

Configuration

A vSphere replication traffic type can be configured on a Distributed Switch under the resource allocation tab. This traffic type is now part of the system network resource pool. Customers can allocate shares and limits parameters to this traffic type.

 

IEEE 802.1p Tagging

IEEE 802.1p is a standard for enabling QoS at the MAC level. The IEEE 802.1p tag provides a 3-bit field for prioritization, which allows packets to be grouped into eight different traffic classes (values 0 through 7). The IEEE doesn’t mandate or standardize the use of recommended traffic classes. However, higher-number tags typically indicate critical traffic that has higher priority. The traffic is simply classified at the source and sent to the destination. The layer-2 switch infrastructure between the source and destination handles the traffic classes according to the assigned priority. In the vSphere 5.0 release, network administrators can now tag the packets going out of the host.
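For readers who want to see where the priority bits sit on the wire, the following sketch builds and parses the 4-byte IEEE 802.1Q tag that carries them. The field layout (a 3-bit priority, a 1-bit drop eligible indicator and a 12-bit VLAN ID after the 0x8100 tag protocol identifier) comes from the 802.1Q standard; the function names are hypothetical, and this is not how the VDS itself applies tags.

```python
import struct

TPID_8021Q = 0x8100  # tag protocol identifier for an 802.1Q tag


def build_vlan_tag(priority: int, vlan_id: int, dei: int = 0) -> bytes:
    """Pack priority (3 bits), DEI (1 bit) and VLAN ID (12 bits) into a tag."""
    if not 0 <= priority <= 7:
        raise ValueError("priority must fit in 3 bits (0-7)")
    if not 0 <= vlan_id <= 4095:
        raise ValueError("VLAN ID must fit in 12 bits (0-4095)")
    if dei not in (0, 1):
        raise ValueError("DEI must be 0 or 1")
    tci = (priority << 13) | (dei << 12) | vlan_id
    return struct.pack("!HH", TPID_8021Q, tci)


def parse_priority(tag: bytes) -> int:
    """Extract the 3-bit priority value from a 4-byte 802.1Q tag."""
    _tpid, tci = struct.unpack("!HH", tag)
    return tci >> 13


tag = build_vlan_tag(priority=5, vlan_id=100)
print(tag.hex(), parse_priority(tag))
```

Because the priority travels inside the VLAN tag, the physical switches along the path can only honor it on ports where the tag is preserved.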

 

Usage

Customers who are deploying business-critical applications in a virtualized environment now have the capability to guarantee I/O resources to these workloads on the host. However, it is not sufficient to provide I/O resources just on the host. Customers must think about how to provide end-to-end QoS to the business-critical application traffic. The capability of a Distributed Switch to provide an IEEE 802.1p tag helps such customers meet those requirements for end-to-end QoS or service-level agreements.

 Configuration

IEEE 802.1p tagging can be enabled per traffic type. Customers can select the Distributed Switch and then the resource allocation tab to see the different traffic types, including system and user-defined traffic types. After selecting a traffic type, the user can edit the QoS priority tag field by choosing any number from 1 to 7. The figure below is a screenshot of the QoS priority tag configuration for the MyVMTraffic traffic type.

Nioc_802.1p 

With this post, I have completed the coverage of new networking features in vSphere 5. Also, today VMware has officially announced the general availability of vSphere 5.

I will be attending VMworld 2011 during the week of Aug 29th. At VMworld, I have a session on VDS best practices and a couple of group discussions. Looking forward to meeting with various partners and customers.

After VMworld, I will focus my attention on writing about the different deployment options with vSphere Distributed Switch (VDS).