
Using both Storage I/O Control & Network I/O Control for NFS

Many of these blog articles arise from conversations I have with folks both internally at VMware & externally in the community. This post is another such example. What I really like about this job is that it gets me thinking about a lot of stuff that I normally take for granted. The question this time was around using both Storage I/O Control (SIOC) & Network I/O Control (NIOC) for NFS traffic and for Virtual Machines residing on NFS datastores, and whether they could possibly step on each other's toes, so to speak.

The answer is no, the technologies are complementary. Let me try to explain how.

First off, let's have a brief overview of what the technologies do.

Intro to Storage I/O Control (SIOC)

SIOC was covered in a previous blog post. Details can be found here – http://blogs.vmware.com/vsphere/2011/09/storage-io-control-enhancements.html. In a nutshell, if SIOC detects that a pre-defined latency threshold for a particular datastore has been exceeded, it will throttle the amount of I/O a VM can queue to that datastore based on a 'shares' mechanism. When the contention is alleviated, SIOC stops throttling and VMs can once again use the datastore without restriction. This avoids the 'noisy neighbor' problem, where one VM can hog all the bandwidth to a shared datastore. The point to note here is that SIOC works on a per-VM basis, and deals with datastore objects.
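If you'd rather script this than click through the vSphere Client, the sketch below shows roughly what enabling SIOC looks like through the vSphere API using pyVmomi. The vCenter address, credentials and datastore name are placeholders, and I'm quoting the API property names from memory, so treat it as a starting point rather than a tested recipe.

```python
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Placeholder vCenter and credentials; certificate handling omitted for brevity.
si = SmartConnect(host="vcenter.example.com",
                  user="administrator@vsphere.local", pwd="password")
content = si.RetrieveContent()

# Locate the NFS datastore by name (placeholder name).
view = content.viewManager.CreateContainerView(content.rootFolder,
                                               [vim.Datastore], True)
ds = next(d for d in view.view if d.name == "nfs-datastore-01")
view.DestroyView()

# Enable SIOC and set the latency threshold (in milliseconds) at which
# throttling of VM I/O to this datastore kicks in.
spec = vim.StorageResourceManager.IORMConfigSpec()
spec.enabled = True
spec.congestionThreshold = 30  # 30 ms is the vSphere 5.0 default

content.storageResourceManager.ConfigureDatastoreIORM_Task(datastore=ds, spec=spec)
Disconnect(si)
```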

SIOC was first introduced in vSphere 4.1, but only for block storage devices (FC, iSCSI & FCoE). In vSphere 5.0, we introduced SIOC support for NFS datastores.

Intro to Network I/O Control (NIOC)

There is a nice overview of NIOC on the networking blog here – http://blogs.vmware.com/networking/2010/07/got-network-io-control.html. Again, in a nutshell, NIOC allows you to define a guaranteed bandwidth for the different vSphere network traffic types.

NIOC uses a software approach to partition physical network bandwidth among the different types of network traffic flows. For example, you can guarantee a minimum bandwidth to NFS traffic when a vMotion operation is initiated on the same network, preventing the vMotion from impacting the NFS traffic flow. The point to note here is that NIOC works on network traffic streams, e.g. NFS, and deals with the physical NIC (uplink) ports.
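For completeness, here is how turning NIOC on looks with the same pyVmomi approach. Again, the switch name and the session object are placeholders, and this assumes you already have a vSphere Distributed Switch in place.

```python
from pyVmomi import vim

# 'content' is the ServiceContent from an existing SmartConnect session
# (see the SIOC sketch above); the switch name below is a placeholder.
view = content.viewManager.CreateContainerView(content.rootFolder,
                                               [vim.DistributedVirtualSwitch], True)
dvs = next(s for s in view.view if s.name == "dvSwitch01")
view.DestroyView()

# Turn on Network I/O Control for the distributed switch.
dvs.EnableNetworkResourceManagement(enable=True)
```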

SIOC & NIOC Together

Let's take a scenario where there are multiple VMs spread across multiple ESXi hosts, all sharing the same NFS datastore.

i) SIOC Use Case

For quite a while, we have been able to give bandwidth fairness to VMs running on the same host via SFQ, the start-time fair queueing scheduler. This scheduler ensures share-based allocation of I/O resources between VMs on a per-host basis. It is when we have VMs accessing the same datastore from different hosts that we've had to implement a distributed I/O scheduler. This is called PARDA, Proportional Allocation of Resources for Distributed storage Access. PARDA carves up the array-side queue among all the Virtual Machines sending I/O to the datastore and adjusts each host's per-datastore queue depth according to the sum of the per-Virtual-Machine shares on that host.
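To make that a bit more concrete, here is a small illustrative calculation (with made-up share values and queue depth) showing how a distributed scheduler of this kind divides an array-side queue between hosts in proportion to the sum of the per-VM shares on each host.

```python
# Illustrative numbers only: a datastore-wide queue depth of 64 slots and
# two hosts whose VMs hold different total share values.
datastore_queue_depth = 64
host_share_totals = {
    "esxi-01": 2000 + 1000,  # two VMs: one 'high' (2000) + one 'normal' (1000)
    "esxi-02": 1000,         # one 'normal' VM
}

total_shares = sum(host_share_totals.values())
for host, shares in host_share_totals.items():
    queue_depth = datastore_queue_depth * shares / total_shares
    print(f"{host}: {shares} shares -> ~{queue_depth:.0f} of "
          f"{datastore_queue_depth} queue slots")

# esxi-01: 3000 shares -> ~48 of 64 queue slots
# esxi-02: 1000 shares -> ~16 of 64 queue slots
```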

If SIOC is enabled on the datastore, and the latency threshold for that datastore is exceeded because of the amount of disk I/O the VMs are generating, the I/O bandwidth allocated to the VMs sharing the datastore is adjusted according to the share values assigned to each VM.
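The share values themselves are assigned per virtual disk on each VM. A rough pyVmomi sketch of raising the shares on a VM's first disk might look like the following; the vm object lookup is omitted and the value of 2000 is just an example.

```python
from pyVmomi import vim

# 'vm' is assumed to be a vim.VirtualMachine object you have already looked up.
disk = next(dev for dev in vm.config.hardware.device
            if isinstance(dev, vim.vm.device.VirtualDisk))

# Give this disk a custom share value; SIOC uses these shares to apportion
# I/O to the datastore when the latency threshold is exceeded.
disk.storageIOAllocation = vim.StorageResourceManager.IOAllocationInfo(
    shares=vim.SharesInfo(level=vim.SharesInfo.Level.custom, shares=2000))

change = vim.vm.device.VirtualDeviceSpec(
    operation=vim.vm.device.VirtualDeviceSpec.Operation.edit, device=disk)
vm.ReconfigVM_Task(spec=vim.vm.ConfigSpec(deviceChange=[change]))
```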

ii) NIOC Use Case

But what if something impacts the NFS traffic flow? In this case, VM performance may suffer not because of an over-committed datastore, but because there is not enough network bandwidth for the ESXi host to communicate with the NFS server. For instance, as mentioned at the beginning of the post, what if a vMotion operation was initiated (an operation which could consume up to 8Gbps of the network bandwidth) and impacted other traffic on the same pipe, such as NFS? Yes, I know a best practice from VMware is to dedicate a NIC to vMotion traffic to avoid this exact situation, but this isn't always practical on 10Gb networks. In the case where vMotion, NFS and other traffic types share the same uplink, NIOC allows us to guarantee a minimum bandwidth on a per-traffic-type basis. The really cool thing is that when there is no congestion, network traffic can use *all* the available bandwidth of the uplink. And just for clarification, the uplink is actually a dvUplink, since NIOC can only be enabled on distributed switches; the feature is not available on standalone vSwitches.
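And for those who like to script it, the system network resource pools on the distributed switch (the NFS pool is keyed 'nfs', vMotion 'vmotion') can be adjusted along these lines. As before, this is an untested pyVmomi sketch with placeholder objects.

```python
from pyVmomi import vim

# 'dvs' is assumed to be the distributed switch object from the earlier sketch.
def set_pool_shares(dvs, pool_key, share_value):
    # Find the existing system network resource pool so we can reuse its
    # configVersion, then update its physical adapter shares.
    pool = next(p for p in dvs.networkResourcePool if p.key == pool_key)
    spec = vim.DVSNetworkResourcePoolConfigSpec(
        key=pool.key,
        configVersion=pool.configVersion,
        allocationInfo=vim.DVSNetworkResourcePoolAllocationInfo(
            shares=vim.SharesInfo(level=vim.SharesInfo.Level.custom,
                                  shares=share_value)))
    dvs.UpdateNetworkResourcePool(configSpec=[spec])

# Give NFS a larger slice than vMotion when the uplink is congested; when
# there is no congestion, either traffic type can still use the full pipe.
set_pool_shares(dvs, "nfs", 100)     # same value as the 'high' preset
set_pool_shares(dvs, "vmotion", 50)  # same value as the 'normal' preset
```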

Another important point to note which sometimes causes confusion: NFS traffic on the ESXi host generated by a VM's disk I/O does not count towards that VM's port group bandwidth allocation should NIOC kick in. These are two distinct and separate network traffic streams/types, the former being NFS traffic and the latter being virtual machine network traffic.

Conclusion

In my opinion, there is no reason not to use SIOC and NIOC together. The technologies are complementary.


Get notification of these blog postings and more VMware Storage information by following me on Twitter: @VMwareStorage