Tag Archives: NFS

vSphere 5.0 U1 now supports routed NFS storage access

I have to admit, I missed this in the release notes when we initially brought out 5.0U1. But it is rather neat that we now support routing NFS traffic.

There are some caveats however. From the release notes:

vSphere 5.0 Update 1 supports L3 routed NFS storage access when you ensure that your environment meets the following conditions:

  • Use Cisco's Hot Standby Router Protocol (HSRP) in IP Router. If you are using non-Cisco router, be sure to use Virtual Router Redundancy Protocol (VRRP) instead.
  • Use Quality of Service (QoS) to prioritize NFS L3 traffic on networks with limited bandwidths, or on networks that experience congestion. See your router company documentation for details.
  • Follow Routed NFS L3 best practices recommended by storage vendor. Contact your storage vendor for details.
  • Disable Network I/O Resource Management (NetIORM)
  • If you are planning to use systems with top-of-rack switches or switch-dependent I/O device partitioning, contact your system vendor for compatibility and support.

In an L3 environment the following additional restrictions are applicable:

  • The environment does not support VMware Site Recovery Manager.
  • The environment supports only NFS protocol. Do not use other storage protocols such as FCoE over the same physical network.
  • The NFS traffic in this environment does not support IPv6.
  • The NFS traffic in this environment can be routed only over a LAN. Other environments such as WAN are not supported.
  • The environment does not support Distributed Virtual Switch (DVS).

Get notification of these blog postings and more VMware Storage information by following me on Twitter: @VMwareStorage

NFS Block Sizes, Transfer Sizes & Locking

Posted by Cormac Hogan
Technical Marketing Architect (Storage)

I've had a few questions recently around the I/O characteristics of VMware's NFS implementation. I'm going to use this post to answer the common ones.

 

NFS Block Sizes

The first of these questions is usually around the block size used by NFS. The block size on an NFS datastore is determined solely by the block size of the native filesystem on the NFS server or NAS array, so it depends entirely on the underlying storage architecture of the server or the array.

The block size has no dependency on the Guest Operating System block size (a common misconception), because the Guest OS's virtual disk (VMDK) is just a flat file created on the server/array. This file is subject to the block size enforced by the filesystem of the NFS server or NAS array.

One more interesting detail is that when an fsstat is done on the NFS mount from the ESXi host, the ESXi NFS client always returns a default file block size of 4096. Here is an example of this using the vmkfstools command to look at the file block size:

[Screenshot: vmkfstools output showing a 4KB file block size]
 

Maximum Transfer Sizes

The NFS datastore's block size is different from the maximum read and write transfer sizes. The maximum read and write transfer sizes are the chunks in which the client communicates with the server. A typical NFS server could advertise 64KB as the maximum transfer size for reads and writes. In this case, a 1MB read would be broken down into 16 reads of 64KB each. The point is that this has nothing to do with the block size of the NFS datastore on the NFS server/NAS array.
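To make the arithmetic concrete, here is a minimal Python sketch of how a single large read could be carved into transfers no bigger than the size advertised by the server. The function name and the 64KB default are illustrative assumptions only; this is not the ESXi NFS client code.

```python
def split_into_transfers(offset, length, max_transfer=64 * 1024):
    """Split one logical read into (offset, size) chunks.

    max_transfer is the maximum transfer size advertised by the NFS
    server (64KB here, purely as an example value).
    """
    chunks = []
    end = offset + length
    while offset < end:
        size = min(max_transfer, end - offset)
        chunks.append((offset, size))
        offset += size
    return chunks


# A 1MB read against a server advertising 64KB transfers.
chunks = split_into_transfers(0, 1024 * 1024)
print(len(chunks))  # 16
```

Note that the datastore's block size does not appear anywhere in this calculation; only the advertised transfer size matters.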

 

NFS (Version 3) Locking

Another common question I get is around NFS locking. In NFS v3, which is the version of NFS still used by vSphere, the client is responsible for all locking activities, such as liveness and enforcement. The client must 'heartbeat' the lock periodically to maintain it, and must verify the lock status before issuing each I/O to the file protected by that lock. The heartbeat works by the lock holder periodically updating a timestamp stored in the lock file. If another client wishes to lock the file, it monitors lock liveness by polling that timestamp. If the timestamp is not updated during a specific window of time (discussed later), the client holding the lock is presumed dead and the competing client may break the lock.
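The liveness check made by a competing client can be sketched in a few lines of Python. This is only an illustration of the timestamp-polling idea described above: the function name is hypothetical, the default of 3 polls is an assumption borrowed from the defaults discussed later in this post, and the real ESXi client logic is not shown here.

```python
def lock_is_stale(observed_timestamps, polls_required=3):
    """Decide whether a lock holder should be presumed dead.

    observed_timestamps holds the lock-file timestamp seen when the
    conflict was first detected, followed by the timestamp seen on each
    later poll. The lock is treated as stale only if the timestamp has
    not changed across the last `polls_required` polls.
    """
    if len(observed_timestamps) < polls_required + 1:
        return False  # not enough polls yet to make a decision
    recent = observed_timestamps[-(polls_required + 1):]
    return all(ts == recent[0] for ts in recent)


# Holder stopped heartbeating: same timestamp on detection and 3 polls.
print(lock_is_stale([100, 100, 100, 100]))  # True
# Holder is alive: the timestamp moved during the polling window.
print(lock_is_stale([100, 100, 110, 110]))  # False
```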

To ensure consistency, I/O is only issued to the file while the client is the lock holder and the lock lease has not yet expired. By default, there are 3 heartbeat attempts at 10-second intervals, and each heartbeat has a 5-second timeout. In the worst case, when the last heartbeat attempt times out, it takes 3 * 10 + 5 = 35 seconds before the lock is marked expired on the lock-holding client. Until the lock is marked expired, I/O continues to be issued, even after failed heartbeat attempts.

Lock preemption on a competing client starts when it detects a lock conflict. It then takes 3 polling attempts at 10-second intervals for the competing host to declare that the lock has expired and break it, and another 10 seconds to establish its own lock. Lock preemption therefore completes in 3 * 10 + 10 = 40 seconds, after which I/O starts to flow on the competing host.
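The two timing figures can be reproduced with a short Python calculation. The constant names are my own labels for the default values quoted in this post; they are not actual ESXi advanced setting names.

```python
# Default values as quoted in this post (names are illustrative only).
HEARTBEAT_ATTEMPTS = 3
HEARTBEAT_INTERVAL_S = 10
HEARTBEAT_TIMEOUT_S = 5
POLL_ATTEMPTS = 3
POLL_INTERVAL_S = 10
LOCK_ESTABLISH_S = 10


def worst_case_lock_expiry():
    """Seconds until the lock holder marks its own lock as expired."""
    return HEARTBEAT_ATTEMPTS * HEARTBEAT_INTERVAL_S + HEARTBEAT_TIMEOUT_S


def lock_preemption_time():
    """Seconds from conflict detection until the competing host can issue I/O."""
    return POLL_ATTEMPTS * POLL_INTERVAL_S + LOCK_ESTABLISH_S


print(worst_case_lock_expiry())  # 35
print(lock_preemption_time())    # 40
```

The old holder stops issuing I/O after at most 35 seconds, while the new holder does not start until 40 seconds after it detected the conflict.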

 

A closer look at the View Composer API for Array Integration [incl. Video]

A week or so ago I published an article about new View 5.1 storage features. I followed this up with a short video post explaining how you would go about using View Storage Accelerator. In this article, I want to demonstrate the other very cool feature in View 5.1, VCAI (View Composer API for Array Integration). Although this feature is still in Tech Preview for View 5.1, it is a very cool enhancement which could have many benefits when it is eventually fully supported.

Another way of describing this feature is Native NFS Snapshots. Essentially, the feature allows you to offload the creation of the linked clones which back your View desktops to the storage array, and let the storage array handle this task. In order to do this, the NAS storage array on which the snapshots are being deployed must support the NAS Native Snapshot VAAI (vSphere Storage APIs for Array Integration) feature, which was first introduced in vSphere 5.0. A special VIB/plugin (provided by the 3rd party storage array vendor) must also be installed on the ESXi host to allow us to use this offload mechanism.

The main advantage of VCAI is an improvement in performance and a reduction in the time taken to provision desktops based on linked clone pools. This task can now be offloaded to the array, which can then provision these linked clones natively rather than have the ESXi host do it. 

What follows is a short video (approx. 3 and a half minutes) of setting up View 5.1 VCAI feature, showing an installed VCAI VIB from NetApp on the ESXi host, and then how to use native NFS snapshots when creating desktop pools based on linked clones. Again, my thanks to Graham Daly of VMware KBTV fame for his considerable help with this.

Further detail about the View Composer API for Array Integration (VCAI) can be found on the EUC blog here.

New Storage Features in VMware View 5.1

VMware's flagship VDI product, VMware View, has a new release coming out. I don't normally blog about EUC (End User Computing) or VDI as it is not my area of expertise. However VMware View 5.1 has a number of really neat new storage related features which are making use of enhancements which were first introduced in vSphere 5.0.

 

View Storage Accelerator

This first feature was originally called CBRC (Content Based Read Cache) and was initially introduced in vSphere 5.0. Although it is a vSphere feature, it is designed specifically for VMware View. With the release of View 5.1, the View Storage Accelerator feature can now be used to dramatically improve the read throughput for View desktops. This will be particularly useful during a boot storm or anti-virus storm, where many virtual machines could be reading the same data from the same base disk at the same time. The accelerator is implemented by setting aside an area of host memory for cache and then creating 'digest' files for each virtual machine disk. It will be most useful for shared disks that are read frequently, such as View Composer OS disks. It is available 'out of the box' with View 5.1; no additional components need to be installed. More here.
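For readers who want a feel for why a content-based cache helps during a boot storm, here is a toy Python sketch. It assumes that a 'digest' is simply one hash per fixed-size block and that the cache is keyed by that hash; the block size, hash algorithm, class and function names are all my own illustrative choices and say nothing about how CBRC is actually implemented inside ESXi.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for this illustration


def build_digest(disk, block_size=BLOCK_SIZE):
    """Return one content hash per block of a disk image (bytes)."""
    return [
        hashlib.sha1(disk[i:i + block_size]).hexdigest()
        for i in range(0, len(disk), block_size)
    ]


class ContentCache:
    """In-memory read cache keyed by block content hash, not by disk."""

    def __init__(self):
        self.blocks = {}
        self.hits = 0
        self.misses = 0

    def read(self, disk, digest, index, block_size=BLOCK_SIZE):
        key = digest[index]
        if key in self.blocks:
            self.hits += 1
            return self.blocks[key]
        # Cache miss: read from the backing disk and remember the block.
        self.misses += 1
        data = disk[index * block_size:(index + 1) * block_size]
        self.blocks[key] = data
        return data


# Two desktops whose disks hold identical content.
desktop1 = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE
desktop2 = bytes(desktop1)
cache = ContentCache()
cache.read(desktop1, build_digest(desktop1), 0)  # miss, goes to disk
cache.read(desktop2, build_digest(desktop2), 0)  # hit, served from memory
print(cache.hits, cache.misses)  # 1 1
```

Because the cache key is the content rather than the disk, the second desktop's read is served from host memory even though it comes from a different virtual disk.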

 

32 ESXi nodes sharing NFS datastores

This storage feature is also quite significant. While VMware has been able to create 32 node clusters for some time, VMware View would only allow a base disk on an NFS datastore to be shared between 8 ESXi hosts for the purposes of linked clone deployments. View 5.1 lifts this restriction, and now 32 ESXi hosts can host linked clones deployed from the same base disk on a shared NFS datastore. This feature will significantly improve scalability.

 

View Composer API for Array Integration (VCAI) aka Native NFS Snapshots

Although this feature is a Technology Preview in View 5.1, it is another cool storage feature of the release. View desktops deployed using VMware's linked clone technology consume CPU on the ESXi hosts, and network bandwidth when they are deployed on NFS datastores. With this new Native NFS Snapshot feature via VAAI (vSphere Storage APIs for Array Integration), customers can offload the cloning operation to the storage array, minimizing CPU usage and network bandwidth consumption. Once again, this enhanced VAAI functionality was introduced in vSphere 5.0 specifically for VMware View. This feature requires a VAAI NAS plugin from the storage array vendor. Once it is installed and configured, customers will be able to use a storage array vendor's own native snapshot feature for deploying View desktops. This new desktop deployment method can be selected via the standard workflows in View Composer. More here.

 

I'm sure you will agree that these are very exciting features. By providing a read caching mechanism, offloading snapshots/clones to the storage array and supporting up to 32 hosts sharing a single base disk, VMware View 5.1 now has greater performance and scalability than ever before. Of course, there are many other enhancements, including a vCenter Operations Manager (vCOps) extension specifically for View, so please check out the View 5.1 news release on VMware.com. For those of you using VMware View, this is definitely a release worth checking out.

 

Over the next couple of weeks, I hope to look at these features in even greater detail.

Scale Out VSA – Mounting VSA NFS datastores to non-VSA ESXi hosts

I’ve written a great many posts about the vSphere Storage Appliance (VSA), a selection of which can be found at this link. However, I don’t think I’ve ever explained how the NFS datastores (which are presented from the VSA appliances to the ESXi hosts participating in the cluster) can also be presented to other ESXi hosts that are not participating in the VSA cluster.

This can be done in two ways:

  • Any ESXi hosts that are in the same data center object in the vCenter inventory will automatically have the NFS datastores mounted. There are no additional steps that need to be done by the administrator.
  • Any ESXi hosts that are added to the data center object in the vCenter inventory after the VSA cluster has been deployed can have the NFS datastore mounted manually, but only after the ACLs (Access Control Lists) have been updated on the VSA appliances. This can be done via the WSCLI utility found on the vCenter server after the VSA Manager has been installed. There are two steps; the first is to identify the NFS exports and the second is to allow this newly added ESXi host to mount the NFS datastores. A very detailed KB article describes how to do this.

Bottom line – ESXi hosts which are not participating in the VSA Cluster can still mount the shared NFS datastores presented from the VSA cluster. This allows the VSA to scale much higher than 3 nodes. For instance, 3 ESXi hosts could be providing the NFS datastores, but you could have an additional 5 hosts sharing those datastores. This gives you 8 ESXi hosts accessing the same shared storage, should you wish to scale out your environment.

NFS and the vCD Appliance

Tom Stephens
Senior Technical
Marketing Architect
@vCloud_Storm

I’m sure many of you have heard about the vCloud Director Appliance by now.   It’s one of the fastest ways you can get vCloud Director up and running to evaluate it in your environment. 

The key here is that it is only supported for evaluation environments.  As a result, certain design decisions were made with this in mind.  For example, even though it is based on CentOS, only packages that were critical to its use are included.  This helps to keep the size of the vCloud Director Appliance down, making it quicker to download and install.

The configuration provided should be more than sufficient for most evaluation environments.   Every once in a while though, someone has a need for something that surpasses what the vCloud Director Appliance was designed for.

Recently I was made aware of someone who had such a requirement. What they needed was the ability to mount an NFS share from within the vCloud Director Appliance.

In a production environment, customers often use an NFS share on the vCloud Director cells to provide adequate space for the transfer service operations. When this person tried to do this with the vCloud Director Appliance, they quickly realized that they couldn’t.

The reason is that not all of the NFS-related packages are provided with the vCloud Director Appliance. Normally, this is not a concern, as the amount of transfer service space provided with the appliance is adequate for evaluations. However, if you find yourself in the same situation as this person and need it, I’m going to show you how to get NFS working on the vCloud Director Appliance.

Before I do though, I’d like to place a special note here that this is UNSUPPORTED.  As you will have to install packages that are not included with the vCloud Director Appliance, that means that we do not perform any functional testing with these packages installed.    So if you do this, you are doing so at your own risk. 

Now the process to get NFS functionality on the vCloud Director Appliance is pretty simple.  First, make sure that your vCloud Director Appliance is connected to the Internet.  This is required, as it will be downloading the correct packages to install.  Next, use yum to install the portmap and nfs-utils packages.  You can do this by logging into the vCloud Director Appliance as the user root (where the default password is Default0) and entering the following commands:

# yum install portmap

# yum install nfs-utils

After you do this, you simply need to start the portmap service with the following command:

# service portmap start

You should be able to mount an NFS share now, using a command similar to:

# mount -t nfs4 mynfsserver:/nfs_share /nfs_mountpoint

for NFS v4, or if using NFS v3 or v2…

# mount -t nfs mynfsserver:/nfs_share /nfs_mountpoint

That’s all there is to it!

Storage Protocol Comparison – A vSphere Perspective

On many occasions I’ve been asked for an opinion on the best storage protocol to use with vSphere. And my response is normally something along the lines of ‘VMware supports many storage protocols, with no preferences really given to any one protocol over another’. To which the reply is usually ‘well, that doesn’t really help me make a decision on which protocol to choose, does it?’

And that is true – my response doesn’t really help customers to make a decision on which protocol to choose. To that end, I’ve decided to put together a storage protocol comparison document on this topic. It looks at the protocols purely from a vSphere perspective; I’ve deliberately avoided performance, for two reasons:

  1.  We have another team in VMware who already does this sort of thing.
  2.  Storage protocol performance can be very different depending on who the storage array vendor is, so it doesn’t make sense to compare iSCSI & NFS from one vendor when another vendor might do a much better implementation of one of the protocols

If you are interested in performance, there are links to a few performance comparison docs included at the end of the post.

Hope you find it useful.

vSphere Storage Protocol Comparison Guide

 

iSCSI

NFS

Fiber Channel

FCoE

Description

iSCSI: iSCSI presents block devices to an ESXi host. Rather than accessing blocks from a local disk, the I/O operations are carried out over a network using a block access protocol. In the case of iSCSI, remote blocks are accessed by encapsulating SCSI commands & data into TCP/IP packets. Support for iSCSI was introduced in ESX 3.0 back in 2006.

NFS: NFS (Network File System) presents file devices over a network to an ESXi host for mounting. The NFS server/array makes its local filesystems available to ESXi hosts. The ESXi hosts access the meta-data and files on the NFS array/server using an RPC-based protocol. VMware currently implements NFS version 3 over TCP/IP. VMware introduced support for NFS in ESX 3.0 in 2006.

Fiber Channel: Fiber Channel presents block devices like iSCSI. Again, the I/O operations are carried out over a network using a block access protocol. In FC, remote blocks are accessed by encapsulating SCSI commands & data into fiber channel frames. One tends to see FC deployed in the majority of mission critical environments. FC is the only one of these 4 protocols that has been supported on ESX since the beginning.

FCoE: Fiber Channel over Ethernet also presents block devices, with I/O operations carried out over a network using a block access protocol. In this protocol, the SCSI commands and data are encapsulated into Ethernet frames. FCoE has many of the same characteristics as FC, except that the transport is Ethernet. VMware introduced support for HW FCoE in vSphere 4.x & SW FCoE in vSphere 5.0 back in 2011.

Implementation Options

iSCSI: One of the following:
  1. NIC with iSCSI capabilities using the Software iSCSI initiator & accessed using a VMkernel (vmknic) port
  2. Dependent Hardware iSCSI initiator
  3. Independent Hardware iSCSI initiator

NFS: Standard NIC accessed using a VMkernel port (vmknic).

Fiber Channel: Requires a dedicated Host Bus Adapter (HBA) (typically two for redundancy & multipathing).

FCoE: One of the following:
  1. Hardware Converged Network Adapter (CNA)
  2. NIC with FCoE capabilities using the Software FCoE initiator

Speed/Performance considerations

iSCSI: iSCSI can run over a 1Gb or a 10Gb TCP/IP network. Multiple connections can be multiplexed into a single session, established between the initiator and target. VMware supports jumbo frames for iSCSI traffic, which can improve performance. Jumbo frames carry payloads larger than 1500 bytes. Support for jumbo frames with IP storage was introduced in ESX 4, but not on all initiators (KB 1007654 & KB 1009473). iSCSI can introduce overhead on a host’s CPU (encapsulating SCSI data into TCP/IP packets).

NFS: NFS can run over 1Gb or 10Gb TCP/IP networks. NFS also supports UDP, but VMware's implementation does not; it requires TCP. VMware supports jumbo frames for NFS traffic, which can improve performance in certain situations. Support for jumbo frames with IP storage was introduced in ESX 4. NFS can introduce overhead on a host’s CPU (encapsulating file I/O into TCP/IP packets).

Fiber Channel: Fiber Channel can run on 1Gb/2Gb/4Gb/8Gb & 16Gb, but 16Gb HBAs must be throttled to run at 8Gb in vSphere 5.0. Buffer-to-Buffer credits & End-to-End credits throttle throughput to ensure a lossless network. This protocol typically affects a host’s CPU the least, as the HBAs (required for FC) handle most of the processing (encapsulation of SCSI data into FC frames).

FCoE: This protocol requires 10Gb Ethernet. The point to note with FCoE is that there is no IP encapsulation of the data like there is with NFS & iSCSI, which reduces some of the overhead/latency. FCoE is SCSI over Ethernet, not IP. This protocol also requires jumbo frames, since FC payloads are 2.2K in size and cannot be fragmented.

 


 


 

Load Balancing

iSCSI: VMware’s Pluggable Storage Architecture (PSA) provides a Round-Robin Path Selection Policy which will distribute load across multiple paths to an iSCSI target. Better distribution of load with PSP_RR is achieved when multiple LUNs are accessed concurrently.

NFS: There is no load balancing per se in the current implementation of NFS, as there is only a single session. Aggregate bandwidth can be configured by creating multiple paths to the NAS array, and accessing some datastores via one path and other datastores via another.

Fiber Channel: VMware’s Pluggable Storage Architecture (PSA) provides a Round-Robin Path Selection Policy which will distribute load across multiple paths to an FC target. Better distribution of load with PSP_RR is achieved when multiple LUNs are accessed concurrently.

FCoE: VMware’s Pluggable Storage Architecture (PSA) provides a Round-Robin Path Selection Policy which will distribute load across multiple paths to an FCoE target. Better distribution of load with PSP_RR is achieved when multiple LUNs are accessed concurrently.

Resilience

iSCSI: VMware’s PSA implements failover via its Storage Array Type Plugin (SATP) for all supported iSCSI arrays. The preferred method to do this for SW iSCSI is with iSCSI Binding implemented, but it can also be achieved by adding multiple targets on different subnets mapped to the iSCSI initiator.

NFS: NIC Teaming can be configured so that if one interface fails, another can take its place. However, this only protects against a network failure and may not be able to handle error conditions occurring on the NFS array/server side.

Fiber Channel: VMware’s PSA implements failover via its Storage Array Type Plugin (SATP) for all supported FC arrays.

FCoE: VMware’s PSA implements failover via its Storage Array Type Plugin (SATP) for all supported FCoE arrays.

Error checking

iSCSI: iSCSI uses TCP, which resends dropped packets.

NFS: NFS uses TCP, which resends dropped packets.

Fiber Channel: Fiber Channel is implemented as a lossless network. This is achieved by throttling throughput at times of congestion using B2B and E2E credits.

FCoE: Fiber Channel over Ethernet requires a lossless network. This is achieved by the implementation of a Pause Frame mechanism at times of congestion.

Security

iSCSI: iSCSI implements the Challenge Handshake Authentication Protocol (CHAP) to ensure initiators and targets trust each other. VLANs or private networks are highly recommended to isolate the iSCSI traffic from other traffic types.

NFS: VLANs or private networks are highly recommended to isolate the NFS traffic from other traffic types.

Fiber Channel: Some FC switches support the concept of a VSAN to isolate parts of the storage infrastructure. VSANs are conceptually similar to VLANs. Zoning between hosts and FC targets also offers a degree of isolation.

FCoE: Some FCoE switches support the concept of a VSAN to isolate parts of the storage infrastructure. Zoning between hosts and FCoE targets also offers a degree of isolation.


 


 

 

VAAI Primitives

iSCSI: Although VAAI primitives may differ from array to array, iSCSI devices can benefit from the full complement of block primitives: Atomic Test/Set, Full Copy, Block Zero, Thin Provisioning and UNMAP. These primitives are built in to ESXi and require no additional software installed on the host.

NFS: Again, these vary from array to array. The VAAI primitives available on NFS devices are Full Copy (but not with Storage vMotion, only with cold migration), Pre-allocate space (WRITE_ZEROs) and Clone offload using native snapshots. Note that for VAAI NAS, one requires a plug-in from the storage array vendor.

Fiber Channel: Although VAAI primitives may differ from array to array, FC devices can benefit from the full complement of block primitives: Atomic Test/Set, Full Copy, Block Zero, Thin Provisioning and UNMAP. These primitives are built in to ESXi and require no additional software installed on the host.

FCoE: Although VAAI primitives may differ from array to array, FCoE devices can benefit from the full complement of block primitives: Atomic Test/Set, Full Copy, Block Zero, Thin Provisioning and UNMAP. These primitives are built in to ESXi and require no additional software installed on the host.

ESXi Boot from SAN

  • iSCSI: Yes
  • NFS: No
  • Fiber Channel: Yes
  • FCoE: SW FCoE: No; HW FCoE (CNA): Yes

RDM Support

  • iSCSI: Yes
  • NFS: No
  • Fiber Channel: Yes
  • FCoE: Yes

Maximum Device Size

  • iSCSI: 64TB
  • NFS: Refer to the NAS array vendor or NAS server vendor for the maximum supported datastore size. The theoretical size is much larger than 64TB, but requires the NAS vendor to support it.
  • Fiber Channel: 64TB
  • FCoE: 64TB

Maximum number of devices

  • iSCSI: 256
  • NFS: Default 8, maximum 256
  • Fiber Channel: 256
  • FCoE: 256

Protocol direct to VM

  • iSCSI: Yes, via in-guest iSCSI initiator.
  • NFS: Yes, via in-guest NFS client.
  • Fiber Channel: No, but FC devices can be mapped directly to the VM with NPIV. This still requires RDM mapping to the VM first, and hardware must support NPIV (SW, HBA).
  • FCoE: No

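Storage vMotion Support

  • iSCSI: Yes
  • NFS: Yes
  • Fiber Channel: Yes
  • FCoE: Yes

Storage DRS Support

  • iSCSI: Yes
  • NFS: Yes
  • Fiber Channel: Yes
  • FCoE: Yes

Storage I/O Control Support

  • iSCSI: Yes, since vSphere 4.1
  • NFS: Yes, since vSphere 5.0
  • Fiber Channel: Yes, since vSphere 4.1
  • FCoE: Yes, since vSphere 4.1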

Virtualized MSCS Support

  • iSCSI: No. VMware does not support MSCS nodes built on VMs residing on iSCSI storage. However, the use of software iSCSI initiators within guest operating systems configured with MSCS, in any configuration supported by Microsoft, is transparent to ESXi hosts and there is no need for explicit support statements from VMware.
  • NFS: No. VMware does not support MSCS nodes built on VMs residing on NFS storage.
  • Fiber Channel: Yes, VMware supports MSCS nodes built on VMs residing on FC storage.
  • FCoE: No. VMware does not support MSCS nodes built on VMs residing on FCoE storage.


 


 

Ease of configuration

iSCSI: Medium. Setting up the iSCSI initiator requires some smarts, but you simply need the FQDN or IP address of the target. Some configuration for initiator maps and LUN presentation is needed on the array side. Once the target is discovered through a scan of the SAN, LUNs are available for datastores or RDMs.

NFS: Easy. You just need the IP or FQDN of the target, and the mount point. Datastores appear immediately once the host has been granted access from the NFS array/server side.

Fiber Channel: Difficult. Involves zoning at the FC switch level, and LUN masking at the array level once the zoning is complete. More complex to configure than IP Storage. Once the target is discovered through a scan of the SAN, LUNs are available for datastores or RDMs.

FCoE: Difficult. Involves zoning at the FCoE switch level, and LUN masking at the array level once the zoning is complete. More complex to configure than IP Storage. Once the target is discovered through a scan of the SAN, LUNs are available for datastores or RDMs.

Advantages

iSCSI:
  • No additional hardware necessary; can use already existing networking hardware components and the iSCSI driver from VMware, so cheap to implement.
  • Well known and well understood protocol. Quite mature at this stage.
  • Admins with network skills should be able to implement.
  • Can be troubleshot with generic network tools, such as Wireshark.

NFS:
  • No additional hardware necessary; can use already existing networking hardware components, so cheap to implement.
  • Well known and well understood protocol.
  • Also very mature.
  • Admins with network skills should be able to implement.
  • Can be troubleshot with generic network tools, such as Wireshark.

Fiber Channel:
  • Well known and well understood protocol.
  • Very mature, and trusted.
  • Found in the majority of mission critical environments.

FCoE:
  • Enables converged networking, allowing the consolidation of network and storage traffic onto the same network via a CNA (converged network adapter).
  • Using DCBX (Data Center Bridging protocol), FCoE has been made lossless even though it runs over Ethernet. DCBX does other things, like enabling different traffic classes to run on the same network, but that is beyond the scope of this discussion.

Disadvantages

iSCSI:
  • Inability to route with iSCSI Binding implemented.
  • Possible security issues, as there is no built-in encryption, so care must be taken to isolate traffic (e.g. VLANs).
  • SW iSCSI can cause additional CPU overhead on the ESX host.
  • TCP can introduce latency for iSCSI.

NFS:
  • Since there is only a single session per connection, configuring for maximum bandwidth across multiple paths needs some care and attention.
  • No PSA multipathing.
  • Same security concerns as iSCSI, since everything is transferred in clear text, so care must be taken to isolate traffic (e.g. VLANs).
  • NFS is still version 3, which does not have the multipathing or security features of NFS v4 or NFS v4.1.
  • NFS can cause additional CPU overhead on the ESX host.
  • TCP can introduce latency for NFS.

Fiber Channel:
  • Still only runs at 8Gb, which is slower than other networks (16Gb throttled to run at 8Gb in vSphere 5.0).
  • Needs a dedicated HBA, FC switch and FC capable storage array, which makes an FC implementation rather more expensive.
  • Additional management overhead (e.g. switch zoning) is needed.
  • Could prove harder to troubleshoot compared to other protocols.

FCoE:
  • Rather new, and not quite as mature as other protocols at this time.
  • Requires a 10Gb lossless network infrastructure, which can be expensive.
  • Cannot route between initiator and targets using native IP routing; instead it has to use protocols such as FIP (FCoE Initialization Protocol).
  • Could prove complex to troubleshoot/isolate issues with network and storage traffic using the same pipe.


Note 1 – I've deliberately skipped AoE (ATA-over-Ethernet) as we have not yet seen significant take-up of this protocol at this time. Should this protocol gain more exposure, I’ll revisit this article.

Note 2 – As I mentioned earlier, I’ve deliberately avoided getting into a performance comparison. This has been covered in other papers. Here are some VMware whitepapers which cover storage performance comparison:

Using both Storage I/O Control & Network I/O Control for NFS

Many of these blog articles arise from conversations I have with folks both internally at VMware & externally in the community. This post is another such example. What I really like about this job is that it gets me thinking about a lot of stuff that I normally take for granted. The question this time was around using both Storage I/O Control (SIOC) & Network I/O Control (NIOC) for NFS traffic & Virtual Machines residing on NFS datastores, and whether they could possibly step on each other's toes, so to speak.

The answer is no, the technologies are complementary. Let me try to explain how.

First off, let's have a brief overview of what the technologies do.

Intro to Storage I/O Control (SIOC)

SIOC was covered in a previous blog post. Details can be found here – http://blogs.vmware.com/vsphere/2011/09/storage-io-control-enhancements.html. In a nutshell, if SIOC detects that a pre-defined latency threshold for a particular datastore has been exceeded, it will throttle the amount of I/O a VM can queue to that datastore based on a 'shares' mechanism. When the contention is alleviated, SIOC stops throttling and VMs can use the datastore without restriction. This avoids the 'noisy neighbor' problem, where one VM can hog all the bandwidth to a shared datastore. The point to note here is that SIOC works on a per VM basis, and deals with datastore objects.

SIOC was first introduced in vSphere 4.1, but only for block storage devices (FC, iSCSI, FCoE). In vSphere 5.0, we introduced SIOC support for NFS datastores.

Intro to Network I/O Control (NIOC)

There is a nice overview of NIOC on the networking blog here – http://blogs.vmware.com/networking/2010/07/got-network-io-control.html. Again, in a nutshell, NIOC allows you to define a guaranteed bandwidth for different vSphere network traffic types.

NIOC uses a software approach to partitioning physical network bandwidth among the different types of network traffic flows. For example, you can guarantee a minimum NFS bandwidth/latency when a vMotion operation is initiated on the same network & prevent the vMotion operation from having an impact on the NFS traffic flow. The point to note here is that NIOC is working on a network traffic stream, e.g. NFS, and deals with NIC ports.

SIOC & NIOC Together

Let's take a scenario where there are multiple VMs spread across multiple ESXi hosts, all sharing the same NFS datastore.

i) SIOC Use Case

For quite a while, we have been able to give bandwidth fairness to VMs running on the same host via the SFQ, the start-time fair queueing scheduler. This scheduler ensures share-based allocation of I/O resources between VMs on a per host basis. It is when we have VMs accessing the same datastore from different hosts that we've had to implement a distributed I/O scheduler. This is called PARDA, the Proportional Allocation of Resources for Distributed Storage Access. PARDA carves out the array queue amongst all the Virtual Machines which are sending I/O to the datastore on the array & adjusts the per host per datastore queue size depending on the sum of the per Virtual Machine shares on the host.

If SIOC is enabled on the datastore, and the latency threshold on the datastore is surpassed because of the amount of disk I/O that the VMs are generating on the datastore, the I/O bandwidth allocated to the VMs sharing the datastores will be adjusted according to the share values assigned to the VMs.
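To make the share arithmetic concrete, here is a minimal Python sketch of the proportional split described above. Treat it purely as an illustration: the function name, host names, share values and the 64-slot array queue are all made up for the example, and the real scheduler adjusts queue sizes dynamically from observed latency rather than running a one-off calculation like this.

```python
def host_queue_depths(array_queue_slots, vm_shares_by_host):
    """Split the array queue among hosts in proportion to the sum of
    the per-VM shares on each host (illustrative only)."""
    # Sum the shares of all VMs on each host.
    host_totals = {
        host: sum(vm_shares.values())
        for host, vm_shares in vm_shares_by_host.items()
    }
    grand_total = sum(host_totals.values())
    # Each host's slice of the queue is proportional to its share total.
    return {
        host: array_queue_slots * total / grand_total
        for host, total in host_totals.items()
    }


# Hypothetical layout: two hosts sharing one NFS datastore.
vm_shares_by_host = {
    "esx1": {"vm-a": 1000, "vm-b": 500},  # 1500 shares in total
    "esx2": {"vm-c": 500},                # 500 shares in total
}

print(host_queue_depths(64, vm_shares_by_host))
# {'esx1': 48.0, 'esx2': 16.0}
```

In this made-up case esx1 holds three quarters of the shares, so it gets three quarters of the queue slots, regardless of how many VMs each host happens to run.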

ii) NIOC Use Case

But what if something impacts the NFS traffic flow? In this case, VM performance may suffer not because of an over-committed datastore, but because there is not enough network bandwidth for the ESXi host to communicate with the NFS server. For instance, as mentioned earlier in the post, what if a vMotion operation was initiated (an operation which could consume up to 8Gbps of the network bandwidth), and impacted the other traffic on the same pipe, such as NFS? Yes, I know a best practice from VMware is to dedicate a NIC for vMotion traffic to avoid this exact situation, but this isn't always practical on 10Gb networks. In the case where vMotion, NFS and other traffic types are sharing the same uplink, NIOC allows us to guarantee a minimum bandwidth per traffic type. The really cool thing is that when there is no congestion, network traffic can use *all* the available bandwidth of the uplink. And just for clarification, the uplink is actually a dvuplink since NIOC can only be enabled on distributed switches. The feature is not available on stand-alone vSwitches.
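The "guaranteed minimum under contention, everything when idle" behaviour can be sketched in a few lines of Python. This is a simplified model of share-proportional, work-conserving allocation, not NIOC's actual implementation; the function name, the share values and the demand figures are assumptions chosen for the example.

```python
def allocate_bandwidth(link_gbps, shares, demand_gbps):
    """Share-proportional, work-conserving split of a single uplink.

    A traffic type that wants less than its proportional slice gets
    what it asks for, and the leftover is re-divided among the rest.
    """
    allocation = {t: 0.0 for t in demand_gbps}
    active = {t for t, d in demand_gbps.items() if d > 0}
    remaining = float(link_gbps)

    while active and remaining > 1e-9:
        total_shares = sum(shares[t] for t in active)
        # Traffic types whose outstanding demand fits inside their slice.
        satisfied = {
            t for t in active
            if demand_gbps[t] - allocation[t]
            <= remaining * shares[t] / total_shares
        }
        if not satisfied:
            # Everyone wants more than their slice: split by shares.
            for t in active:
                allocation[t] += remaining * shares[t] / total_shares
            break
        for t in satisfied:
            need = demand_gbps[t] - allocation[t]
            allocation[t] += need
            remaining -= need
        active -= satisfied

    return allocation


# Hypothetical share values on a 10Gb uplink.
shares = {"nfs": 100, "vmotion": 50}

# Congestion: both traffic types want more than the link can carry.
print(allocate_bandwidth(10, shares, {"nfs": 9, "vmotion": 8}))

# No congestion: vMotion is idle, so NFS simply gets what it asks for.
print(allocate_bandwidth(10, shares, {"nfs": 9, "vmotion": 0}))
```

With these made-up numbers, the congested case gives NFS roughly two thirds of the link and vMotion one third, while in the uncongested case NFS is not held to its share at all.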

Another important point to note which sometimes causes confusion: NFS traffic on the ESX host caused by a VM's disk I/O does not count towards that VM's portgroup bandwidth allocation should NIOC kick in. These are two distinct and separate network traffic types, the former being NFS traffic and the latter being VM network I/O.

Conclusion

There is no reason in my opinion not to use both SIOC and NIOC together. The technologies are complementary.



Load Balancing with NFS and Round-Robin DNS

Those of you who have been using NFS with vSphere over the past number of years will be aware that VMware currently only supports NFS v3 over TCP. There is no multipathing with this version of NFS, and although NIC teaming can be used on the virtual switch, this is for failover purposes only.

To do some semblance of load balancing, one could mount NFS datastores via different network interfaces. For instance, NFS datastore1 could be mounted via controller1 on subnet A, and NFS datastore2 could be mounted via controller2 of the same NFS server on subnet B. This would allow you to balance the load, but is a very manual process. Could we automate this in any way?

What about using round-robin DNS where each request to resolve a Fully Qualified Domain Name (FQDN) would result in the DNS server supplying the next IP address in a list of IP addresses associated with that FQDN? Interestingly, I had this query twice last week.

First, some background on how NFS behaves in vSphere. If a user specifies the DNS name for an NFS server, we persist that DNS name in the vCenter DB. Once the datastore is instantiated on ESX, we resolve the DNS name once. So even if the datastore is temporarily unmounted and remounted (say via esxcli) we would use the same IP address. If the ESX host is restarted, or if the datastore is removed and re-added later, we would resolve the FQDN again, which may come back with a different IP address if the DNS server is configured to use round-robin.

Also note that DNS resolution is done on a per datastore basis. We don't have a DNS name lookup cache in NFS that is shared between multiple mount points. Therefore different ESX hosts mounting the same NFS datastore may resolve to different IPs using round-robin. Likewise, mounting different datastores using an FQDN from the same ESX server will cause each mount to resolve the FQDN separately, and each may pick up a different IP under a round-robin DNS configuration.
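The resolution behaviour described above can be modelled with a small Python simulation. It does not talk to a real DNS server or to ESX; the class names, the FQDN `nfs.example.com` and the two addresses are hypothetical, and the model only captures the rules stated in this post: one lookup per datastore when it is added, no shared lookup cache, and no new lookup on a simple remount.

```python
from itertools import cycle


class RoundRobinDNS:
    """Toy DNS server that hands out the next IP in the list per query."""

    def __init__(self, records):
        self._cycles = {fqdn: cycle(ips) for fqdn, ips in records.items()}

    def resolve(self, fqdn):
        return next(self._cycles[fqdn])


class Host:
    """Toy host: resolves the FQDN once per datastore when it is added."""

    def __init__(self, dns):
        self.dns = dns
        self.mounts = {}  # datastore name -> IP resolved at add time

    def add_datastore(self, datastore, fqdn):
        # Each datastore triggers its own lookup; nothing is shared.
        self.mounts[datastore] = self.dns.resolve(fqdn)

    def remount(self, datastore):
        # A temporary unmount/remount reuses the address already resolved.
        return self.mounts[datastore]


dns = RoundRobinDNS({"nfs.example.com": ["192.0.2.10", "192.0.2.11"]})
host = Host(dns)
host.add_datastore("datastore1", "nfs.example.com")
host.add_datastore("datastore2", "nfs.example.com")

print(host.mounts)
# {'datastore1': '192.0.2.10', 'datastore2': '192.0.2.11'}
print(host.remount("datastore1"))
# 192.0.2.10
```

In this model, two datastores added from the same host land on different addresses, which is the automated spreading of load the question was after, while a remount keeps talking to the address resolved the first time.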

So overall, DNS round-robin should work just fine if you want to do some automated load balancing with NFS.
