
NFS Block Sizes, Transfer Sizes & Locking

Posted by Cormac Hogan
Technical Marketing Architect (Storage)

I've had a few questions recently around the I/O characteristics of VMware's NFS implementation. I'm going to use this post to answer the common ones.


NFS Block Sizes

The first of these questions is usually about the block size used by NFS. The block size of an NFS datastore is determined only by the block size of the native filesystem on the NFS server or NAS array, so it depends solely on the underlying storage architecture of the server or the array.

The block size has no dependency on the Guest Operating System block size (a common misconception), because the Guest OS's virtual disk (VMDK) is just a flat file created on the server/array. That file is subject to the block size enforced by the filesystem on the NFS server or NAS array.

One more interesting detail: when an fsstat is done on the NFS mount from the ESXi client, the ESXi NFS client always returns the default file block size as 4096. Here is an example of this, using the vmkfstools command to look at the file block size:

[Screenshot: vmkfstools output showing a 4k file block size]

Maximum Transfer Sizes

An NFS datastore's block size is different from its maximum read and write transfer sizes. The maximum read and write transfer sizes are the chunks in which the client communicates with the server. A typical NFS server might advertise 64KB as the maximum transfer size for reads and writes. In this case, a 1MB read would be broken down into 16 reads of 64KB each. The point, however, is that this has nothing to do with the block size of the NFS datastore on the NFS server/NAS array.
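To make the arithmetic concrete, here is a minimal Python sketch of how a client might split one logical read into requests no larger than the advertised maximum transfer size. The function name `split_transfer` and its parameters are hypothetical and only illustrate the idea; this is not how the ESXi NFS client is actually implemented.

```python
KB = 1024
MB = 1024 * KB


def split_transfer(offset, length, max_transfer):
    """Split one logical read/write into (offset, size) requests.

    Each request is at most max_transfer bytes. All names here are
    illustrative assumptions, not part of any real NFS client API.
    """
    if max_transfer <= 0:
        raise ValueError("max_transfer must be positive")
    requests = []
    end = offset + length
    while offset < end:
        size = min(max_transfer, end - offset)
        requests.append((offset, size))
        offset += size
    return requests


# A 1MB read against a server advertising a 64KB maximum transfer size.
chunks = split_transfer(0, 1 * MB, 64 * KB)
print(len(chunks))   # 16
print(chunks[0])     # (0, 65536)
```

Note that nothing in this calculation refers to the block size of the filesystem on the server, which is the point made above.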


NFS (Version 3) Locking

Another common question I get is about NFS locking. In NFS v3, which is the version of NFS still used by vSphere, the client is responsible for all locking activities, such as liveness and enforcement. The client must 'heartbeat' the lock on a periodic basis to maintain it, and must also verify the lock status before issuing each I/O to the file protected by that lock. The client which holds the lock must periodically update the timestamp stored in the lock file to ensure lock liveness. If another client wishes to lock the file, it monitors the lock liveness by polling the timestamp. If the timestamp is not updated during a specific window of time (discussed below), the client which holds the lock is presumed dead and the competing client may break the lock.
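The competing client's side of this can be sketched in a few lines of Python. This is only an illustration of the timestamp-polling idea: the function `can_break_lock`, its parameters, and the representation of the lock file as a list of polled timestamps are all assumptions, and the default of 3 polls is simply borrowed from the default values discussed below.

```python
def can_break_lock(polled_timestamps, required_stale_polls=3):
    """Decide whether a competing client may break a lock.

    polled_timestamps: timestamps read from the lock file, oldest first.
    The lock is treated as stale only if the most recent
    required_stale_polls reads all match the read taken just before
    them, i.e. the holder has not updated the timestamp in that window.
    Hypothetical helper for illustration only.
    """
    if len(polled_timestamps) < required_stale_polls + 1:
        return False  # not enough observations yet
    baseline = polled_timestamps[-(required_stale_polls + 1)]
    recent = polled_timestamps[-required_stale_polls:]
    return all(ts == baseline for ts in recent)


# Holder stopped updating: timestamp unchanged across 3 polls.
print(can_break_lock([100, 100, 100, 100]))   # True
# Holder is alive: timestamp moved during the polling window.
print(can_break_lock([100, 100, 110, 110]))   # False
```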

To ensure consistency, I/O is only issued to the file when the client is the lock holder and the lock lease has not yet expired. By default, there are 3 heartbeat attempts at 10-second intervals, and each heartbeat has a 5-second timeout. In the worst case, when the last heartbeat attempt times out, it takes 3 * 10 + 5 = 35 seconds before the lock is marked expired on the lock-holding client. Until the lock is marked expired, I/O continues to be issued, even after failed heartbeat attempts.
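The worst-case figure can be reproduced with a small Python calculation. The function and parameter names below are hypothetical labels for the defaults quoted above, not the names of actual ESXi settings.

```python
def worst_case_lock_expiry(attempts=3, interval_s=10, timeout_s=5):
    """Seconds until the lock holder marks its own lock expired.

    Mirrors the formula in the text: attempts * interval + timeout.
    Parameter names are illustrative assumptions.
    """
    return attempts * interval_s + timeout_s


print(worst_case_lock_expiry())   # 35
```

Changing any of the three inputs shows how the expiry window would scale under this formula, for example `worst_case_lock_expiry(attempts=4)` gives 45 seconds.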

Lock preemption on a competing client starts when the lock conflict is detected. It then takes 3 polling attempts at 10-second intervals for the competing host to declare that the lock has expired and break it, and another 10 seconds for that host to establish its own lock. Lock preemption therefore completes in 3 * 10 + 10 = 40 seconds, after which I/O starts to flow on the competing host.
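The preemption timing can be worked through the same way. This sketch, with hypothetical function and parameter names, simply encodes the formula from the paragraph above using the default values it quotes.

```python
def lock_preemption_time(polls=3, poll_interval_s=10, establish_s=10):
    """Seconds from conflict detection until the competing host can do I/O.

    polls * poll_interval_s is the time spent deciding the lock is
    stale; establish_s is the time taken to set up the new lock.
    Names are illustrative assumptions, not real ESXi settings.
    """
    return polls * poll_interval_s + establish_s


print(lock_preemption_time())   # 40
```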


Get notification of these blog postings and more VMware Storage information by following me on Twitter: @VMwareStorage