VMware performance engineers observed, under certain conditions, that ESXi IO (in versions 6.x and 7.0) with some NFS servers experienced unexpectedly low read throughput in the presence of extremely low packet loss, due to an undesirable TCP interaction between the ESXi host and the NFS server. We have published a performance case study, ESXi NFS Read Performance: TCP Interaction between Slow Start and Delayed Acknowledgement which analyzes this undesirable interaction in detail.
In this paper, we explain how this TCP interaction leads to poor ESXi NFS read performance, describe ways to determine whether this interaction is occurring in an environment, and present a workaround for ESXi 7.0 that could improve performance significantly when this interaction is detected. In our experiments with ESXi NFS read traffic from an NFS datastore, a seemingly minor 0.02% packet loss resulted in an unexpected 35% decrease in NFS read throughput.
A key lesson of this paper is that seemingly minor packet loss rates could have an outsized impact on the overall performance of ESXi networked storage. We recommend customers who are using ESXi networked storage and have highly performance-sensitive workloads to consider taking steps to identify and mitigate these undesirable interactions if appropriate.
By Mark Ma
With the release of vSphere 6.7, VMware added iSER (iSCSI Extensions for RDMA) as a native supported storage protocol to ESXi. With iSER run over iSCSI, users can boost their vSphere performance just by replacing the regular NICs with RDMA-capable NICs. RDMA (Remote Direct Memory Access) allows the transfer of memory from one computer to another. This is a direct transfer and minimizes CPU/kernel involvement. By bypassing the kernel, we get extremely high I/O bandwidth and low latency. (To use RDMA, you must have an HCA/Host Channel Adapter device on both the source and destination.) In this blog, we compare standard iSCSI performance vs. iSER performance to see how iSER can release the full potential of your iSCSI storage.
We compared the I/O performance of vSphere 6.0 U2 over 16Gb and 32Gb Emulex FC HBAs connected via a Brocade G620 FC switch to an EMC VNX7500 storage array.
Iometer, a common microbenchmark, was used to generate the workload for various block sizes. For single-VM experiments, we measured sequential read and sequential write throughput. For multi-VM experiments, we measured random read IOPS and throughput.
Our experiments showed that vSphere 6 can achieve near line rate with 32Gb FC.
For details, please see the whitepaper Storage I/O Performance on VMware vSphere 6.0 U2 over 32 Gigabit Fibre Channel.