VMware performance engineers observed, under certain conditions, that ESXi IO (in versions 6.x and 7.0) with some NFS servers experienced unexpectedly low read throughput in the presence of extremely low packet loss, due to an undesirable TCP interaction between the ESXi host and the NFS server. We have published a performance case study, ESXi NFS Read Performance: TCP Interaction between Slow Start and Delayed Acknowledgement which analyzes this undesirable interaction in detail.
In this paper, we explain how this TCP interaction leads to poor ESXi NFS read performance, describe ways to determine whether this interaction is occurring in an environment, and present a workaround for ESXi 7.0 that could improve performance significantly when this interaction is detected. In our experiments with ESXi NFS read traffic from an NFS datastore, a seemingly minor 0.02% packet loss resulted in an unexpected 35% decrease in NFS read throughput.
A key lesson of this paper is that seemingly minor packet loss rates could have an outsized impact on the overall performance of ESXi networked storage. We recommend customers who are using ESXi networked storage and have highly performance-sensitive workloads to consider taking steps to identify and mitigate these undesirable interactions if appropriate.