VMware performance engineers observed that, under certain conditions, ESXi hosts (versions 6.x and 7.0) reading from some NFS servers experienced unexpectedly low read throughput in the presence of extremely low packet loss. The cause is an undesirable TCP interaction between the ESXi host and the NFS server. We have published a performance case study, ESXi NFS Read Performance: TCP Interaction between Slow Start and Delayed Acknowledgement, which analyzes this interaction in detail.
In this paper, we explain how this TCP interaction leads to poor ESXi NFS read performance, describe ways to determine whether this interaction is occurring in an environment, and present a workaround for ESXi 7.0 that could improve performance significantly when this interaction is detected. In our experiments with ESXi NFS read traffic from an NFS datastore, a seemingly minor 0.02% packet loss resulted in an unexpected 35% decrease in NFS read throughput.
A key lesson of this paper is that seemingly minor packet loss rates can have an outsized impact on the overall performance of ESXi networked storage. We recommend that customers who use ESXi networked storage and run highly performance-sensitive workloads consider taking steps to identify and, where appropriate, mitigate these undesirable interactions.
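To put the figures quoted above in perspective, the short sketch below converts a packet loss percentage into an average number of packets sent per lost packet, and applies a percentage decrease to a baseline throughput. The function names and the 1,000 MB/s baseline are hypothetical and chosen only for illustration; only the 0.02% loss rate and the 35% decrease come from the paper.

```python
def packets_per_loss(loss_rate_percent: float) -> float:
    """Average number of packets sent for each packet lost."""
    if loss_rate_percent <= 0:
        raise ValueError("loss rate must be positive")
    return 100.0 / loss_rate_percent


def degraded_throughput(baseline: float, decrease_percent: float) -> float:
    """Throughput remaining after a given percentage decrease."""
    return baseline * (1.0 - decrease_percent / 100.0)


if __name__ == "__main__":
    # 0.02% loss and the 35% decrease are the figures reported in the paper.
    interval = packets_per_loss(0.02)
    # The baseline below is an assumed value, not a measured result.
    assumed_baseline_mb_s = 1000.0
    remaining = degraded_throughput(assumed_baseline_mb_s, 35.0)
    print(f"About one lost packet per {interval:,.0f} packets sent")
    print(f"Assumed baseline {assumed_baseline_mb_s:.0f} MB/s -> {remaining:.0f} MB/s")
```

A loss rate of 0.02% corresponds to roughly one lost packet in every 5,000, which is why a 35% throughput decrease is so far out of proportion to the loss itself.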
Storage I/O Control (SIOC) allows administrators to control how much access virtual machines have to the I/O queues on a shared datastore. With this feature, administrators can ensure that a virtual machine running a business-critical application gets higher-priority access to the I/O queue than other virtual machines sharing the same datastore. In vSphere 4.1, SIOC was supported on VMFS-based datastores that used SAN with iSCSI and Fibre Channel. In vSphere 5, SIOC support was extended to NFS-based datastores.
Recent tests conducted at the VMware Performance Engineering lab studied the following aspects of SIOC:
- The performance impact of SIOC: Fine-grained access management of the I/O queues resulted in a 10% improvement in the response time of the workload used for the tests.
- SIOC’s ability to isolate the performance of applications with a smaller request size: Some applications, such as Web and media servers, use I/O patterns with a large request size (for example, 32K), while others, such as OLTP databases, issue smaller I/Os (8K or less). Test findings show that SIOC helped an OLTP database workload achieve higher performance when sharing the underlying datastore with a workload that used large I/O requests.
- The intelligent prioritization of I/O resources: SIOC monitors virtual machines’ usage of the I/O queue at the host and dynamically redistributes any unutilized queue slots to those virtual machines that need them. Tests show that this process happens consistently and reliably.
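The redistribution of unused queue slots described in the last item can be illustrated with a generic proportional-share allocation. The sketch below is not SIOC's actual algorithm, which this summary does not describe. It assumes each virtual machine has a share value and a current demand in queue slots. The function name `allocate_slots` and the VM names are hypothetical.

```python
def allocate_slots(total_slots, shares, demand):
    """Divide queue slots in proportion to shares, capping each VM at its demand.

    Slots that a VM cannot use are handed back and split among the VMs that
    still want more, again in proportion to their shares.
    """
    allocation = {vm: 0.0 for vm in shares}
    active = {vm for vm in shares if demand[vm] > 0}
    remaining = float(total_slots)

    while active and remaining > 1e-9:
        total_shares = sum(shares[vm] for vm in active)
        satisfied = set()
        distributed = 0.0
        for vm in active:
            fair = remaining * shares[vm] / total_shares
            need = demand[vm] - allocation[vm]
            grant = min(fair, need)
            allocation[vm] += grant
            distributed += grant
            if need <= fair:
                satisfied.add(vm)  # demand fully met; drops out next round
        remaining -= distributed
        if not satisfied:
            break  # every active VM took its full fair share
        active -= satisfied

    return allocation


if __name__ == "__main__":
    # Hypothetical values: a 64-slot queue shared by three VMs.
    shares = {"oltp": 2000, "web": 1000, "idle": 1000}
    demand = {"oltp": 64, "web": 64, "idle": 4}
    for vm, slots in sorted(allocate_slots(64, shares, demand).items()):
        print(f"{vm}: {slots:.0f} slots")
```

In this example the lightly loaded VM uses only 4 of the 16 slots its shares would entitle it to, and the 12 unused slots are split between the two busy VMs in the same 2:1 ratio as their shares.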
For the full paper, see Performance Implications of Storage I/O Control–Enabled NFS Datastores in VMware vSphere 5.