In earlier releases of ESXi, a VMkernel interface could carry three types of traffic: Management, vMotion and Fault Tolerance. To enable a particular traffic type, you would use either the vSphere Web/C# Client or the vSphere API. Some of you may recall using an undocumented command-line tool called vim-cmd in the ESXi Shell to enable vMotion and Fault Tolerance traffic. One limitation of this tool is that it does not support the Management traffic type. This made it difficult to fully automate the provisioning of VMkernel interfaces from a scripted installation (kickstart), because enabling the Management traffic type still required the remote CLI or the vSphere API.
I was recently involved in some discussions about how Fault Tolerance (FT) would behave on the vSphere Storage Appliance.
The crux of the matter was what would happen if a host in the vSphere Storage Appliance (VSA) cluster suffered a failure. Those of you familiar with the VSA will know that it takes the local storage of an ESXi host and presents it as a mirrored NFS datastore. Therefore, compute and storage reside on the same host. In the event of a host failure, another VSA node (ESXi host) in the cluster takes over the role of presenting the shared storage and promotes its mirrored copy of the NFS datastore from secondary to primary.
An FT VM also has a primary and a secondary instance. However, there is still only a single set of files backing the FT VM. Now take the example of an FT VM whose primary instance runs on ESXi host (A) and whose files reside on the VSA datastore that is also presented by host (A). If host (A) fails, not only does the primary FT VM fail, but the datastore holding its files also becomes unavailable (a double failure, if you will). In essence, the FT VM suffers a host failure and a storage failure at the same time, so the secondary instance cannot kick in immediately: it has lost access to the underlying files, albeit only temporarily.
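To make the placement reasoning concrete, here is a minimal Python sketch. It is a hypothetical model, not part of any VMware tool or API: the `Placement` class, its fields and the helper functions are names invented for illustration. It only flags the case described in the scenario, where the host running the primary FT instance is also the VSA node presenting the datastore, and it does not attempt to model failover timing or any other effects of a datastore failover.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """Hypothetical model of where an FT VM and its VSA datastore live."""
    ft_primary_host: str    # host running the primary FT instance
    ft_secondary_host: str  # host running the secondary FT instance
    datastore_host: str     # VSA node currently presenting the NFS datastore


def failure_impact(p: Placement, failed_host: str) -> set[str]:
    """Return the events triggered when a single host fails."""
    impact: set[str] = set()
    if failed_host == p.ft_primary_host:
        impact.add("ft_primary_lost")
    if failed_host == p.ft_secondary_host:
        impact.add("ft_secondary_lost")
    if failed_host == p.datastore_host:
        # Another VSA node must promote its mirror before the files are reachable again.
        impact.add("datastore_failover")
    return impact


def is_double_failure(p: Placement, failed_host: str) -> bool:
    """True when the primary FT instance and its datastore are lost together."""
    return {"ft_primary_lost", "datastore_failover"} <= failure_impact(p, failed_host)


def risky_hosts(p: Placement, hosts: list[str]) -> list[str]:
    """Hosts whose failure alone would cause the host-plus-storage double failure."""
    return [h for h in hosts if is_double_failure(p, h)]


if __name__ == "__main__":
    hosts = ["A", "B", "C"]
    same_host = Placement(ft_primary_host="A", ft_secondary_host="B", datastore_host="A")
    print(risky_hosts(same_host, hosts))  # ['A']
    separate = Placement(ft_primary_host="A", ft_secondary_host="B", datastore_host="C")
    print(risky_hosts(separate, hosts))   # []
```

With the primary FT instance and the datastore both on host (A), failing host (A) is flagged as a double failure; when the datastore is presented by a different node, no single host failure triggers that particular combination in this model.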
But there is a workaround, if you plan carefully.