I was recently involved in some discussions about how Fault Tolerance would behave on the vSphere Storage Appliance.

The crux of the matter was what would happen if a host in the vSphere Storage Appliance (VSA) suffered a failure. Those of you who are familiar with the VSA will be aware that the VSA takes the local storage from an ESXi host and presents it as a mirrored NFS datastore. Therefore both compute and storage are on the same host. In the event of a host failure, another VSA node (ESXi host) in the cluster takes over the role of presenting the shared storage, and promotes its mirrored copy of the NFS datastore from secondary to primary.

In the case of an FT VM, the VM also has a primary and secondary instance. However, there is still only a single set of files backing the FT VM. Now take the example of an FT VM deployed on ESXi host (A), and residing on the shared storage provided by the VSA also residing on ESXi host (A). In this case, if ESXi host (A) failed, not only does the primary FT VM fail, but the datastore holding its files is no longer available (a double failure, if you will). This is in essence a host and storage failure for the FT VM, so the secondary instance of the FT VM would not be able to kick in immediately, as it would have lost access to the underlying files (albeit temporarily).

But there is a workaround if you plan carefully.

The objective is to place the primary FT VM on a different ESXi host from its storage, i.e. the primary FT VM runs on ESXi host (A) but uses the mirrored shared storage presented from ESXi host (B). Now if ESXi host (A) goes down, the secondary FT VM can kick in because the storage is unaffected by this failure. Conversely, if ESXi host (B) goes down, only the storage is affected, and the mirror copy of the storage comes online in just a few seconds on ESXi host (A).
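The placement rule can be expressed as a simple check. The following is only a minimal sketch in plain Python, not a VSA or vSphere API call; the `FtPlacement` class, its field names and the `is_double_failure` helper are hypothetical names made up for illustration. It flags the case where the failed host is running the primary FT VM and is also presenting that VM's datastore.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FtPlacement:
    """Hypothetical record of where an FT VM and its datastore live."""
    primary_vm_host: str         # ESXi host running the primary FT VM
    secondary_vm_host: str       # ESXi host running the secondary FT VM
    datastore_primary_host: str  # host currently presenting the NFS datastore
    datastore_mirror_host: str   # host holding the mirrored copy


def is_double_failure(placement: FtPlacement, failed_host: str) -> bool:
    """True if losing failed_host takes out the primary FT VM AND the
    datastore presentation at the same time (the case to avoid)."""
    return (failed_host == placement.primary_vm_host
            and failed_host == placement.datastore_primary_host)


# Primary FT VM and its storage both on host A: the problem case.
colocated = FtPlacement("A", "B", "A", "B")
# Primary FT VM on host A, storage presented from host B: the workaround.
separated = FtPlacement("A", "B", "B", "A")
```

With these example placements, `is_double_failure(colocated, "A")` is true, while no single host failure is a double failure for `separated`.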

Obviously, you would need to put the configuration back to its original settings once the outage on the ESXi host has been rectified, but careful planning on both two-node and three-node VSA clusters will avoid a host/storage outage impacting your FT VM.

I would also highlight the fact that this is not a scenario unique to the VMware VSA – this is a consideration that one would have to take into account with any storage appliance software running on an ESXi host that wishes to implement VMware Fault Tolerance.

On a three-node VSA, you could take this one step further and place the primary FT VM on one host, the secondary FT VM on a second host, and use the shared storage from the third host for that FT VM.
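As a rough illustration of the three-node layout, the sketch below picks hosts for the two FT instances once you know which host presents the datastore. It is plain Python with hypothetical names (`place_ft_vm`, the host labels), and it assumes exactly three hosts; it does not talk to vCenter or the VSA.

```python
def place_ft_vm(hosts, storage_host):
    """Return a placement that keeps both FT instances off the host
    presenting the datastore. Assumes a three-node cluster."""
    if len(set(hosts)) != 3:
        raise ValueError("expected exactly three distinct hosts")
    if storage_host not in hosts:
        raise ValueError("storage_host must be one of the cluster hosts")

    # The two remaining hosts run the primary and secondary FT VM.
    others = [h for h in hosts if h != storage_host]
    return {
        "primary_vm": others[0],
        "secondary_vm": others[1],
        "storage": storage_host,
    }


placement = place_ft_vm(["A", "B", "C"], storage_host="C")
```

Here the datastore is presented from host C, so the primary FT VM lands on host A and the secondary on host B.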

Get notification of these blog postings and more VMware Storage information by following me on Twitter: @VMwareStorage

About the Author

Cormac Hogan

Cormac Hogan is a Senior Staff Engineer in the Office of the CTO in the Storage and Availability Business Unit (SABU) at VMware. He has been with VMware since April 2005 and has previously held roles in VMware’s Technical Marketing and Technical Support organizations. He has written a number of storage-related white papers and has given numerous presentations on storage best practices and vSphere storage features. He is also the co-author of the “Essential Virtual SAN” book published by VMware Press.