Hadoop provides a platform for building distributed systems for massive data storage and analysis. It has internal mechanisms to replicate user data and to tolerate many kinds of hardware and software failures. However, like many other distributed systems, Hadoop has a small number of Single Points of Failure (SPoFs). These include the NameNode (which manages the Hadoop Distributed Filesystem namespace and keeps track of storage blocks), and the JobTracker (which schedules jobs and the map and reduce tasks that make up each job). VMware vSphere Fault Tolerance (FT) can be used to protect virtual machines that run these vulnerable components of a Hadoop cluster. Recently a cluster of 24 hosts was used to run three different Hadoop applications to show that such protection has only a small impact on application performance. Various Hadoop configurations were employed to artificially create greater load on the NameNode and JobTracker daemons. With conservative extrapolation, these tests show that uniprocessor virtual machines with FT enabled are sufficient to run the master daemons for clusters of more than 200 hosts.
A new white paper, "Protecting Hadoop with VMware vSphere 5 Fault Tolerance," is now available in which these tests are described in detail. CPU and network utilization of the protected VMs are given to enable comparisons with other distributed applications. In addition, several best practices are suggested to maximize the size of the Hadoop cluster that can be protected with FT.