Hadoop provides a platform for building distributed systems for massive data storage and analysis. It has internal mechanisms to replicate user data and to tolerate many kinds of hardware and software failures. However, like many other distributed systems, Hadoop has a small number of Single Points of Failure (SPoFs). These include the NameNode (which manages the Hadoop Distributed File System (HDFS) namespace and keeps track of storage blocks) and the JobTracker (which schedules jobs and the map and reduce tasks that make up each job). VMware vSphere Fault Tolerance (FT) can be used to protect virtual machines that run these vulnerable components of a Hadoop cluster.
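To see why these daemons are SPoFs, note that every node in a Hadoop 1.x cluster is pointed at a single NameNode address and a single JobTracker address in its configuration files. A minimal sketch (the host names and ports below are illustrative, not from the tested cluster) might look like:

```xml
<!-- core-site.xml: all HDFS clients and DataNodes contact this one NameNode -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode-vm:8020</value>
  </property>
</configuration>

<!-- mapred-site.xml: all TaskTrackers report to this one JobTracker -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>jobtracker-vm:8021</value>
  </property>
</configuration>
```

If the host behind either address fails, the whole filesystem namespace or all job scheduling becomes unavailable, which is why running each daemon in an FT-protected virtual machine is attractive.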
We used a cluster of 24 hosts to run three different Hadoop applications to show that such protection has only a small impact on application performance. We employed various Hadoop configurations to artificially create greater load on the NameNode and JobTracker daemons. With conservative extrapolation, these tests show that uniprocessor virtual machines with FT enabled are sufficient to run the NameNode and JobTracker daemons for clusters of more than 200 hosts.
A new white paper, “Protecting Hadoop with VMware vSphere 5 Fault Tolerance,” describes these tests in detail. To enable comparisons with other distributed applications, we show the CPU and network utilization of the protected VMs. In addition, we provide several best practices to maximize the size of the Hadoop cluster that FT can protect.