Hadoop is a modern application whose features, such as job consolidation and high availability (HA), overlap with capabilities that virtualization provides. This leads some to believe there is no motivation for virtualizing Hadoop; however, there are several reasons for doing so, including:
- Scheduling – Taking advantage of unused capacity in existing virtual infrastructures during periods of low usage (for example, overnight) to run batch jobs.
- Resource Utilization – Co-locating Hadoop VMs and other kinds of VMs on the same hosts. This often allows better overall utilization by consolidating applications that use different kinds of resources.
- Storage Models – Although Hadoop was developed with local storage in mind, it can just as easily use shared storage for all data or a hybrid model in which temporary data is kept on local disk and HDFS is hosted on a SAN. With either of these configurations, the unused shared storage capacity and bandwidth within the virtual infrastructure can be given to Hadoop jobs.
- Datacenter Efficiency – Virtualizing Hadoop can increase datacenter efficiency by broadening the range of workloads that can run on a virtualized infrastructure.
- Deployment – Virtualization tools ranging from simple cloning to sophisticated products like VMware vCloud Director can speed up the deployment of Hadoop nodes.
- Performance – Virtualization enables flexible configuration of hardware resources, so the CPU, memory, and storage assigned to Hadoop nodes can be adjusted to suit the workload.
Learn more: Virtualizing Business Critical Applications Whitepaper [39-page PDF]