An earlier paper on a small seven-host cluster showed that Hadoop can be virtualized with little overhead, and that better-than-native performance can be achieved with the right configuration. However, the reasons for the observed performance behavior were not well understood. Recently, this work was refreshed with a larger cluster of 32 high-performance hosts running VMware vSphere® 5.1. The performance of native and several virtual configurations was compared for three applications. The apples-to-apples case of a single virtual machine per host shows performance close to native. Partitioning each host into two or four virtual machines can improve elapsed time by up to 13% for the most important application (TeraSort), resulting in performance that is competitive with or even better than native, as shown in the figure below (the number of VMs is per host, and a lower ratio is better). Details of the results are in a new whitepaper: “Virtualized Hadoop Performance with VMware vSphere 5.1”. The paper also discusses the use of several performance tools and models to better understand both the sources of virtualization overhead and the reasons why configuring multiple smaller virtual machines per host can enhance performance. Based on this analysis, it also gives recommendations for optimal hardware and software configuration.