This article describes a set of performance tests that were conducted on HBase, a popular data management tool that is frequently used with Hadoop, running on VMware vSphere 6 and provisioned by the vSphere Big Data Extensions tool. The work described here was done by Xinhui Li, who is a staff engineer in the Big Data team in VMware’s R&D Labs in Beijing. Xinhui’s biography and background details are given at the end of the article.

What is HBase?

HBase is an Apache project that is designed to handle very large amounts of data on the Hadoop platform. HBase is often described as providing the functionality of a NoSQL database running on top of Hadoop. It combines the scalability of Hadoop, through its use of the Hadoop Distributed File System (HDFS) to store the data, with real-time access to that data. HBase can handle billions of rows of data and very large numbers of columns. Like Hadoop, HBase runs on clusters of commodity hardware that form a distributed system. The HBase architecture is made up of RegionServers that run on the worker nodes and an HBase Master Server that controls them.

Benefits of Running HBase on vSphere

Virtualization brings several benefits to deploying HBase that are related to resource consumption and sharing in a diverse environment. There are many good reasons for virtualizing an HBase cluster, including:

Enterprises often deploy multiple HBase clusters in parallel, not just one, such as for testing and scaling before going into production. Virtualization allows the enterprise to do that on a common pool of hardware, thereby saving costs.
Different versions of HBase and Hadoop can co-exist in the same environment over periods of time and can be managed and isolated from each other more easily when virtualized.
Sizing an HBase/Hadoop cluster is not an exact science; it usually takes several iterations on the cluster size to reach the required performance and reliability. Users therefore need to be able to test various configurations before finding the best fit for an application’s needs.
Workloads can be of a mixed nature, so traditional IT workloads may be mixed with HBase or other Hadoop workloads on common pools of hardware.
A virtual infrastructure is more elastic and can more easily accommodate changing resource needs of each workload.
Datacenter and Hadoop administration roles can be more easily separated.
Separate virtual clusters enable better security isolation in a multi-tenant environment.

Performance of Virtualized HBase

With the above benefits in mind, an apples-to-apples set of tests was conducted at VMware to compare the performance of a virtualized HBase cluster with that of a native deployment. All hosts in the test bed were configured to be freely switchable between native CentOS 7 and ESX 6.0, which kept the comparison fair because both deployments ran on the same hardware.

Hardware Description

One host was configured to hold the master/client roles; three hosts held the slave roles.

Dell PowerEdge 720, 12 CPUs at 2.499GHz, HT enabled

80GB RAM per host

10G network for data transfer

H710i adaptor for local disks

6 SATA 7200 disks as data disks, shared between native and virtual cluster

Virtual Hardware

One virtual machine per host server

Virtual hardware: version 10

VMware Tools: installed

Each virtual machine had 22 vCPUs and 74GB of memory

6 VMDKs as data disks, one per datastore, with each datastore mapped to the single aligned partition created on each physical disk

The virtual machine latency sensitive setting was enabled and set to high.

Software

HBase 0.98.6

vSphere 6.0

Guest and Native OS: CentOS 7

BDE: pre-release version post v2.2

Figure 1 shows the results of the benchmark test. The test suite used was the Yahoo Cloud Serving Benchmark (YCSB) [1] with a record count of 300 million, which is more than the performance test bed could cache entirely in its total RAM. This test suite was chosen because it is well suited to simulating real-world mixed random read/write database operations.

Figure 1. YCSB (record count = 300000000) Results on Both Native deployment and Virtual cluster
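The claim that 300 million records exceed the test bed's RAM can be checked with some rough arithmetic. The sketch below assumes the YCSB default record layout of 10 fields of 100 bytes each; the article does not state the record size that was actually used, so treat the result as an estimate only. It also ignores row keys and HBase/HDFS storage overhead, which would make the data set larger still.

```python
# Rough sizing of the YCSB data set against the RAM of the slave hosts.
# ASSUMPTION: YCSB default record layout (10 fields x 100 bytes); the article
# does not say which record size was used in the tests.

RECORD_COUNT = 300_000_000   # from the article
FIELD_COUNT = 10             # YCSB default (assumed)
FIELD_LENGTH_BYTES = 100     # YCSB default (assumed)
SLAVE_HOSTS = 3              # from the hardware description
RAM_PER_HOST_GB = 80         # from the hardware description


def raw_dataset_gb(records: int, fields: int, field_len: int) -> float:
    """Raw payload size in GB (10**9 bytes), ignoring keys and storage overhead."""
    return records * fields * field_len / 1e9


dataset_gb = raw_dataset_gb(RECORD_COUNT, FIELD_COUNT, FIELD_LENGTH_BYTES)
total_ram_gb = SLAVE_HOSTS * RAM_PER_HOST_GB

print(f"raw data: {dataset_gb:.0f} GB, RAM across slave hosts: {total_ram_gb} GB")
print("fits in RAM:", dataset_gb <= total_ram_gb)
```

Under these assumptions the raw data is about 300 GB against 240 GB of RAM across the three slave hosts, so a large share of reads must go to disk rather than to cache.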

The virtualized HBase cluster took slightly less than 7% longer to complete the test than the equivalent native deployment. This result held for both of the representative mixed read/write workloads that were executed. vSphere Big Data Extensions (BDE) was used to auto-provision the virtualized HBase cluster, and it applied the known best practices during cluster creation. Because of this, and because BDE (post v2.2) supports enabling latency-sensitive optimization for the virtual machines, no other tuning was needed to achieve this performance, which simplified the setup.

Steps to Achieve HBase Performance

HBase runs well on an out-of-the-box vSphere setup with no special tuning steps applied. A few straightforward steps can nonetheless give HBase an extra performance boost on vSphere.

The VMware BDE tool, used for provisioning distributed application environments such as Hadoop and HBase, embeds a set of best practices to achieve performance optimization with HBase.

During the provisioning of an HBase cluster, BDE configures various settings in the virtual machine configuration and the guest OS to achieve performance efficiency. These settings can be classified into three categories:

(1) HBase’s read/write operations are extremely latency-sensitive. The first category of settings enables the underlying mechanisms of vSphere to yield significantly lower latencies and jitter [5].

(2) The second category of settings is related to network optimization. Besides automatically installing vmxnet3 as the virtual NIC type, BDE can help to enable jumbo frames and the single-queue mechanism of vmxnet3 to minimize latency. By default, vmxnet3 enables multiqueue for both transmit and receive traffic. Multiqueue is a technique designed to enhance networking performance by allowing the Tx and Rx queues in the vmxnet3 driver to scale with the number of CPUs in multi-processor systems. Many physical network adapters support multiqueue, and if this technology is enabled on the NIC, Linux virtual machines running on vSphere 5 can take advantage of it. Multiqueue is mainly useful for handling very high rates of small packets [6]. HBase does not use packet rates this high, and thus single-queue networking improves network efficiency.

(3) HBase depends on large amounts of memory (often over 64GB) being available for indexing and read/write operations. The HBase RegionServer service runs inside a Java virtual machine process with a large heap, so it can experience high latency caused by the behavior of the Java Garbage Collector (GC). BDE calculates the optimal heap size for the RegionServer service from the available memory [7] and provides suitable JVM parameters to achieve efficient GC behavior.
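As an illustration of the network settings in category (2), the following sketch inspects a Linux guest to see whether an interface uses jumbo frames and a single Tx/Rx queue pair. It is not part of BDE. It reads the standard Linux sysfs entries under /sys/class/net; the interface name eth0 and the MTU threshold of 9000 are assumptions, since the article gives neither.

```python
from pathlib import Path


def nic_summary(iface: str, sysfs_root: str = "/sys/class/net") -> dict:
    """Return the MTU and the number of Rx/Tx queues of a Linux interface."""
    base = Path(sysfs_root) / iface
    mtu = int((base / "mtu").read_text().strip())
    # The kernel exposes one directory per queue: rx-0, rx-1, ..., tx-0, ...
    queue_names = [entry.name for entry in (base / "queues").iterdir()]
    return {
        "mtu": mtu,
        "rx_queues": sum(name.startswith("rx-") for name in queue_names),
        "tx_queues": sum(name.startswith("tx-") for name in queue_names),
    }


def matches_low_latency_profile(summary: dict, jumbo_mtu: int = 9000) -> bool:
    """True when jumbo frames are on and exactly one Rx and one Tx queue exist.

    ASSUMPTION: jumbo_mtu=9000 is a common jumbo-frame MTU, not a value
    taken from the article.
    """
    return (
        summary["mtu"] >= jumbo_mtu
        and summary["rx_queues"] == 1
        and summary["tx_queues"] == 1
    )
```

Inside a guest, calling matches_low_latency_profile(nic_summary("eth0")) would report whether the interface matches the single-queue, jumbo-frame profile described above. The sysfs_root parameter exists only so the function can be exercised against a fake directory tree.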

Architecture of HBase on vSphere

Figure 2 shows the most straightforward deployment architecture of the slave node or RegionServer processes for a virtualized HBase setup. A single virtual machine, containing the RegionServer and DataNode services, runs on each host server as one slave node. HBase depends on HDFS as its backend storage.

In HDFS, read operations normally go through the DataNode. So-called “short-circuit” [2] read operations bypass the DataNode, allowing the client process to read the file directly. Short-circuit reads are enabled in the HDFS configuration, which allows the RegionServer process to read a file’s blocks directly when the required data exists locally on that node. This can benefit HBase performance significantly [3].

If a situation arises where the data is not co-located on the same node/virtual machine as the requesting RegionServer, then the DataNode reads the appropriate HDFS blocks from the disk and sends the data to the requesting process over a TCP socket. For efficient throughput, 6 disks are attached per node and configured into HDFS. Reference [4] includes more details about how to make use of DAS for HDFS to support Hadoop on vSphere.
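The short-circuit setting described above is controlled by two HDFS properties documented in reference [2]: dfs.client.read.shortcircuit and dfs.domain.socket.path. The sketch below renders them as a Hadoop-style configuration fragment. The socket path is a placeholder taken from the Hadoop documentation's example, the helper name hadoop_config_xml is hypothetical, and the values BDE actually writes are not given in this article.

```python
import xml.etree.ElementTree as ET


def hadoop_config_xml(properties: dict) -> str:
    """Render a dict as a Hadoop-style <configuration> XML fragment."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    if hasattr(ET, "indent"):  # pretty-printing is available on Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


short_circuit_properties = {
    # Let local clients (here, the RegionServer) bypass the DataNode.
    "dfs.client.read.shortcircuit": "true",
    # PLACEHOLDER path for the UNIX domain socket shared by the DataNode
    # and its local clients; choose one suitable for the cluster.
    "dfs.domain.socket.path": "/var/lib/hadoop-hdfs/dn_socket",
}

print(hadoop_config_xml(short_circuit_properties))
```

Both the DataNode and the client side need to see these properties, so in a typical setup they would go into the hdfs-site.xml used by the DataNode and by the RegionServer.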

Figure 2. Deployment outline for an apples-to-apples comparison – single slave VM per host

With the deployment structure shown in Figure 2, all the CPU, RAM, and storage resources of each host are given to the single VM that is running on it. In the tests described here, CentOS 7 was the guest OS running in the virtual machines. The apples-to-apples comparison test adopted this method of deployment to analyze the differences from a native deployment. This deployment is also the most RAM-efficient approach because almost all the RAM can be used for RegionServer services, except for the ESX overhead (usually 6% in total) and the memory used by the DataNode service for HDFS functions (usually 2GB).
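The RAM budget described above can be worked through with the figures given in this article. This is a back-of-the-envelope sketch, not the formula BDE uses to size the RegionServer heap: it does not subtract memory for the guest OS or other daemons, so the final number is an upper bound rather than a recommended heap size.

```python
HOST_RAM_GB = 80               # per host, from the hardware description
ESX_OVERHEAD_FRACTION = 0.06   # "usually 6% in total", from the text
DATANODE_GB = 2                # "usually 2GB", from the text
VM_RAM_GB = 74                 # memory given to each VM in the tests


def ram_left_after_hypervisor(host_gb: float, overhead_fraction: float) -> float:
    """RAM that could be given to a single VM once the overhead is set aside."""
    return host_gb - host_gb * overhead_fraction


def regionserver_upper_bound(vm_gb: float, datanode_gb: float) -> float:
    """Guest RAM left after the DataNode; the guest OS is NOT subtracted here."""
    return vm_gb - datanode_gb


ceiling_gb = ram_left_after_hypervisor(HOST_RAM_GB, ESX_OVERHEAD_FRACTION)
bound_gb = regionserver_upper_bound(VM_RAM_GB, DATANODE_GB)

print(f"VM ceiling after overhead: {ceiling_gb:.1f} GB (tests used {VM_RAM_GB} GB)")
print(f"RegionServer upper bound: {bound_gb} GB")
```

With these inputs, the 74GB given to each VM sits just under the roughly 75GB left after the hypervisor overhead, and about 72GB remains for everything other than the DataNode.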

As mentioned earlier, BDE embeds the best practices for HBase performance on vSphere and can automatically handle the optimal settings for virtual machines and their guest operating systems. Before deploying with BDE, it is recommended that the active power management feature in the BIOS settings be disabled, provided this does not conflict with other requirements. The reason is that any form of power management adds latency when an idle system has to respond to an external event, which hurts a latency-sensitive application such as HBase. No other tuning steps are needed.

Summary

In summary, vSphere provides a robust and flexible platform for hosting an HBase deployment in virtual machines with performance that is comparable to that of a native deployment. Significant benefits can be gained from the virtualization of HBase, such as the ability to bring a cluster containing HBase up and down at will and to expand and contract an existing HBase cluster, while sharing resources with other workloads.

BDE provides an easier way to deploy HBase on vSphere with good performance practices applied to the deployment. With its flexible deployment structure and scale-out support, users can conveniently provision and use HBase clusters on the vSphere platform.

References

[1] Cooper, B.F., Silberstein, A., Tam, E., Ramakrishnan, R., Sears, R., Benchmarking cloud serving systems with YCSB. In: Proceedings of the 1st ACM Symposium on Cloud Computing, pp. 143–154. ACM (2010)
[2] Explanation of the short circuit feature: http://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html
[3] Performance effects of short circuit: http://blog.cloudera.com/blog/2013/08/how-improved-short-circuit-local-reads-bring-better-performance-and-security-to-hadoop/
[4] Virtualizing Hadoop – a Deployment Guide, http://www.vmware.com/files/pdf/products/vsphere/Hadoop-Deployment-Guide-USLET.pdf
[5] Jin Heo and Lenin Singaravelu, Deploying Extremely Latency-Sensitive Applications in VMware vSphere 5.5, http://www.vmware.com/files/pdf/techpaper/latency-sensitive-perf-vsphere55.pdf
[6] Jeff Buell, Virtualized Hadoop Performance with VMware vSphere on High-Performance Servers
[7] Emad Benjamin, Virtualizing and Tuning Large Scale Java Platforms, Jan 6, 2014, VMware Press.

Author Bio

Xinhui Li is a Staff Engineer on the VMware Big Data team. Her work focuses on the design and optimization of distributed systems, including resource management/scheduling and high availability. She works on enabling cooperation of new virtualization technologies with the primary Big Data stack components. She has 13 international patents. Xinhui has published 3 industry technical documents, and 5 papers on international PIC target academy journals and conferences.