
Virtualized Big Data Faster Than Bare Metal

VMware is excited to highlight a new partner benchmark effort in the Big Data technology space.

Dell recently published two TPCx-HS results demonstrating that Big Data technology on vSphere is actually faster than bare metal.

Let’s take a look…

Many organizations and their operations teams are still not confident that big data technologies should be virtualized. In fact, VMware has previously demonstrated both the automation and the performance benefits of virtualizing these workloads. By highlighting this audited, third-party benchmark, we hope to increase customer confidence that vSphere is the best platform for all your big data applications.

Let’s take a more detailed look at this benchmark.

Who is the Transaction Processing Performance Council (www.tpc.org)?

The TPC is a non-profit corporation founded to define transaction processing and database benchmarks and to disseminate objective, verifiable TPC performance data to the industry. Its membership consists of:

[Image: TPC member companies, 2015]

What is TPCx-HS?

“The modeled application is simple and the results are highly relevant to hardware and software dealing with Big Data systems in general. The TPCx-HS stresses both hardware and software including Hadoop run-time, Hadoop Filesystem API compatible systems and MapReduce layers. This workload can be used to assess a broad range of system topologies and implementation of Hadoop clusters.” [source]

Why TPCx-HS?

“The TPCx-HS can be used to assess a broad range of system topologies and implementation methodologies in a technically rigorous and directly comparable, vendor-neutral manner.” [source]

Why is VMware excited about this Big Data technology benchmark?

  • Big Data technologies constitute a growing application set within our customer portfolios.
  • Big Data technologies are widely perceived to suffer from high virtualization overhead and poor performance, so an independent benchmark can reveal whether that overhead actually exists.
  • The benchmark results are audited by certified TPC auditors, ensuring transparency, fairness and integrity.

Full details on the TPCx-HS benchmark are available here: Benchmark Document

The Configuration Details:

The test bed consisted of the following configuration:

Virtual Machine Workloads

  • 128x Cloudera CDH 5.3.0 virtual machines
  • 10 vCPU, 60GB RAM each
  • SUSE SLES 11 SP3

Hosting Infrastructure

  • Dell PowerEdge R720xd Servers
  • Intel Xeon E5-2680 v2 – 2.8 GHz, 256GB RAM
  • Local DAS
  • VMware vSphere 6

[Figure: big data test bed configuration]
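As a quick sanity check on the scale of the virtualized tier, the per-VM sizing above can be multiplied out. This is a minimal sketch: only the VM count and per-VM vCPU/RAM come from the published configuration; the host count is not stated here, so per-host consolidation is deliberately not computed, and the `VmSpec` / `aggregate` names are hypothetical helpers for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmSpec:
    """Hypothetical helper describing a group of identically sized VMs."""
    count: int   # number of identical VMs
    vcpus: int   # vCPUs per VM
    ram_gb: int  # RAM per VM, in GB


def aggregate(spec: VmSpec) -> tuple[int, int]:
    """Return (total vCPUs, total RAM in GB) across all VMs in the group."""
    return spec.count * spec.vcpus, spec.count * spec.ram_gb


# Values taken from the configuration list: 128 CDH VMs, 10 vCPU and 60GB RAM each.
cdh_tier = VmSpec(count=128, vcpus=10, ram_gb=60)
total_vcpus, total_ram_gb = aggregate(cdh_tier)
print(f"{total_vcpus} vCPUs, {total_ram_gb} GB RAM")  # 1280 vCPUs, 7680 GB RAM
```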

The Result:

The TPCx-HS benchmark reports two scores at a specific scale factor, in this case 30 TB: HSph (Hadoop Sorts per Hour) for performance and $/HSph for price/performance. The two results are shown together on the TPC site for convenient comparison. Both Dell results used the exact same hardware and software platforms; the only difference was the use of vSphere 6 in the better-performing result.
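To make the two metrics concrete, here is a small sketch of how they relate, assuming the definitions in the TPCx-HS specification as we understand them: HSph@SF = SF / (T / 3600), where SF is the scale factor (30 for the 30 TB runs) and T is the elapsed time of the performance run in seconds, and price/performance is the total system cost divided by HSph@SF, so lower is better. The function names and all input values below are hypothetical and are not Dell’s published figures; consult the benchmark document for the authoritative definitions.

```python
def hsph(scale_factor: float, elapsed_seconds: float) -> float:
    """HSph@SF = SF / (T / 3600): scale factor processed per elapsed hour."""
    return scale_factor / (elapsed_seconds / 3600.0)


def price_per_hsph(total_cost_usd: float, hsph_score: float) -> float:
    """Price/performance: total system cost divided by HSph@SF (lower is better)."""
    return total_cost_usd / hsph_score


# Hypothetical inputs for illustration only (not the published Dell results).
score = hsph(scale_factor=30, elapsed_seconds=7200)  # a 2-hour run at 30 TB
print(score)                                         # 15.0
print(price_per_hsph(total_cost_usd=1_500_000, hsph_score=score))  # 100000.0
```

Because HSph sits in the denominator of the price/performance metric, a faster run on identical hardware improves both scores unless the added software cost outweighs the speedup.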

[Figure: TPCx-HS 30 TB results for the two Dell configurations]

Reference: Results Page

With everything else equal, placing the big data application tier on vSphere 6 yielded an 8% performance benefit over bare metal. Additionally, the price/performance of the virtualized environment is on par with that of the bare metal environment. So why wouldn’t you want to virtualize your Big Data technology?
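The comparison above is a simple relative difference between the two runs. The sketch below shows the arithmetic, assuming the percentage is computed against the bare metal result as the baseline; the `relative_change` helper and the scores fed into it are hypothetical placeholders chosen only to illustrate an 8% gain, not the published values.

```python
def relative_change(new: float, baseline: float) -> float:
    """Percentage change of `new` relative to `baseline`."""
    return (new / baseline - 1.0) * 100.0


# Hypothetical scores for illustration only.
bare_metal_hsph, vsphere_hsph = 10.0, 10.8            # higher is better
bare_metal_price, vsphere_price = 100_000.0, 100_500.0  # $/HSph, lower is better

print(f"HSph change:  {relative_change(vsphere_hsph, bare_metal_hsph):.1f}%")    # 8.0%
print(f"$/HSph change: {relative_change(vsphere_price, bare_metal_price):.1f}%")  # 0.5%
```

Note the direction of each metric: a positive change is an improvement for HSph but a (small) regression for $/HSph, which is why a result can be faster while its price/performance is merely on par.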

VMware has also done some internal benchmarking using a similar system; those results can be found here. This VMware whitepaper provides more details on the configuration, additional benchmark data and recommended practices.

The Takeaways:

  • Dell published some of the world’s first TPCx-HS benchmark results.
  • The results are compliant and audited by a third party.
  • You can be confident that big data technology can be virtualized today without a performance penalty, and in fact with a performance gain.
  • VMware continues to produce the best platform for all your enterprise applications.