Virtualization Web/Tech

Introducing TPCx-HS Version 2 – An Industry Standard Benchmark for Apache Spark and Hadoop clusters deployed on premise or in the cloud

Since its release on August 2014, the TPCx-HS Hadoop benchmark has helped drive competition in the Big Data marketplace, generating 23 publications spanning 5 Hadoop distributions, 3 hardware vendors, 2 OS distributions and 1 virtualization platform. By all measures, it has proven to be a successful industry standard benchmark for Hadoop systems. However, the Big Data landscape has rapidly changed over the last 30 months. Key technologies have matured while new ones have risen to prominence in an effort to keep pace with the exponential expansion of datasets. One such technology is Apache Spark.

spark-logo-trademarkAccording to a Big Data survey published by the Taneja Group, more than half of the respondents reported actively using Spark, with a notable increase in usage over the 12 months following the survey. Clearly, Spark is an important component of any Big Data pipeline today. Interestingly, but not surprisingly, there is also a significant trend towards deploying Spark in the cloud. What is driving this adoption of Spark? Predominantly, performance.

Today, with the widespread adoption of Spark and its integration into many commercial Big Data platform offerings, I believe there needs to be a straightforward, industry standard way in which Spark performance and price/performance could be objectively measured and verified. Just like TPCx-HS Version 1 for Hadoop, the workload needs to be well understood and the metrics easily relatable to the end user.

Continuing on the Transaction Processing Performance Council’s commitment to bringing relevant benchmarks to the industry, it is my pleasure to announce TPCx-HS Version 2 for Spark and Hadoop. In keeping with important industry trends, not only does TPCx-HS support traditional on premise deployments, but also cloud.

I envision that TPCx-HS will continue to be a useful benchmark standard for customers as they evaluate Big Data deployments in terms of performance and price/performance, and for vendors in demonstrating the competitiveness of their products.

 

Tariq Magdon-Ismail

(Chair, TPCx-HS Benchmark Committee)

 

Additional Information:  TPC Press Release