Home > Blogs > VMware VROOM! Blog > Author Archives: Dave Jaffe

Author Archives: Dave Jaffe

New White Paper: Fast Virtualized Hadoop and Spark on All-Flash Disks – Best Practices for Optimizing Virtualized Big Data Applications on VMware vSphere 6.5

A new white paper is available showing how to best deploy and configure vSphere 6.5 for Big Data applications such as Hadoop and Spark running on a cluster with fast processors, large memory, and all-flash storage (Non-Volatile Memory Express storage and solid state disks). Hardware, software, and vSphere configuration parameters are documented, as well as tuning parameters for the operating system, Hadoop, and Spark.

The best practices were tested on a 13-server cluster, with Hadoop installed on vSphere as well as on bare metal. Workloads for both Hadoop (TeraSort and TestDFSIO) and Spark Machine Learning Library routines (K-means clustering, Logistic Regression classification, and Random Forest decision trees) were run on the cluster. Configurations with 1, 2, and 4 VMs per host were tested as well as bare metal. Among the 4 virtualized configurations, 4 VMs per host ran fastest due to the best utilization of storage as well as the highest percentage of data transfer within a server. The 4 VMs per host configuration also ran faster than bare metal on all Hadoop and Spark tests but one.

Continue reading

New White Paper: Best Practices for Optimizing Big Data Performance on vSphere 6

A new white paper is available showing how to best deploy and configure vSphere for Big Data applications such as Hadoop and Spark. Hardware, software, and vSphere configuration parameters are documented, as well as tuning parameters for the operating system, Hadoop, and Spark.

The best practices were tested on a Dell 12-server cluster, with Hadoop installed on vSphere as well as on bare metal. Workloads for both Hadoop (TeraSort and TestDFSIO) and Spark (Support Vector Machines and Logistic Regression) were run on the cluster. The virtualized cluster outperformed the bare metal cluster by 5-10% for all MapReduce and Spark workloads with the exception of one Spark workload, which ran at parity. All workloads showed excellent scaling from 5 to 10 worker servers and from smaller to larger dataset sizes.

Continue reading