Deployment Guides

  1. Virtualizing Hadoop – a Deployment Guide
  2. Deploying Virtualized Cloudera CDH on vSphere using Isilon Storage – Technical Guide from EMC/Isilon.  Find the latest version at https://community.emc.com/docs/DOC-26892
  3. Deploying Virtualized Hortonworks HDP on vSphere using Isilon Storage – Technical Guide from EMC/Isilon.  Find the latest version at https://community.emc.com/docs/DOC-26892
  4. How to Enable Compute Accelerators on vSphere 6.5 for Machine Learning and Other HPC Workloads

Reference Architectures

  1. Cloudera Reference Architecture – Isilon version
  2. Cloudera Reference Architecture – Direct Attached Storage version
  3. Big Data with Cisco UCS and EMC Isilon: Building a 60 Node Hadoop Cluster (using Cloudera)
  4. Deploying Hortonworks Data Platform (HDP) on VMware vSphere – Technical Reference Architecture
  5. Scaling the Deployment of Multiple Hadoop Workloads on a Virtualized Infrastructure (Intel, Dell and VMware)

Case Studies

  1. Adobe Deploys Hadoop-as-a-Service on VMware vSphere
  2. Virtualizing Hadoop in Large-Scale Infrastructures – technical white paper by Dell-EMC
  3. Skyscape Cloud Services Deploys Hadoop in the Cloud on VMware vSphere
  4. Interview with Ajay Sabhlok of VMware IT on Deploying Big Data
  5. Virtualizing Big Data at VMware IT – Starting Out at Small Scale

Performance

  1. Big Data Performance on VMware Cloud on AWS: Spark Machine Learning and IoT Analytics Performance On-premises and in the Cloud
  2. Fast Virtualized Hadoop and Spark on All-Flash Disks – Best Practices for Optimizing Virtualized Big Data Applications on VMware vSphere 6.5 (2017)
  3. Big Data Performance on vSphere 6 – Best Practices for Optimizing Virtualized Big Data Applications (2016)
  4. Virtualized Hadoop Performance with VMware vSphere 6 on High-Performance Servers (2015)
  5. Virtualized Hadoop Performance with VMware vSphere 5.1 (2013)
  6. A Benchmarking Case Study of Virtualized Hadoop Performance on vSphere 5 (2011)
  7. The Transaction Processing Performance Council – TPCx-HS Benchmark Results (Cloudera on VMware performance, submitted by Dell)
  8. ESG Lab Review: VCE vBlock Systems with EMC Isilon for Enterprise Hadoop
  9. Intel and VMware White Paper: New Era of Hyper-Converged Big Data Using Hadoop with All-Flash VMware VSAN

Big Data on VMware Cloud on AWS – Blog Articles

  1. Using an AWS VPC Endpoint for Access to Data in S3 from Spark on VMware Cloud on AWS
  2. Using S3 Data with Cloudera on VMware Cloud on AWS
  3. Running Apache Spark for Big Data on VMware Cloud on AWS – Part 1
  4. Spark for Big Data on VMware Cloud on AWS – Part 2: A Proof-of-Concept Design and Testing

Other Big Data on vSphere Papers

  1. Protecting Hadoop with VMware vSphere 5 Fault Tolerance
  2. Toward an Elastic Elephant – Enabling Hadoop for the Cloud
  3. Hadoop Virtualization Extensions (HVE)
  4. Apache Flume and Apache Sqoop Data Ingestion to Apache Hadoop Clusters on VMware vSphere

Big Data – Demo Recordings Available on YouTube

  1. Running Spark Standalone on VMware Cloud on AWS
  2. Big Data and Spark on VMware Cloud on AWS
  3. Standalone Spark on On-Premises vSphere with a Machine Learning Test Run
  4. The Big Data Playlist – a Set of Demo Recordings on YouTube
  5. VMware CTO and Cloudera CTO Interview videos and Adobe Reference video
  6. Virtualizing Big Data: Real-World Customer Architectures – presentation at Strata Data New York 2017

Books 

  1. Virtualizing Hadoop, by Trujillo et al., published by VMware Press, ISBN: 978-0-13-381102-5