Big Data Resources
Deployment Guides
- Virtualizing Hadoop – a Deployment Guide
- Deploying Virtualized Cloudera CDH on vSphere using Isilon Storage – Technical Guide from EMC/Isilon. Find the latest version at https://community.emc.com/docs/DOC-26892
- Deploying Virtualized Hortonworks HDP on vSphere using Isilon Storage – Technical Guide from EMC/Isilon. Find the latest version at https://community.emc.com/docs/DOC-26892
- How to Enable Compute Accelerators on vSphere 6.5 for Machine Learning and Other HPC Workloads
Reference Architectures
- Cloudera Reference Architecture – Isilon version
- Cloudera Reference Architecture – Direct Attached Storage version
- Big Data with Cisco UCS and EMC Isilon: Building a 60 Node Hadoop Cluster (using Cloudera)
- Deploying Hortonworks Data Platform (HDP) on VMware vSphere – Technical Reference Architecture
- Scaling the Deployment of Multiple Hadoop Workloads on a Virtualized Infrastructure (Intel, Dell and VMware)
Case Studies
- Adobe Deploys Hadoop-as-a-Service on VMware vSphere
- Virtualizing Hadoop in Large-Scale Infrastructures – technical white paper by Dell EMC
- Skyscape Cloud Services Deploys Hadoop in the Cloud on VMware vSphere
- Interview with Ajay Sabhlok of VMware IT on Deploying Big Data
- Virtualizing Big Data at VMware IT – Starting Out at Small Scale
Performance
- Big Data Performance on VMware Cloud on AWS: Spark Machine Learning and IoT Analytics Performance On-premises and in the Cloud
- Fast Virtualized Hadoop and Spark on All-Flash Disks – Best Practices for Optimizing Virtualized Big Data Applications on VMware vSphere 6.5 (2017)
- Big Data Performance on vSphere 6 – Best Practices for Optimizing Virtualized Big Data Applications (2016)
- Virtualized Hadoop Performance with VMware vSphere 6 on High-Performance Servers (2015)
- Virtualized Hadoop Performance with VMware vSphere 5.1 (2013)
- A Benchmarking Case Study of Virtualized Hadoop Performance on vSphere 5 (2011)
- The Transaction Processing Performance Council (TPC) – TPCx-HS Benchmark Results (Cloudera on VMware performance, submitted by Dell)
- ESG Lab Review: VCE vBlock Systems with EMC Isilon for Enterprise Hadoop
- Intel and VMware White Paper: New Era of Hyper-Converged Big Data Using Hadoop with All-Flash VMware vSAN
Big Data on VMware Cloud on AWS – Blog Articles
- Using an AWS VPC Endpoint for Access to Data in S3 from Spark on VMware Cloud on AWS
- Using S3 Data with Cloudera on VMware Cloud on AWS
- Running Apache Spark for Big Data on VMware Cloud on AWS – Part 1
- Spark for Big Data on VMware Cloud on AWS – Part 2: A Proof-of-Concept Design and Testing
Other Big Data on vSphere Papers
- Protecting Hadoop with VMware vSphere 5 Fault Tolerance
- Toward an Elastic Elephant – Enabling Hadoop for the Cloud
- Hadoop Virtualization Extensions (HVE)
- Apache Flume and Apache Sqoop Data Ingestion to Apache Hadoop Clusters on VMware vSphere
Big Data – Demo Recordings Available on YouTube
- Running Spark Standalone on VMware Cloud on AWS
- Big Data and Spark on VMware Cloud on AWS
- Standalone Spark on On-Premises vSphere with a Machine Learning Test Run
- The Big Data Playlist – a Set of Demo Recordings on YouTube
- VMware CTO and Cloudera CTO Interview Videos and Adobe Reference Video
- Virtualizing Big Data: Real-World Customer Architectures – presentation at Strata Data New York 2017
Books
- Virtualizing Hadoop, by Trujillo et al., published by VMware Press, ISBN: 978-0-13-381102-5