We often get questions from people who are new to big data about why they should virtualize these newer infrastructures and applications, and about the benefits they can gain from doing so. This video answers those questions. To summarize the main points discussed in the video, the benefits of virtualizing your big data infrastructure are:

(a) higher flexibility of management

(b) rapid provisioning of clusters through cloning

(c) freedom from dedicating groups of hardware to individual clusters

(d) performance that can equal that of native deployment, and

(e) isolation of your workloads by grouping your virtual machines with resource pool boundaries.

Of course, virtualizing your big data workload also makes it ready for the cloud, as virtualization is the key technology underlying private or public clouds. The video concentrates on Spark as one example of a big data environment, but these principles apply to all distributed platforms, Hadoop and others, that support big data and analytics.

The video shows an outline architecture for a Spark-based system, a platform that is growing very rapidly in the Hadoop market, and gives a short recipe for virtualizing that architecture. You can learn much more about this subject by going to the VMware Big Data site.

About the Author

Justin Murray

Justin Murray works as a Technical Marketing Manager at VMware and has been at the company for over six years. Justin creates technical material and gives guidance to customers and the VMware field organization to promote the virtualization of big data workloads on VMware's vSphere platform. Justin has worked closely with VMware's partner ISVs (Independent Software Vendors) to ensure their products work well on vSphere and continues to bring best practices to the field as the customer base for big data expands.