We often get questions from people who are new to big data about why they should virtualize these newer infrastructures and applications, and about the benefits they can gain from doing so. This video answers those questions. To summarize the main points discussed in the video, the benefits of virtualizing your big data infrastructure are:
(a) higher flexibility of management
(b) rapid provisioning of clusters through cloning
(c) freedom from dedicating groups of hardware to individual clusters
(d) performance that can equal that of native deployment and
(e) isolation of your workloads by grouping your virtual machines within resource pool boundaries.
Of course, virtualizing your big data workload also makes it ready for the cloud, since virtualization is the key technology underlying both private and public clouds. The video concentrates on Spark as one example of a big data environment, but these principles apply to all distributed platforms, Hadoop and others, that support big data and analytics.
The video shows an outline architecture for a Spark-based system (Spark is growing very rapidly in the Hadoop market) and gives a short recipe for virtualizing that architecture. You can learn much more about this subject by visiting the VMware Big Data site.