
Busting Big Data’s Bare Metal Myth

Post originally appeared on Radius


Everyone responsible for the health and growth of a business should read about the value of Big Data. In manufacturing, it improves quality; in sales and marketing, it boosts the size of the shopping basket; in logistics, it cuts costs. These are realities. But there are also Big Data myths. The most pernicious of these is the widely held belief that Big Data requires bare metal installation on special, high-performance servers.

Nothing could be further from the truth. But, like most myths, the dedicated-server myth once had some basis in reality.

Getting useful business results from Big Data in a timely fashion, which often means in real time, depends on having the right infrastructure in place to support the workload. In the past, dedicated servers really were a requirement. But in 2005, a new computing platform emerged that changed everything: Hadoop. It enables large data-processing workloads to be distributed across multiple servers, even thousands if necessary.
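To make that distribution model concrete, here is a minimal word-count job written for Hadoop Streaming, which lets plain scripts serve as the mapper and reducer. This is an illustrative sketch rather than anything from the original post; the file names are assumptions.

```python
#!/usr/bin/env python3
# mapper.py: emit a (word, 1) pair for every word read from stdin.
# Hadoop runs many copies of this script in parallel, one per input split.
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")
```

```python
#!/usr/bin/env python3
# reducer.py: sum the counts for each word. Hadoop sorts mapper output by
# key, so all pairs for a given word arrive contiguously on stdin.
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    line = line.rstrip("\n")
    if not line:
        continue
    word, count = line.rsplit("\t", 1)
    if word != current_word:
        if current_word is not None:
            print(f"{current_word}\t{current_count}")
        current_word, current_count = word, 0
    current_count += int(count)
if current_word is not None:
    print(f"{current_word}\t{current_count}")
```

A job like this would typically be submitted with the distribution's streaming jar, along the lines of `hadoop jar hadoop-streaming.jar -files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py -input /data/in -output /data/out`; the jar's exact name and path vary by distribution.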

An Alternative to Bare Metal

With Hadoop, bare metal installations—where the Big Data application runs directly on a dedicated server or cluster of servers—are certainly a viable option. However, Hadoop servers can also be virtualized. This means the same collection of hardware that would have been dedicated to a native implementation can now be shared across more than one Hadoop cluster, with different versions and distributions operating simultaneously. That is impossible with a physical, rather than virtual, Hadoop implementation.

Today, Big Data workloads can be treated just like any other workload in the data center, which means they can run well on a virtualized platform. This is important because the virtualized approach to Big Data has significant advantages, three in particular, over dedicated Big Data clusters.

  1. Efficiency. In a virtualized data center, the utilization of every physical server can be maximized. This is not true with dedicated Big Data clusters. When they are not processing specific Big Data workloads, they sit idle. In contrast, when Big Data is virtualized, idle resources can be put to work.
  2. Flexibility. There are multiple distributions, or versions, of Hadoop. Each distribution has its strengths, and choosing which one is best for any given initiative is an important decision. The beauty of virtualizing Hadoop is that it enables multiple Hadoop distributions to run side-by-side on the same nodes or clusters. Switching distributions, or adding new ones, is simple because virtual machines can be cloned or deployed from templates (a minimal example of this follows the list).
  3. Learning Curve. New hardware means new tools and procedures. When organizations make use of virtual infrastructure, there is no need to add and support extra management tools, and no learning curve to deal with. Most organizations already utilize virtualization, and IT teams already have the know-how to manage a virtual environment. The same knowledge can be applied to virtualized Big Data environments as well.
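As an illustration of the template-based deployment mentioned under Flexibility, the sketch below clones a Hadoop worker node from a template using pyVmomi, VMware's open-source Python SDK for the vSphere API. The vCenter address, credentials, and inventory names are hypothetical placeholders, and a production script would verify certificates and poll the task for completion.

```python
#!/usr/bin/env python3
# Sketch: clone a Hadoop worker node from a vSphere template with pyVmomi
# (pip install pyvmomi). All hostnames and inventory names are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

def find_by_name(content, vimtype, name):
    """Walk the inventory and return the first managed object named `name`."""
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vimtype], recursive=True)
    try:
        return next(obj for obj in view.view if obj.name == name)
    finally:
        view.Destroy()

ctx = ssl._create_unverified_context()  # lab use only; verify certs in production
si = SmartConnect(host="vcenter.example.com",
                  user="administrator@vsphere.local",
                  pwd="secret", sslContext=ctx)
try:
    content = si.RetrieveContent()
    template = find_by_name(content, vim.VirtualMachine, "hadoop-worker-template")
    cluster = find_by_name(content, vim.ClusterComputeResource, "BigDataCluster")
    datacenter = content.rootFolder.childEntity[0]  # assumes one datacenter

    # Place the clone in the cluster's root resource pool and power it on.
    relocate = vim.vm.RelocateSpec(pool=cluster.resourcePool)
    spec = vim.vm.CloneSpec(location=relocate, powerOn=True, template=False)

    # Cloning is asynchronous; the returned task can be polled for completion.
    task = template.Clone(folder=datacenter.vmFolder,
                          name="hadoop-worker-01", spec=spec)
    print("Clone submitted, task state:", task.info.state)
finally:
    Disconnect(si)
```

Scaling out a cluster then reduces to running this clone step once per new node, which is what makes switching or mixing distributions so much cheaper than reprovisioning physical servers.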

Is Hadoop for Us?

In spite of these benefits, many important questions linger for organizations considering virtualized Hadoop deployments.

  • Will Hadoop really perform as well in a virtualized environment as it will on bare metal? Yes. Precise performance results will of course depend on the configuration details and the nature of the workload. That being said, VMware has conducted numerous tests with its VMware vSphere® platform and has determined that performance can actually be 12 percent faster than in bare-metal environments.
  • Has Hadoop been battle tested in production environments? Yes. For example, Adobe’s Digital Marketing business unit has been using Hadoop internally for a set of different applications for several years and has achieved significant gains in analytics on its key customer data. Skyscape, a UK company that provides cloud computing services through a UK Government program, also deploys and manages virtualized Hadoop clusters. These clusters deliver infrastructure as a service (IaaS), platform as a service (PaaS), and software as a service (SaaS) to a large community of end users.
  • Do major Hadoop vendors support virtualization? Yes. In fact, VMware has working and cross-certification relationships with Cloudera, Hortonworks, and MapR, all of which are poised to support Big Data virtualization.

The ability to run Big Data initiatives in a virtualized environment with no special hardware has important implications. A virtualized Big Data environment delivers faster time to results with simpler management and lower cost, a direct benefit to the bottom line.