
10 Ways to Make Hadoop Green in the CFO’s Eyes

Hadoop is used by some pretty amazing companies to make use of big, fast data—particularly unstructured data. Huge brands on the web like AOL, eBay, Facebook, Google, Last.fm, LinkedIn, MercadoLibre, Ning, Quantcast, Spotify, Stumbleupon, Twitter, as well as some more brick and mortar giants like GE, Walmart, Morgan Stanley, Sears, and Ford use Hadoop.

Why? In a nutshell, companies like McKinsey believe the use of big data and technologies like Hadoop will allow companies to better compete and grow in the future.

Hadoop is used to support a variety of valuable business capabilities—analysis, search, machine learning, data aggregation, content generation, reporting, integration, and more. All types of industries use Hadoop—media and advertising, A/V processing, credit and fraud, security, geographic exploration, online travel, financial analysis, mobile phones, sensor networks, e-commerce, retail, energy discovery, video games, social media, and more.

At first glance, it sounds like many of the above business needs were already solved by conventional data warehouses, business intelligence, and statistical analysis programs. This is not the case—the conventional systems begin to fail when the data sets become too large, include fast-growing unstructured data formats, or face both of these issues. With size and complexity issues, traditional BI systems can become too expensive. This is why Hadoop was invented.

Simply put, Hadoop follows the MapReduce model to slice data into chunks of work, spread the work across a large number of commodity servers, and aggregate the results back into a single output. Its parallel computing approach out-scales the old models and is more cost-effective at doing so.
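To make the model concrete, here is a minimal, single-machine sketch of the MapReduce idea using the classic word-count example. This is an illustration of the concept only, not Hadoop's actual Java API; the function names and sample data are invented for the sketch:

```python
from collections import defaultdict

def map_phase(chunk):
    """Map: emit a (word, 1) pair for each word in a chunk of text."""
    for word in chunk.split():
        yield (word.lower(), 1)

def shuffle(mapped_pairs):
    """Shuffle: group all emitted values by key across mapper outputs."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: aggregate each word's counts into a single total."""
    return key, sum(values)

# The input arrives pre-sliced into chunks, the way HDFS splits a
# large file into blocks and hands each block to a mapper.
chunks = ["big data needs big clusters",
          "big clusters need big budgets"]

mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
reduced = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(reduced["big"])  # 4
```

In a real Hadoop cluster, the map and reduce steps run in parallel on many nodes and the shuffle moves data over the network between them; the simple structure above is what lets the work spread across commodity servers.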

Effectively Managing Hadoop from the CFO’s Eyes

In the early days of the web and enterprise apps, everyone got so enamored by the potential for growth and productivity that both business and IT teams spent money prematurely—we ended up with a massive number of underutilized servers that cost us an arm and a leg to operate. Then, we spent more to virtualize these resources, get better utilization out of our datacenters and reduce our overhead.

With the big data technology trend, we are facing the same excitement around Hadoop. It’s going to be an investment area for the next decade or two, and your CFO is going to see this coming. This time around, we can spend IT dollars much more wisely by putting Hadoop on virtualized infrastructure from the beginning. For those of us who have learned the painful TCO lessons from the past and understand the economics of virtualization, here is a list of ten key, financially sound, cloud infrastructure requirements that should be part of any Hadoop project:

  1. Initial Hadoop projects should target the company’s most pressing issues and start by aligning with the CEO’s and CFO’s top needs and goals.
  2. Hadoop investments should run with the same data center efficiency and cost effectiveness as other virtualized platforms that have high server consolidation ratios and require less CapEx and OpEx than non-virtualized environments.
  3. Hadoop pilots should identify a big problem, make the scope concise, and complete quickly to prove the time-to-value and identify future costs and risks thoroughly. We all learn by doing—don’t drag out the time to value by over-engineering.
  4. Hadoop must be able to co-locate with existing applications and run on existing virtualized hosts. This approach should accommodate a Hadoop pilot without new hardware or help manage shared infrastructure budgets in a cost-effective manner.
  5. Hadoop nodes should use the concept of time sharing. For example, when email, database, web, or ERP applications are idle, the compute power available should be transferred to Hadoop nodes that are analyzing improvements in business performance.
  6. The Hadoop infrastructure should be able to scale up or down elastically, on-demand, and across clouds for burst compute needs. This capability would allow you to expedite a big analysis on your company’s performance by temporarily adding new Hadoop nodes on a 3rd party cloud service to increase capacity.
  7. Hadoop VMs should not require significant resources to scale, provision, deploy, replicate, or move because a cloud-centric, virtual machine infrastructure can accommodate this.
  8. Hadoop should be available to the company as a shared service. This is one of the most cost-effective ways to provide Hadoop as a service. In this model, it is available to all departments based on chargeback accounting. Even with shared services, virtualization still allows for enough isolation to meet independent business and security needs.
  9. Hadoop should not require expensive, hardware-based high availability or fault tolerance (i.e. no-downtime) frameworks. Distributed computing is meant for commodity hardware in the cloud.
  10. Hadoop training, at least at a high level, should be provided to every IT person who engages with various business units and departments—Hadoop attracts talent and paves careers.

To learn more about how VMware is helping virtualize Hadoop clusters, check out Project Serengeti.

This entry was posted in Serengeti by Adam Bloom.

About Adam Bloom

Adam Bloom has worked for 15+ years in the tech industry and has been a key contributor to the VMware vFabric Blog for the past year. He first started working on cloud-based apps in 1998 when he led the development and launch of WebMD 1.0’s B2C and B2B apps. He then spent several years in product marketing for a J2EE-based PaaS/SaaS start-up. Afterwards, he worked for Siebel as a consultant on large CRM engagements, then launched their online community and ran marketing operations. At Oracle, he led the worldwide implementation of Siebel CRM before spending some time at a Youtube competitor in Silicon Valley and working as a product marketer for Unica's SaaS-based marketing automation suite. He graduated from Georgia Tech with high honors and an undergraduate thesis in human computer interaction.
