Breaking the Mindset: Why Hadoop Can and Should Move Past Bare-Metal Deployments to Virtualization

Whenever we’ve dealt with something for a while, our way of thinking about it becomes a habit. Hadoop deals with a lot of data. Currently, the record is 100 petabytes in a Facebook cluster that analyzes log data. Because it was shaped by the likes of Google and Facebook to handle such large data volumes at high performance, it was originally designed to run on bare-metal servers. Since virtualization wasn’t an option from the get-go, the notion that you can’t safely run that much data on a movable virtual machine has largely gone unchallenged.

However, as time has gone on and technology has made persistent storage in the cloud possible, organizations have started to rethink this paradigm. In fact, several companies are using Hadoop and big data today to gain competitive advantage. And while they are running Hadoop on virtualized infrastructure, they are not moving the data. Virtualization brings other advantages as well.

VMware’s Big Data product line marketing manager, Joe Russell, spoke with Roberto Zicari this week in an interview on ODBMS.org. The interview helps articulate not only why Hadoop can run on virtual infrastructure using Project Serengeti, but also why companies should consider doing so to save time and make Hadoop more usable.

Understanding Data Locality with Hadoop Running on Serengeti

With Hadoop and virtualization, it’s important to see virtualization through the lens of data locality.

In a distributed system, data locality is about keeping the processing and data together to achieve greater performance and avoid network bottlenecks. Moving large volumes of data with vMotion is unreasonable: with petabytes of data, it would take too long and affect data availability. Similarly, separating the processing from the data introduces unacceptable latency.
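To put a rough number on "too long", here is a back-of-the-envelope calculation in Python. The 10 Gb/s link speed is purely an assumption for illustration, and the estimate ignores protocol overhead, contention, and disk throughput, so real transfers would likely be slower.

```python
def transfer_seconds(num_bytes: int, link_bits_per_second: int) -> float:
    """Ideal time to push num_bytes over a link, ignoring all overhead."""
    return (num_bytes * 8) / link_bits_per_second


PETABYTE = 10**15        # decimal petabyte, in bytes
TEN_GBPS = 10 * 10**9    # assumed link speed: 10 gigabits per second

seconds = transfer_seconds(PETABYTE, TEN_GBPS)
days = seconds / 86_400  # seconds per day
print(f"1 PB over 10 Gb/s: {seconds:,.0f} s (about {days:.1f} days)")
# prints: 1 PB over 10 Gb/s: 800,000 s (about 9.3 days)
```

Even under these generous assumptions, a single petabyte ties up the link for more than a week, which is why the data stays where it is and the compute is brought to it.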

As Russell explains in the interview, Serengeti adds value by preserving data locality while allowing organizations to deploy enterprise-tested, highly available, and fault-tolerant Hadoop clusters in minutes, something even the most seasoned Hadoop veteran cannot do. It also paves the way for advanced use cases such as mixed payload deployments and multi-tenancy.

Zicari presses the point: even if you can keep the data in place, aren’t there basic functions in Hadoop that depend on the data and processing running together on static infrastructure?

Zicari: There are concerns about the approach of decoupling Apache Hadoop nodes from the underlying physical infrastructure. Quoting Steve Loughran (HP Research): “Hadoop contains lots of assumptions about running in a static infrastructure; its scheduling and recovery algorithms assume this.” What is your take on this?

Joe Russell: A common misconception when virtualizing Hadoop clusters is that we decouple the data nodes from the physical infrastructure. This is not necessarily true. When users virtualize a Hadoop cluster using Project Serengeti, they separate data from compute while preserving data locality. By preserving data locality, we ensure that performance isn’t negatively impacted, or essentially making the infrastructure appear as static. Additionally, it creates true multi-tenancy within more layers of the Hadoop stack, not just the name node.

I think there is some confusion when we say “in the cloud”. Here, Steve is talking about running it on a public cloud like Amazon. Steve is largely introducing the concept of data locality, or the notion that large amounts of data are hard to move. In this scenario, it makes sense to bring compute resources to the data to ensure performance isn’t negatively impacted by networking limitations. VMware advocates that Hadoop should be virtualized, as it introduces a level of flexibility and management that allows companies to easily deploy, manage, and scale internal Hadoop clusters.

How Does It Work?

VMware created Hadoop Virtual Extensions (“HVE”) to make Hadoop distributions virtualization-aware. HVE works by inserting a node group layer between the rack and the host, which extends Hadoop’s topology awareness to virtualized platforms. So, while Serengeti has technically separated the compute resources from the data to allow for better management, scaling, and faster deployments, Hadoop still knows to keep the processing and data on the same physical machine.
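To make the node group idea more concrete, here is a minimal sketch in Python. It is not HVE’s actual code. It assumes each virtual machine is addressed by a hypothetical path of the form `/rack/nodegroup/vm`, where the node group stands for the physical host, and it measures distance as the number of hops up to the closest common ancestor and back down. The function names and the example paths are made up for illustration.

```python
from typing import List


def _parts(path: str) -> List[str]:
    """Split '/rack1/hostA/vm-1' into ['rack1', 'hostA', 'vm-1']."""
    return [p for p in path.strip("/").split("/") if p]


def topology_distance(a: str, b: str) -> int:
    """Hops from a up to the closest common ancestor, then down to b."""
    pa, pb = _parts(a), _parts(b)
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)


def pick_closest(task_vm: str, replicas: List[str]) -> str:
    """Choose the replica with the smallest topology distance to the task."""
    return min(replicas, key=lambda r: topology_distance(task_vm, r))


compute_vm = "/rack1/hostA/vm-compute-1"
replicas = [
    "/rack2/hostC/vm-data-3",  # different rack: distance 6
    "/rack1/hostB/vm-data-2",  # same rack, different physical host: distance 4
    "/rack1/hostA/vm-data-1",  # same physical host (node group): distance 2
]
print(pick_closest(compute_vm, replicas))
# prints: /rack1/hostA/vm-data-1
```

Without the middle layer, the compute VM and the data VM on `hostA` would look like two unrelated machines in the same rack. With it, a scheduler can tell that they share hardware and prefer that replica.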

Russell also outlines how High Availability is added by using vSphere:

We ensure High Availability (HA) by leveraging vSphere’s tested solution via Project Serengeti’s integration with vCenter (management console of vSphere).

In the event of physical server failure, affected virtual machines are automatically restarted on other production servers with spare capacity. In the case of operating system failure, vSphere HA restarts the affected virtual machine on the same physical server.

In Hadoop nomenclature, this means that there is HA on more than just the name node. vSphere’s solution also allows for HA on the jobtracker node, metastores, and on the management server, which are critical pieces of any Hadoop system that require high availability.

More importantly, as Hadoop is a batch-oriented process, it is important that when a physical host does fail, that you are able to pause and then restart that job from the point in time in which it went down. VMware’s vSphere solution allows for this and has been tested amongst the biggest Enterprises for the better part of the past decade.
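The restart behavior Russell describes can be sketched as a toy decision function. This is only an illustration in Python, not the vSphere API: the `Host` class, the `plan_restart` function, the slot-based notion of spare capacity, and the rule of picking the host with the most free slots are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Host:
    name: str
    free_slots: int      # hypothetical measure of spare capacity
    alive: bool = True


def plan_restart(failure: str, vm_host: str, hosts: Dict[str, Host]) -> Optional[str]:
    """Return the host where a failed VM should restart, or None if none fits."""
    if failure == "os":
        # Guest operating system failure: restart on the same physical server.
        return vm_host
    if failure == "host":
        # Physical server failure: restart on another live server with room.
        candidates = [
            h for h in hosts.values()
            if h.alive and h.name != vm_host and h.free_slots > 0
        ]
        if not candidates:
            return None
        # Assumed tie-breaking rule: prefer the host with the most spare slots.
        return max(candidates, key=lambda h: h.free_slots).name
    raise ValueError(f"unknown failure kind: {failure!r}")


hosts = {
    "esx-1": Host("esx-1", free_slots=0, alive=False),
    "esx-2": Host("esx-2", free_slots=1),
    "esx-3": Host("esx-3", free_slots=3),
}
print(plan_restart("host", "esx-1", hosts))  # prints: esx-3
print(plan_restart("os", "esx-2", hosts))    # prints: esx-2
```

The same rule applies to any virtual machine in the cluster, which is why the jobtracker, metastores, and management server get the same protection as the name node.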

 

HVE has been donated back to Apache Hadoop, and Serengeti is also open source. While it doesn’t make sense for VMware to spend money and engineering effort porting Serengeti to other hypervisors, Russell does state that support for other hypervisors is very much the desire.

This entry was posted in Serengeti.
Stacey Schneider

About Stacey Schneider

Stacey Schneider has over 15 years of experience working with technology, with a focus on sales and marketing automation as well as internationalization. Schneider has held roles in services, engineering, and products, and was the head of marketing and community for Hyperic before it was acquired by SpringSource and VMware. She is now a product marketing manager across the vFabric products at VMware, including supporting Hyperic. Prior to Hyperic, Schneider held various positions at CRM software pioneer Siebel Systems, including Group Director of Technology Product Marketing, a role in which her contributions earned her a patent. Schneider received her BS in Economics with a focus in International Business from the Pennsylvania State University.
