7 Myths on Big Data—Avoiding Bad Hadoop and Cloud Analytics Decisions

Hadoop is an open source legend built by software heroes.

Yet legends tend to be surrounded by myths, and these myths can lead IT executives down a path while wearing rose-colored glasses.

Data and data usage are growing at an alarming rate.  Just look at the numbers from analysts: IDC predicts a 53.4% growth rate for storage this year, AT&T claims 20,000% growth in its wireless data traffic over the past 5 years, and if you take a look at your own communications channels, it’s guaranteed that the internet content, emails, app notifications, social messages, and automated reports you get every day have dramatically increased.  This is why companies ranging from McKinsey to Facebook to Walmart are doing something about big data.

Just like we saw in the dot-com boom of the 90s and the web 2.0 boom of the 2000s, the big data trend will also lead companies to make some really bad assumptions and decisions.

Hadoop is certainly one major area of investment for companies looking to solve big data needs. Companies like Facebook that have famously dealt well with large data volumes have publicly touted their successes with Hadoop, so it’s natural that companies approaching big data first look to the successes of others.  A really smart MIT computer science grad once told me, “when all you have is a hammer, everything looks like a nail.” This functional fixedness is the cognitive bias to avoid amid the hype surrounding Hadoop. Hadoop is a multi-dimensional solution that can be deployed and used in different ways. Let’s look at some of the most common preconceived notions about Hadoop and big data that companies should examine before committing to a Hadoop project:

1. Big Data is purely about volume—NOT TRUE

Besides volume, several industry leaders have also touted variety, variability, velocity, and value. Putting all arguments about alliteration aside, the point is that data is not just growing—it is moving further towards real-time analysis, coming from structured and unstructured sources, and being used to try and make better decisions. With these considerations, analyzing a large volume of data is not the only way to achieve value. For example, storing and analyzing terabytes of data over time might not add nearly as much value as analyzing 1 gigabyte of really important, impactful information in real time. From a tool-set perspective, you might want an in-memory data grid built for real-time pricing calculations instead of a way to slice and dice historical prices into a dead horse.
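To illustrate the real-time side of that trade-off, here is a minimal Python sketch. It is a single-process stand-in, not an in-memory data grid, and the `RollingPrice` class, its `window` parameter, and the sample prices are all hypothetical. It keeps only the most recent ticks in memory and updates an average on every new price, rather than re-scanning stored history.

```python
from collections import deque


class RollingPrice:
    """Hypothetical helper: average of the most recent `window` price ticks."""

    def __init__(self, window):
        if window <= 0:
            raise ValueError("window must be positive")
        # A bounded deque drops the oldest tick automatically once it is full.
        self._ticks = deque(maxlen=window)
        self._total = 0.0

    def update(self, price):
        """Record one new tick and return the current rolling average."""
        # If the window is full, the oldest tick is about to be evicted,
        # so remove its contribution from the running total first.
        if len(self._ticks) == self._ticks.maxlen:
            self._total -= self._ticks[0]
        self._ticks.append(price)
        self._total += price
        return self._total / len(self._ticks)


if __name__ == "__main__":
    rolling = RollingPrice(window=3)
    for tick in [10.0, 20.0, 30.0, 40.0]:
        print(tick, rolling.update(tick))
```

Each update touches a constant amount of data regardless of how much history exists, which is the property that makes this style of calculation suitable for answering questions as the data arrives.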

2. Traditional SQL doesn’t work with Hadoop—NOT TRUE

When Facebook, Twitter, Yahoo! and others bet big on Hadoop, they also knew that HDFS and MapReduce were limited in their ability to deal with expressive queries through a language like SQL. This is how Hive, Pig, and Sqoop were ultimately hatched. Given that so much data on earth is managed through SQL, many companies and projects are offering ways to address the compatibility of Hadoop and SQL. Pivotal HD’s HAWQ is one example: a parallel, SQL-compliant query engine that has been shown to be 10 to hundreds of times faster than other Hadoop query engines on the market today, and it was built to support petabyte data sets.
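To make the expressiveness gap concrete, here is a minimal sketch that computes the same per-region total two ways. It is not Hive, Pig, or HAWQ code. The first version spells out map, shuffle, and reduce steps in plain Python as a stand-in for a MapReduce job; the second runs one SQL statement through Python's built-in `sqlite3` module. The `sales` table, its columns, and the sample rows are assumptions made up for illustration.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (region, amount) sales records.
SALES = [
    ("east", 120.0),
    ("west", 75.5),
    ("east", 30.0),
    ("north", 10.0),
    ("west", 24.5),
]


def map_phase(records):
    """Emit one (key, value) pair per record, as a mapper would."""
    for region, amount in records:
        yield region, amount


def shuffle_phase(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values collected for each key, as a reducer would."""
    return {key: sum(values) for key, values in groups.items()}


def totals_with_mapreduce(records):
    return reduce_phase(shuffle_phase(map_phase(records)))


def totals_with_sql(records):
    """The same aggregation expressed as one declarative SQL query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
        rows = conn.execute(
            "SELECT region, SUM(amount) FROM sales GROUP BY region"
        ).fetchall()
        return dict(rows)
    finally:
        conn.close()


if __name__ == "__main__":
    print(totals_with_mapreduce(SALES))
    print(totals_with_sql(SALES))
```

Both functions return the same totals. The difference is in how much of the "how" the author has to write by hand, which is the gap that SQL layers on top of Hadoop aim to close.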

3. Kill the Mainframe! Hadoop is the only new IT data platform—NOT TRUE

There are many longstanding investments in the IT portfolio, and the mainframe is an example of one that probably should evolve along with ERP, CRM, and SCM. While the mainframe isn’t being buried by companies, it definitely needs a new strategy to grow new legs and expand on the value of its existing investment. For many of our customers that run into issues with mainframe speed, scale, or cost, there are incremental ways to evolve the big iron data platform and actually get more use out of it. For example, in-memory, big data grids like vFabric SQLFire can be embedded or use distributed caching approaches for dealing with problems like high-speed ingest from queues, speeding mainframe batch processes, or real-time analytical reporting.

4. Virtualized Hadoop takes a performance hit—NOT TRUE

Hadoop was originally designed to run on bare metal servers; however, as adoption has grown, many companies want it as a data center service running in the cloud. Why do companies want to virtualize Hadoop? First, consider the ability to manage infrastructure elastically. Scaling compute resources, like virtual Hadoop nodes, helps with performance when data and compute are separated. Otherwise, you would take a Hadoop node down and lose the data with it, or add a node and have no data with it. Major Hadoop distributions from MapR, Hortonworks, Cloudera, and Greenplum all support Project Serengeti and Hadoop Virtualization Extensions (HVE) for this reason. In addition, our research with partners has shown that Hadoop works quite well on vSphere and can even perform better under certain conditions: in benchmarks we ran with partners, running 2 or 4 smaller VMs per physical machine often performed better than a native approach, by up to 14%.

5. Hadoop only works in your data center—NOT TRUE

First of all, there are SaaS-based cloud solutions, like Cetas, that allow you to run Hadoop, SQL, and real-time analytics in the cloud without investing the time and money it takes to build a large project inside your data center. For a public cloud runtime, Java developers can probably benefit from Spring Data for Apache Hadoop and the related examples on GitHub or the online video introduction.

6. Hadoop doesn’t make financial sense to virtualize—NOT TRUE

Hadoop is typically explained as running on a bank of commodity servers, so one might conclude that adding a virtualization layer adds extra cost but no extra value. There is a flaw in this perspective: it ignores the fact that data and data analysis are both dynamic. To become an organization that leverages the power of Hadoop to grow, innovate, and create efficiencies, you are going to vary the sources of data, the speed of analysis, and more. Virtualized infrastructure still reduces the physical hardware footprint to bring CAPEX in line with pure commodity hardware, and OPEX is reduced through automation and higher utilization of shared infrastructure.

7. Hadoop doesn’t work on SAN or NAS—NOT TRUE

Hadoop runs on local disks, but it can also run well in a shared SAN environment for small to medium-sized clusters, with different cost and performance characteristics. High-bandwidth networks like 10 Gigabit Ethernet, FCoE, and iSCSI can also support effective performance.

Taking Action to Overcome the Myths

While many of us are fans of big data, this list can help you take a step back and look objectively at the right approach to solving your big data problems. Just like some building projects need hammers and others need screwdrivers, hacksaws, or a welding torch, Hadoop is just one tool to help conquer big data problems. High velocity data may push you towards an in-memory, big data grid like GemFire or SQLFire. A need for massive, consumer-grade web scale may mean you need message-oriented middleware like RabbitMQ. Getting to market faster may mean you need to look at a full SaaS solution like Cetas, and Redis may meet your needs and find a home in your stack much more easily than a full-blown Hadoop environment.

This entry was posted in GemFire, RabbitMQ, Serengeti, SQLFire.
Adam Bloom

About Adam Bloom

Adam Bloom has worked for 15+ years in the tech industry and has been a key contributor to the VMware vFabric Blog for the past year. He first started working on cloud-based apps in 1998 when he led the development and launch of WebMD 1.0’s B2C and B2B apps. He then spent several years in product marketing for a J2EE-based PaaS/SaaS start-up. Afterwards, he worked for Siebel as a consultant on large CRM engagements, then launched their online community and ran marketing operations. At Oracle, he led the worldwide implementation of Siebel CRM before spending some time at a YouTube competitor in Silicon Valley and working as a product marketer for Unica's SaaS-based marketing automation suite. He graduated from Georgia Tech with high honors and an undergraduate thesis in human-computer interaction.
