Home > Blogs > VMware vFabric Blog


Serengeti Helps Enterprise Respond to the Big Data Challenge

Enterprise Demands Analytic Platform

Big Data adoption in the enterprise has traditionally been hindered by the lack of usable enterprise-grade tools and the shortage of implementation skills.

Register for VMworld!
Click Here

Register for Session TEX2183 – Highly Available, Elastic and Multi-Tenant Hadoop on vSphere:
Click Here

Follow all vFabric updates at VMworld on Twitter:
Click Here

Enterprise IT is under immense pressure to deliver a Big Data analytic platform. The majority of this demand is currently for pilot Hadoop implementations, with fewer than 20 nodes, intended to prove Hadoop's value in delivering new business insight. Gartner predicts that this demand will increase by a further 800 percent over the next five years.

The explosive growth of these kinds of requests in mid-to-large size companies renders IT departments unable to meet that demand. Furthermore, Hadoop and its ecosystem tools are often too complex for many of these organizations to deploy and manage.

As a result, enterprise users, frustrated by these delays, often opt to circumvent IT and go directly to online analytic service providers. While satisfied by the immediacy of access, they often compromise corporate data policies, proliferate data inefficiently, and accrue large costs due to unpredictable pricing models.

The good news is that enterprise IT has recognized this issue and is in the process of retooling to address the shortage of Hadoop deployment and management skills.

Meet Serengeti, Enterprise Big Data Accelerator

At VMworld, we had the opportunity to demonstrate VMware’s solution to this problem using the recently announced open source Serengeti project. Serengeti enables rapid deployment of standardized Apache Hadoop clusters on an existing virtual platform, using spare machine cycles, with no need to purchase additional hardware or software.

Our demo illustrated how Serengeti, with its standardized approach to deployment and management, can deliver an enterprise-grade analytic platform with unmatched “time to value”: the time from initiating Hadoop deployment until performing data analyses on the newly created, fully functional cluster.

The following video demonstrates how Serengeti can deploy a standardized Hadoop cluster using a single command in under 10 minutes.

Declarative Deployment

Besides the obvious efficiency gains, Serengeti also enables a declarative approach to Hadoop deployment. This spec-file driven approach ensures repeatable, standardized deployment with unmatched granularity of control over cluster configuration and topology.

In addition to the infrastructure-level configuration, Serengeti also enables Hadoop attribute configuration, normally found in numerous Hadoop configuration files: core-site.xml, hdfs-site.xml, mapred-site.xml, hadoop-env.sh and log4j.properties:

…"configuration": {
    "hadoop": {
      "core-site.xml": {
        // check for all settings at http://hadoop.apache.org/common/docs/r1.0.0/core-default.html
      },
      "hdfs-site.xml": {
        // check for all settings at http://hadoop.apache.org/common/docs/r1.0.0/hdfs-default.html
      },
      "mapred-site.xml": {
        // check for all settings at http://hadoop.apache.org/common/docs/r1.0.0/mapred-default.html
        "io.sort.mb": "300"
      },
      "hadoop-env.sh": {
        // "HADOOP_HEAPSIZE": "",
        // "HADOOP_NAMENODE_OPTS": "",
        // "HADOOP_DATANODE_OPTS": "",
…

The above single specification file, including the Hadoop-level configuration, can be applied from the Serengeti CLI using the cluster config command:

> cluster config --name demoCluster
                 --specFile /home/demo/smallDemoCluster.json
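
For context, a complete specification file combines node-group definitions with the Hadoop configuration shown above. The following is a sketch of what a spec file such as smallDemoCluster.json might contain; the role names match the snippets in this post, while the instance counts and the exact field set are illustrative assumptions, not the actual file from the demo:

```json
{
  "nodeGroups": [
    {
      "name": "master",
      "roles": [ "hadoop_namenode", "hadoop_jobtracker" ],
      "instanceNum": 1
    },
    {
      "name": "worker",
      "roles": [ "hadoop_datanode", "hadoop_tasktracker" ],
      "instanceNum": 5
    },
    {
      "name": "client",
      "roles": [ "hadoop_client", "hive", "hive_server", "pig" ],
      "instanceNum": 1
    }
  ],
  "configuration": {
    "hadoop": {
      "mapred-site.xml": {
        "io.sort.mb": "300"
      }
    }
  }
}
```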

Not Only Hadoop

In addition to the efficiency gains during Hadoop deployment demonstrated above, Serengeti also makes it easier to integrate Hadoop with existing systems, without constantly copying data around, through its ODBC/JDBC services as well as Pig and Hive for exploring large data sets already in HDFS.

The following is an example of the basic workflow, along with sample commands, to stand up a Hadoop cluster, manage its size, import data, execute a MapReduce job, and expose its results to data consumers through the integrated Hive server.

Deploy Hadoop cluster

> cluster create --name demoCluster

Manage existent Hadoop cluster

> cluster resize --name demoCluster
                 --nodeGroup worker
                 --instanceNum 10

Import/Download data

> fs ls /tmp
> fs put --from /tmp/local.data --to /tmp/hdfs.data

Execute MapReduce/Pig/Hive jobs

> cluster target --name demoCluster
> mr jar --jarfile /opt/big-calc-1.0.0.jar
         --mainclass com.company.data.calc.BigJob
         --args "arg1 arg2 arg3"

Configure Hive Server for ODBC/JDBC services

…
"name": "client",
"roles": [
   "hadoop_client",
   "hive",
   "hive_server",
   "pig"
],
…
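
Once a node carries the hive_server role, downstream tools can reach the cluster over JDBC. The following is a hedged sketch of such a connection, assuming the HiveServer of that era (driver class org.apache.hadoop.hive.jdbc.HiveDriver, default port 10000); the host name, table name, and query are hypothetical, and the Hive JDBC driver jar must be on the classpath:

```java
// Sketch: query a Serengeti-deployed Hive server over JDBC.
// Hypothetical host and table; requires the Hive JDBC driver jar on the classpath.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQueryExample {
    public static void main(String[] args) throws Exception {
        // HiveServer1-era driver; the server listens on port 10000 by default
        Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
        Connection conn = DriverManager.getConnection(
                "jdbc:hive://demo-client-0:10000/default", "", "");
        Statement stmt = conn.createStatement();
        // Explore MapReduce output already loaded into a Hive table
        ResultSet rs = stmt.executeQuery("SELECT word, freq FROM word_counts LIMIT 10");
        while (rs.next()) {
            System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
        }
        conn.close();
    }
}
```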

Moving to Production

Besides the pilot implementation efficiency gains, it is worth mentioning that Serengeti also delivers a series of enterprise-grade enhancements that enterprise IT expects in its production environment. The two worth highlighting here are High Availability (HA) and Fault Tolerance (FT).

HA – Protection against host and VM failures

VMware, in collaboration with Hortonworks, has included in Serengeti protection against Name Node (NN) and Job Tracker (JT) failures. Serengeti automatically detects a failure and can restart the virtual machine within minutes on any of the available hosts in the Hadoop cluster. Hadoop jobs already in progress are paused and resumed by Serengeti once the Name Node is back up.

In contrast to the HA available in HDFS 2, Serengeti HA covers all master services and also supports Apache Hadoop version 1.

FT – Provides Continuous Protection

Taking the notion of protection even further, Serengeti, when correctly configured on vSphere, delivers a true zero-downtime Hadoop system, preventing data loss not only for the Name Node and Job Tracker but also for other components in the Hadoop cluster.

Serengeti, through its tight integration with VMware’s HA/DRS services, can deliver continuous protection for Hadoop nodes without the need for complex clustering or specialized hardware, while impacting performance only nominally (a 2-4% slowdown on TeraSort).

In summary

Enterprise IT is currently under pressure to respond to the increasing demand for a reliable Big Data platform that enables users to mine growing data volumes for potential business insight.

By accelerating the Hadoop deployment process and delivering the fastest time to business insight, Serengeti makes this often trial-and-error process more reliable and efficient. Serengeti greatly simplifies the user experience by allowing users to focus on the data and its algorithms, not the underlying infrastructure.

Learn More

This entry was posted in Serengeti, Spring.
Mark Chmarny

About Mark Chmarny

During his 15+ year career, Mark Chmarny has worked across various industries. Most recently, as a Cloud Architect at EMC, Mark developed numerous Cloud Computing solutions for both Service Provider and Enterprise customers. As a Data Solution Evangelist at VMware, Mark works in the Cloud Application Platform group where he is actively engaged in defining new approaches to distributed data management for Cloud-scale applications. Mark received a Mechanical Engineering degree from Technical University in Vienna, Austria and a BA in Communication Arts from Multnomah University in Portland, OR.
