
Tag Archives: Pig

New Serengeti Release Extends Cloud Computing Support for Hadoop Community

Today VMware is announcing a significant new release of its open source big data virtualization project, Serengeti, known as M4 or version 0.8.0. Designed to make it easier for Hadoop users to deploy, run, and manage mixed-workload clusters on a virtualized platform, this release broadens support across the Hadoop community's distributions, adding support for Cloudera CDH4, MapR, and HBase. Serengeti M4 also includes updated performance configuration improvements and a hardware reference architecture guide.

This release comes at a perfect time for an exploding data market. This year the world will create 4 zettabytes of new data, and more than 80 percent of it will be unstructured data that does not fit in a traditional database management system. At the same time, businesses are learning to harness that data to improve how they operate.

A popular strategy for succeeding in the data market is Hadoop, an open source framework that allows massive distributed processing of large data sets across clusters of nodes using simple programming models. Additionally, Hadoop offers a scalable file system (HDFS) that lets users store huge amounts of data on inexpensive disks in commodity servers. This powerful framework has spawned many new startups in Silicon Valley and has enterprise IT departments clamoring to harness the technology. Huge web properties like Facebook, LinkedIn, Yahoo! and eBay all rely on Hadoop to process and store data for hundreds of millions of users. Continue reading
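To make the "simple programming models" point concrete, below is a minimal word-count job written against Hadoop's public MapReduce API. It is a generic illustration, not code from any of the deployments mentioned above; the class names are ours, and the HDFS input and output paths are supplied on the command line.

// Minimal Hadoop MapReduce job (the classic word count): the Mapper emits
// (word, 1) pairs and the Reducer sums the counts for each word across the cluster.
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);          // emit (word, 1) for every token
      }
    }
  }

  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();                  // add up the counts for this word
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // local pre-aggregation on each mapper
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));     // HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1]));   // HDFS output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The same pattern scales from a laptop to thousands of nodes: the framework handles splitting the input, scheduling map and reduce tasks, and recovering from node failures, so the application code stays this small.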

Serengeti Helps Enterprise Respond to the Big Data Challenge

Enterprise Demands Analytic Platform

Big Data adoption in the enterprise has traditionally been hindered by the lack of usable enterprise-grade tools and the shortage of implementation skills.


Enterprise IT is under immense pressure to deliver a Big Data analytic platform. The majority of this demand is currently for pilot Hadoop implementations of fewer than 20 nodes, intended to prove Hadoop's value in delivering new business insight. Gartner predicts that this demand will increase by a further 800 percent over the next five years.

The explosive growth of these requests in mid-to-large size companies leaves IT departments unable to meet the demand. Furthermore, Hadoop and its ecosystem tools are often too complex for many of these organizations to deploy and manage.

As a result, enterprise users, frustrated by these delays, often opt to circumvent IT and go directly to online analytic service providers. While satisfied by the immediacy of access, they often compromise corporate data policies, proliferate data inefficiently, and accrue large costs due to unpredictable pricing models. Continue reading

Spring and RabbitMQ – Behind India’s 1.2 Billion Person Biometric Database

Aadhaar was conceived as a way to provide a unique, online, portable identity so that every resident of India can access and benefit from government and private services. The Aadhaar project has received coverage from all possible media – television, press, articles, debates, and the Internet. It is seen as an audacious use of technology, albeit for a social cause. UIDAI, the authority responsible for issuing Aadhaar numbers, has published white papers, data, and newsletters on the progress of the initiative. A common question to the UIDAI technology team at conferences, at events, and over coffee is: what technologies power this important nationwide initiative? In this blog post, we want to give a sense of several of the significant technologies and approaches.
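To give a flavor of how a Spring application typically talks to RabbitMQ, here is a minimal Spring AMQP sketch. It is purely illustrative and is not Aadhaar code: the queue name "enrolment.packets" and the payload are invented for this example, and it assumes a RabbitMQ broker running locally with default credentials.

// Minimal Spring AMQP example: declare a durable queue, publish one message,
// and consume it synchronously. Real consumers would normally use a message
// listener container for asynchronous delivery.
import org.springframework.amqp.core.Queue;
import org.springframework.amqp.rabbit.connection.CachingConnectionFactory;
import org.springframework.amqp.rabbit.core.RabbitAdmin;
import org.springframework.amqp.rabbit.core.RabbitTemplate;

public class EnrolmentQueueDemo {
  public static void main(String[] args) {
    // Connect to a local RabbitMQ broker.
    CachingConnectionFactory connectionFactory = new CachingConnectionFactory("localhost");

    // Declare a durable queue; RabbitAdmin issues the AMQP declarations.
    RabbitAdmin admin = new RabbitAdmin(connectionFactory);
    admin.declareQueue(new Queue("enrolment.packets", true));

    // RabbitTemplate handles channel management and message conversion.
    RabbitTemplate template = new RabbitTemplate(connectionFactory);
    template.convertAndSend("enrolment.packets", "packet-12345 received");

    // Pull one message back off the queue for demonstration purposes.
    Object payload = template.receiveAndConvert("enrolment.packets");
    System.out.println("Consumed: " + payload);

    connectionFactory.destroy();
  }
}

Decoupling producers from consumers through a durable broker like this is what lets a pipeline absorb bursts of incoming work and retry failed steps without losing messages.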

Fundamental Principles

While the deployment footprint of the systems has grown from half a dozen machines to a few thousand CPU cores processing millions of Aadhaar-related transactions, the fundamental principles have remained the same:

Continue reading