Tag Archives: Hive

New Serengeti Release Extends Cloud Computing Support for Hadoop Community

Today VMware is releasing a major new version of its open source big data virtualization project Serengeti, called M4 or version 0.8.0. Designed to make it easier for Hadoop users to deploy, run, and manage mixed-workload clusters on a virtualized platform, this release broadens support across the distributions of the Hadoop community, adding support for Cloudera CDH4, MapR, and HBase. Serengeti M4 also includes updated performance configuration improvements and a hardware reference architecture guide.

This release comes at a perfect time for an exploding data market. This year, the world will create 4 zettabytes of new data, and more than 80% of it will be unstructured data that does not fit in a traditional database management system. At the same time, businesses are learning to harness that data to improve how they operate.

A popular strategy to succeed in the data market is Hadoop, an open source framework that allows for the massive distributed processing of large data sets across clusters of nodes using simple programming models. Additionally, Hadoop offers a scalable file system (HDFS) that allows users to store huge amounts of data on inexpensive disks in commodity servers. This powerful framework has spawned many new startups in Silicon Valley and has enterprise IT departments clamoring to harness the power of the technology. Huge web applications like Facebook, LinkedIn, Yahoo! and eBay all rely on Hadoop to process and store data for hundreds of millions of users. Continue reading
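To give a flavor of the "simple programming models" mentioned above, here is a minimal sketch of the map/reduce pattern that Hadoop popularized, written in plain Python rather than Hadoop's actual Java API. The function names and sample records are illustrative only; a real Hadoop job would implement Mapper and Reducer classes and run over HDFS data.

```python
# Illustrative word count in the map/reduce style (not Hadoop's real API).
from collections import defaultdict

def map_phase(records):
    # Emit (word, 1) pairs for every word, as a Hadoop Mapper would.
    for record in records:
        for word in record.split():
            yield (word, 1)

def reduce_phase(pairs):
    # Group pairs by key and sum the counts, as a Hadoop Reducer would.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

records = ["big data big clusters", "data clusters"]
print(reduce_phase(map_phase(records)))
```

In Hadoop itself, the grouping between the two phases (the "shuffle") and the distribution of records across cluster nodes are handled by the framework, which is what lets the same simple model scale to very large data sets.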

VMware’s Serengeti – Virtualized Hadoop at Cloud-scale

Not long ago I covered the topic of Big Data adoption in the enterprise. In it, I described how Serengeti enables enterprises to respond to common Hadoop implementation challenges resulting from the lack of usable enterprise-grade tools and the shortage of infrastructure deployment skills.

With the latest release of the open source Project Serengeti, VMware continues its mission to deliver the easiest and most reliable virtualized Big Data platform. One of the most distinctive attributes of a Serengeti Hadoop deployment is that it can easily coexist with other workloads on existing infrastructure.

Serengeti-deployed Hadoop clusters can also be configured with either a local or a shared, scale-out data storage architecture. This storage layer can even be shared across multiple HDFS-based analytical workloads, and in the future it could potentially be extended to other, non-HDFS-based data engines.

The elasticity of the underlying vSphere virtualization platform helps Serengeti achieve new levels of efficiency. This architecture enables organizations to share their existing infrastructure with Big Data analytical workloads while delivering optimal storage capacity and performance. Continue reading

Serengeti Helps Enterprise Respond to the Big Data Challenge

Enterprise Demands Analytic Platform

Big Data adoption in the enterprise has traditionally been hindered by the lack of usable enterprise-grade tools and the shortage of implementation skills.

Register for VMworld!

Register for Session TEX2183 – Highly Available, Elastic and Multi-Tenant Hadoop on vSphere.

Follow all vFabric updates at VMworld on Twitter.

Enterprise IT is under immense pressure to deliver a Big Data analytics platform. The majority of this demand is currently for pilot Hadoop implementations, with fewer than 20 nodes, intended to prove the technology's value in delivering new business insight. Gartner predicts that this demand will increase by 800 percent over the next five years.

The explosive growth of these kinds of requests in mid-to-large size companies leaves IT departments unable to meet the demand. Furthermore, Hadoop and its ecosystem tools are often too complex for many of these organizations to deploy and manage.

As a result, enterprise users frustrated by these delays often opt to circumvent IT and go directly to online analytic service providers. While satisfied by the immediacy of access, they often compromise corporate data policies, proliferate data inefficiently, and accrue large costs due to unpredictable pricing models. Continue reading

Spring and RabbitMQ – Behind India’s 1.2 Billion Person Biometric Database

Aadhaar was conceived as a way to provide a unique, online, portable identity so that every single resident of India can access and benefit from government and private services. The Aadhaar project has received coverage from all possible media – television, press, articles, debates, and the Internet. It is seen as an audacious use of technology, albeit for a social cause. UIDAI, the authority responsible for issuing Aadhaar numbers, has published white papers, data, and newsletters on the progress of the initiative. A common question for the UIDAI technology team – at conferences, at events, and over coffee – is: what technologies power this important nationwide initiative? In this blog post, we wanted to give a sense of several significant technologies and approaches.
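The title's pairing of Spring and RabbitMQ points at a classic work-queue architecture: producers hand units of work to a broker, and a pool of decoupled workers processes them. As an illustration only, the sketch below simulates that producer/worker decoupling with Python's standard-library queue module; the real Aadhaar pipeline uses Spring applications talking to RabbitMQ, and the enrolment IDs here are hypothetical.

```python
# Illustrative work-queue pattern (RabbitMQ-style), simulated with the
# Python stdlib. Not the actual Aadhaar or Spring AMQP code.
import queue
import threading

work = queue.Queue()   # stands in for a RabbitMQ queue on a broker
results = []

def worker():
    # A consumer loop: take a message, process it, acknowledge it.
    while True:
        item = work.get()
        if item is None:              # sentinel message: shut down cleanly
            work.task_done()
            break
        results.append(f"processed:{item}")  # stand-in for real processing
        work.task_done()

t = threading.Thread(target=worker)
t.start()
for enrolment_id in ["AAD-001", "AAD-002", "AAD-003"]:  # hypothetical IDs
    work.put(enrolment_id)
work.put(None)    # tell the worker there is no more work
work.join()       # block until every message has been processed
t.join()
print(results)
```

The value of the pattern is that producers and consumers never call each other directly: either side can be scaled, restarted, or replaced independently, which is exactly what a broker such as RabbitMQ provides across machines.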

Fundamental Principles

While the deployment footprint of the systems has grown from half a dozen machines to a few thousand CPU cores processing millions of Aadhaar-related transactions, the fundamental principles have remained the same:

Continue reading