Today VMware is shipping a significant new release of its open source big data virtualization project, Serengeti, called M4 or version 0.8.0. Designed to make it easier for Hadoop users to deploy, run, and manage mixed-workload clusters on a virtualized platform, this release broadens support across the Hadoop community's distributions, including new support for Cloudera CDH4, MapR, and HBase. Serengeti M4 also includes performance configuration improvements and a hardware reference architecture guide.
This release comes at a perfect time for an exploding data market. This year the world will create 4 zettabytes of new data, and more than 80% of that will be unstructured data that does not work in a traditional database management system. At the same time, businesses are learning to harness that data and use it to improve their operations.
A popular strategy for succeeding in the data market is Hadoop, an open source data framework that allows for the massive distributed processing of large data sets across clusters of nodes using simple programming models. Additionally, Hadoop offers a scalable file system (HDFS) that lets users store huge amounts of data on inexpensive disks in commodity servers. The powerful framework has spawned many new startups in Silicon Valley and has enterprise IT departments clamoring to harness the power of this technology. Huge web applications like Facebook, LinkedIn, Yahoo! and eBay all rely on Hadoop to process and store data for hundreds of millions of users.
Though my background includes time as a developer, architect, and CTO, much of my time today is spent discussing applications with senior IT executives. I manage an application development division of a national VAR and focus on the vFabric stack from top to bottom. One of the challenges I face is trying to provide application-centric consulting services to operations/infrastructure teams who (a) don’t really own the decision of app software infrastructure, (b) don’t understand it, or (c) worse, in some cases, don’t care. Recently, I’ve come to love my job for two primary reasons:
1. “Cloud” technologies are forcing the Operations teams and the Application teams to “share” responsibility for overall IT efficiency. The cloud concept of an on-demand, elastic infrastructure is knocking down political walls and silos that have evolved over the past decades in IT. This is nowhere more evident than at VMware, where the vFabric and vSphere product lines are starting to blur (e.g. vCenter –> vCloud Director –> Application Director). Finally, I have something to talk about with the Infrastructure folks that gets them excited! Perhaps it is the needed automation of infrastructure that brings Ops to the Apps side. Or perhaps it is an elastic architecture that brings Apps over to the Ops side. In any event, the two teams are brought together and work together more in cloud solutions.
Aadhaar was conceived as a way to provide a unique, online, portable identity so that every single resident of India can access and benefit from government and private services. The Aadhaar project has received coverage from all possible media – television, press, articles, debates, and the Internet. It is seen as an audacious use of technology, albeit for a social cause. UIDAI, the authority responsible for issuing Aadhaar numbers, has published white papers, data, and newsletters on the progress of the initiative. A common question to the UIDAI technology team in conferences, at events, and over coffee is: what technologies power this important nationwide initiative? In this blog post, we want to give a sense of several significant technologies and approaches.
While the deployment footprint of the systems has grown from half a dozen machines to a few thousand CPU cores processing millions of Aadhaar-related transactions, the fundamental principles have remained the same: