Today VMware is shipping a significant new release of Serengeti, its open source big data virtualization project, called M4 or version 0.8.0. Designed to make it easier for Hadoop users to deploy, run and manage mixed workload clusters on a virtualized platform, this release broadens support across the distributions of the Hadoop community, including new support for Cloudera CDH4, MapR, and HBase. Serengeti M4 also includes performance configuration improvements and a hardware reference architecture guide.
This release comes at a perfect time for an exploding data market. This year the world will create 4 zettabytes of new data, and more than 80% of that will be unstructured data that does not fit in a traditional database management system. At the same time, businesses are learning to harness that data and use it to improve their business.
A popular strategy to succeed in the data market is Hadoop, an open source data framework that allows for the massive distributed processing of large data sets across clusters of nodes using simple programming models. Additionally, Hadoop offers a scalable file system (HDFS) that allows users to store huge amounts of data on inexpensive disks in commodity servers. The powerful framework has spawned many new startups in Silicon Valley and has enterprise IT departments clamoring to harness the power of this technology. Huge web applications like Facebook, LinkedIn, Yahoo! and eBay all rely on Hadoop to process and store data for hundreds of millions of users.
Virtualization continues to be one of the top priorities for CIOs. As the share of virtualized workloads approaches 60%, enterprises are looking at database and big data workloads as the next target. Their goal is to realize the benefits of virtualization across the plethora of relational databases sprawling in their data centers. With the increasing popularity of analytic workloads on Hadoop, virtualization presents a fast and efficient way to get started with existing infrastructure, and scale the data dynamically as needed.
VMware’s vFabric Data Director 2.5 now extends the benefits of virtualization to both traditional relational databases like Oracle, SQL Server and Postgres and multi-node big data solutions like Hadoop. SQL Server and Oracle represent the majority of databases in enterprises, and Hadoop is one of the fastest growing data technologies in the enterprise.
vFabric Data Director enables the most common databases found in the enterprise to be delivered as a service with the agility of public cloud and enterprise-grade security and control.
The key new features in vFabric Data Director 2.5 are:
Support for SQL Server – Currently supported versions of SQL Server are 2008 R2 and 2012.
Support for Apache Hadoop 1.0-based distributions – Apache Hadoop 1.0, Cloudera CDH3, Greenplum HD 1.1 and 1.2, and Hortonworks HDP-1. Data Director leverages VMware’s open source Project Serengeti to deliver this capability.
Streamlined Data Director Setup – Complete setup in less than an hour
One-click template creation for Oracle and SQL Server through ISO based database and OS installation
Oracle database ingestion enhancements – Now includes Point In Time Refresh (PITR)
Data Director’s self-provisioning enables a whole new level of operational efficiency that greatly accelerates application development. With this new release, Data Director now delivers that efficiency in a heterogeneous database environment.
Though my background includes time as a developer, architect, and CTO, much of my time today is spent discussing applications with senior IT executives. I manage an application development division of a national VAR and focus on the vFabric stack from top to bottom. One of the challenges I face is trying to provide application-centric consulting services to operations/infrastructure teams who (a) don’t really own the decisions about app software infrastructure, (b) don’t understand it, or (c) worse, in some cases, don’t care. Recently, I’ve come to love my job for two primary reasons:
1. “Cloud” technologies are forcing the Operations teams and the Application teams to “share” responsibility for overall IT efficiency. The cloud concept of an on-demand, elastic infrastructure is knocking down political walls and silos that have evolved over the past decades in IT. Nowhere is this more evident than at VMware, where the vFabric and vSphere product lines are starting to blur (e.g. vCenter –> vCloud Director –> Application Director). Finally, I have something to talk about with the Infrastructure folks that gets them excited! Perhaps it is the needed automation of infrastructure that brings Ops to the Apps side. Or perhaps it is an elastic architecture that brings Apps over to the Ops side. In any event, the two teams are brought together and work together more in cloud solutions.