Today VMware is announcing a significant new release of its open source big data virtualization project, Serengeti, called M4 or version 0.8.0. Designed to make it easier for Hadoop users to deploy, run and manage mixed workload clusters on a virtualized platform, this release broadens support across the distributions of the Hadoop community, including new support for Cloudera CDH4, MapR, and HBase. Serengeti M4 also includes performance configuration improvements and a hardware reference architecture guide.
This release comes at a perfect time for an exploding data market. This year, worldwide we will create 4 zettabytes of new data, and more than 80% of that will be unstructured data that does not work in a traditional database management system. At the same time, businesses are learning to harness that data and use it to better their business.
A popular strategy to succeed in the data market is Hadoop, an open source data framework that allows for the massive distributed processing of large data sets across clusters of nodes using simple programming models. Additionally, Hadoop offers a scalable file system (HDFS) that allows users to store huge amounts of data on inexpensive disks in commodity servers. The powerful framework has spawned many new startups in Silicon Valley and has enterprise IT departments clamoring to harness the power of this technology. Huge web applications like Facebook, LinkedIn, Yahoo! and eBay all rely on Hadoop to process and store data for hundreds of millions of users.
As we’ve previously covered, data growth is quite unbelievable, and this means traditional database models are being stretched. On Tuesday, November 13, 2012 at 9:00 AM PST, VMware’s Joe Russell will be presenting on several topics related to Big, Fast, Flexible Data and how VMware’s key data management technologies help companies overcome some of the key challenges with traditional RDBMS.
Attend to learn:
How Hadoop and new analytics technologies are allowing companies to use Big Data in new ways to gain meaningful business insights
What’s new with Project Serengeti, a VMware initiative to help you deploy and manage elastic Hadoop clusters in minutes
How Fast Data is bringing data logic in-memory, allowing for dramatic scale, reduced costs, and improved performance
How Flexible Data, including NoSQL and open source relational data technologies, can improve your data model
How virtualizing the database layer enables a new Cloud Delivery Model, allowing enterprise IT departments to offer self-service data services elastically on demand, maintain centralized control, and operate within regulatory guidelines
Virtualization continues to be one of the top priorities for CIOs. As the share of virtualized workloads approaches 60%, the enterprise is looking at database and big data workloads as the next target. Their goal is to realize the benefits of virtualization across the plethora of relational databases sprawling in their data centers. With the increasing popularity of analytic workloads on Hadoop, virtualization presents a fast and efficient way to get started with existing infrastructure, and scale the data dynamically as needed.
VMware’s vFabric Data Director 2.5 now extends the benefits of virtualization to both traditional relational databases like Oracle, SQL Server and Postgres and Big Data, multi-node data solutions like Hadoop. SQL Server and Oracle represent the majority of databases in enterprises, and Hadoop is one of the fastest growing data technologies in the enterprise.
vFabric Data Director enables the most common databases found in the enterprise to be delivered as a service with the agility of public cloud and enterprise-grade security and control.
The key new features in vFabric Data Director 2.5 are:
Support for SQL Server – Currently supported versions of SQL Server are 2008 R2 and 2012.
Support for Apache Hadoop 1.0-based distributions – Apache Hadoop 1.0, Cloudera CDH3, Greenplum HD 1.1 and 1.2, and Hortonworks HDP-1. Data Director leverages VMware’s open source Project Serengeti to deliver this capability.
Streamlined Data Director Setup – Complete setup in less than an hour
One-click template creation for Oracle and SQL Server through ISO based database and OS installation
Oracle database ingestion enhancements – Now includes Point In Time Refresh (PITR)
Data Director’s self-provisioning enables a whole new level of operational efficiency that greatly accelerates application development. With this new release, Data Director now delivers that efficiency in a heterogeneous database environment.
Not long ago I covered the topic of Big Data adoption in the enterprise. In that post, I described how Serengeti enables enterprises to respond to common Hadoop implementation challenges resulting from the lack of usable enterprise-grade tools and the shortage of infrastructure deployment skills.
With the latest release of open source Project Serengeti, VMware continues on its mission to deliver the easiest and most reliable virtualized Big Data platform. One of the distinctive attributes of a Serengeti Hadoop deployment is that it can easily coexist with other workloads on existing infrastructure.
Serengeti-deployed Hadoop clusters can also be configured with either a local or a shared, scale-out data storage architecture. This storage layer can even be shared across multiple HDFS-based analytical workloads. And, in the future, this could potentially be extended to other, non-HDFS-based data engines.
The elasticity of the underlying vSphere virtualization platform helps Serengeti achieve new levels of efficiency. This architecture enables organizations to share the existing infrastructure with Big Data analytical workloads to deliver optimal storage capacity and performance.
The vFabric team is headed to SpringOne 2GX 2012 next week – from October 15-18 in Washington, DC. This is set to be a great event to learn the latest on Spring with over 100 sessions covering a wide variety of topics. For those of you looking to learn more about how vFabric is the best place to run Spring applications, here are the highlights you won’t want to miss:
Register for Session TEX2183 – Highly Available, Elastic and Multi-Tenant Hadoop on vSphere: Click Here
Follow all vFabric updates at VMworld on Twitter: Click Here
Enterprise IT is under immense pressure to deliver a Big Data analytic platform. The majority of this demand is currently for pilot Hadoop implementations, with fewer than 20 nodes, intended to prove Hadoop’s value in delivering new business insight. Gartner predicts that this demand will further increase by 800 percent over the next five years.
The explosive growth of these kinds of requests in mid-to-large size companies leaves IT departments unable to meet that demand. Furthermore, Hadoop and all of its ecosystem tools are often too complex for many of these organizations to deploy and manage.
As a result, enterprise users, frustrated by these delays, often opt to circumvent IT and go directly to online analytic service providers. While satisfied by the immediacy of access, they often compromise many of the corporate data policies, inefficiently proliferate data and accrue large costs due to unpredictable pricing models.
On Monday morning, I had the opportunity to sit back and enjoy the opening keynotes with Paul Maritz, Pat Gelsinger, and Steve Herrod at VMworld 2012. Since my efforts focus on the vFabric product line, I was quite excited to see how our executive leadership team announced the company’s vision and hit on where vFabric fits in. For those who missed the keynote, it is available here. First, I’d like to say how amazing it was to hear Paul Maritz talk about how much virtualization has been adopted during his short tenure since 2008.
Now, there were three points made in the keynotes which explain how vFabric is a key part of the software-defined data center story, and I thought they were worth passing along to anyone who missed them. Before I mention these points, it makes sense to summarize the relationship between vFabric and the software-defined data center at a very high level. To do so, I will quote Steve Herrod in this software-defined data center overview:
“So, in the end, it is the applications that matter. It’s the applications that help a business make new revenue or be more efficient in how they are doing so. And
Our “Uber” Data booth at VMworld this year will demonstrate how VMware continues to address enterprise data management challenges related to scalability, data proliferation, traditional database performance bottlenecks, analytics, and the ever-changing data usage patterns of today’s online applications.