
Category Archives: Serengeti

Special Strategy Sessions at Hadoop Summit

Plain and simple: Apache Hadoop has become the technology disrupter that is sending every enterprise into overdrive to get up to speed and figure out how to exploit its data. Adoption is accelerating at 60% a year, yet 26% of the most sophisticated Hadoop users say that the time it takes to put Hadoop into production is gating its success.

Judging from the agenda for this year’s Hadoop Summit in San Jose on June 26 and 27, the industry is primed to fix this issue: it is one of the first Hadoop/big data conferences to support a full infrastructure track. VMware is serious about this too, but we need your help. We need to meet you there!

Strategy Feedback Sessions

VMware’s big data experts, along with colleagues such as EMC’s Chuck Hollis, will be at the conference running a series of strategy feedback sessions concentrating on how extending virtualization will meet tomorrow’s requirements for big data analytics environments. We’d very much like to have you participate, and who knows, you may help shape the very future of Hadoop in big data web applications.

These 90-minute sessions will be run as small groups throughout the conference and will let you meet some of our top minds on how Hadoop will transform itself to seize the cloud. We’ll share some of what we see happening in the shift to make Hadoop more on-demand in the cloud, along with some of our enabling technologies such as Serengeti and Hadoop Virtual Extensions (HVE). For your part of these sessions, we will concentrate on questions like: Continue reading

Cloudera Gets More Cloudy: Partners and Certifies CDH4 on vSphere

Today, we are excited to officially welcome Cloudera to the VMware family. VMware and Cloudera have entered into a partnership agreement meant to help users of Cloudera’s Hadoop distribution, CDH4, run it in the cloud. As part of this announcement, VMware has tested and certified Cloudera’s enterprise big data software to run on vSphere 5.1, and Cloudera is now part of the VMware Ready and Technology Alliance Partner (TAP) programs.

This month at EMC World, VMware CEO Pat Gelsinger stated that over 500,000 Hadoop installations exist today on bare-metal servers, with compute and data tied to the same physical server. By breaking compute and data apart and putting them on fast-to-deploy vSphere virtual machines, big data becomes inherently more accessible, compute times can improve by up to 13%, and datacenters can be optimized to provide more types of data services without adding more hardware.

It comes at a time when the volume of data is exploding and, according to PwC’s 5th Annual Digital IQ Survey, 83% of the top-performing companies in the survey believe that harnessing big data will give their firms a competitive advantage. As such, many CIOs are formally aligning their agendas to invest in big data this year. Continue reading

Breaking the Mindset: Why Hadoop Can and Should Move Past Bare-Metal Deployments to Virtualization

Whenever we’ve dealt with something for a while, our way of thinking about it becomes a habit. Hadoop deals with a lot of data; currently, the record is 100 petabytes in a Facebook cluster that analyzes log data. Because it was created by the likes of Google and Facebook to handle such large data volumes at high performance, Hadoop was originally built to run on bare-metal servers. Since virtualization wasn’t an option from the get-go, the notion that you can’t safely run that much data on a movable virtual machine has largely gone unchallenged.

However, as time has gone on and technology has made persistent storage in the cloud possible, organizations have started to rethink this paradigm. In fact, several companies are using Hadoop and big data today to gain competitive advantage, and while they are running it on virtualized infrastructure, they are not moving the data. The advantages lie elsewhere.

Joe Russell, VMware’s big data product line marketing manager, spoke with Roberto Zicari this week in an interview on ODBMS.org that helps articulate not only why Hadoop can run on virtual infrastructure using Project Serengeti, but also why companies should consider it to save time and make Hadoop more usable. Continue reading

7 Myths on Big Data—Avoiding Bad Hadoop and Cloud Analytics Decisions

Hadoop is an open source legend built by software heroes.

Yet legends can sometimes be surrounded by myths, and those myths can lead IT executives down a path wearing rose-colored glasses.

Data and data usage are growing at an alarming rate. Just look at the numbers from analysts: IDC predicts a 53.4% growth rate for storage this year, AT&T claims 20,000% growth in its wireless data traffic over the past 5 years, and if you take stock of your own communications channels, it’s guaranteed that the internet content, emails, app notifications, social messages, and automated reports you get every day have dramatically increased. This is why companies ranging from McKinsey to Facebook to Walmart are doing something about big data.

Just like we saw in the dot-com boom of the 90s and the web 2.0 boom of the 2000s, the big data trend will also lead companies to make some really bad assumptions and decisions.

Hadoop is certainly one major area of investment for companies looking to solve big data needs. Companies like Facebook that have famously dealt well with large data volumes have publicly touted their successes with Hadoop, so it’s natural that companies approaching big data first look to the successes of others. A really smart MIT computer science grad once told me, “When all you have is a hammer, everything looks like a nail.” This functional fixedness is the cognitive bias to avoid amid the hype surrounding Hadoop. Hadoop is a multi-dimensional solution that can be deployed and used in different ways. Let’s look at some of the most common preconceived notions about Hadoop and big data that companies should know before committing to a Hadoop project: Continue reading

10 Ways to Make Hadoop Green in the CFO’s Eyes

Hadoop is used by some pretty amazing companies to make use of big, fast data, particularly unstructured data. Huge brands on the web like AOL, eBay, Facebook, Google, Last.fm, LinkedIn, MercadoLibre, Ning, Quantcast, Spotify, StumbleUpon, and Twitter use Hadoop, as do more brick-and-mortar giants like GE, Walmart, Morgan Stanley, Sears, and Ford.

Why? In a nutshell, companies like McKinsey believe the use of big data and technologies like Hadoop will allow companies to better compete and grow in the future.

Hadoop is used to support a variety of valuable business capabilities—analysis, search, machine learning, data aggregation, content generation, reporting, integration, and more. All types of industries use Hadoop—media and advertising, A/V processing, credit and fraud, security, geographic exploration, online travel, financial analysis, mobile phones, sensor networks, e-commerce, retail, energy discovery, video games, social media, and more. Continue reading

New Serengeti Release Extends Cloud Computing Support for Hadoop Community

Today VMware is announcing a significant new release of its open source big data virtualization project, Serengeti: M4, or version 0.8.0. Designed to make it easier for Hadoop users to deploy, run, and manage mixed-workload clusters on a virtualized platform, this release broadens support across the Hadoop community’s distributions, including new support for Cloudera CDH4, MapR, and HBase. Additionally, Serengeti M4 includes performance configuration improvements and a hardware reference architecture guide.
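
To give a flavor of what that deployment workflow looks like, here is a sketch of a session in the Serengeti command-line interface. The commands reflect our reading of the Serengeti CLI documentation rather than output from M4 itself; the host, cluster name, and distro value are placeholders, and exact flags may vary between releases.

serengeti> connect --host serengeti.example.com:8080
serengeti> cluster create --name myHadoop --distro cdh4
serengeti> cluster list

Here, connect points the shell at the Serengeti management server, cluster create provisions the virtual machines and Hadoop roles on vSphere, and cluster list confirms that the cluster and its node groups came up.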

This release comes at a perfect time for an exploding data market. This year the world will create 4 zettabytes of new data, and more than 80% of it will be unstructured data that does not fit in a traditional database management system. At the same time, businesses are learning to harness that data and use it to improve their business.

A popular strategy for succeeding in the data market is Hadoop, an open source data framework that allows for massive distributed processing of large data sets across clusters of nodes using simple programming models. Additionally, Hadoop offers a scalable file system (HDFS) that lets users store huge amounts of data on inexpensive disks in commodity servers. The powerful framework has spawned many new startups in Silicon Valley and has enterprise IT departments clamoring to harness the power of this technology. Huge web properties like Facebook, LinkedIn, Yahoo!, and eBay all rely on Hadoop to process and store data for hundreds of millions of users. Continue reading
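
To make “simple programming models” concrete, below is the canonical word-count job written against Hadoop’s MapReduce API. It is a minimal sketch of the programming model, not anything specific to Serengeti: the mapper emits a count of 1 for every word in its input split, the reducer sums those counts per word, and the input and output paths (supplied on the command line) live in HDFS.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every token in its slice of the input.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: sums the per-word counts produced by the mappers.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // pre-aggregate before the shuffle
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The framework handles splitting the input, scheduling tasks across the cluster, and shuffling intermediate keys to reducers; the developer writes only the two small functions above.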

10 Lessons from Spring Applied to Java Virtualization with vFabric

The Spring Framework became the de facto standard for developing enterprise Java applications, and its radical simplicity was fundamental to its success. Why the “radical” simplicity? Because at the time, it was hard to imagine how creating such applications could be made simple.

By tackling issues such as portability, understanding the importance of cross-cutting concerns, and making it trivial to develop automated tests, Spring allowed developers to focus on what matters: what makes their application unique.

As I was pulling together my presentation for SpringOne2GX 2012, I reflected on the parallels between Spring’s success and the direction we were going with EM4J. Why did Spring succeed? Why did simplification win? Where are we replicating these patterns within VMware, vFabric, and Java?

In short, complexity is expensive, and simplification has many economic benefits. By giving people better, simpler, easier-to-use tools to help build, run, and manage applications, we create economic advantages.

In a nutshell, there are some core reasons why Spring succeeded, “Spring values” if you will: reducing complexity, increasing productivity, provisioning flexibility, tooling and monitoring, extensibility, automation, flexible integration, and ease of testing. Continue reading
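
As a reminder of what that radical simplicity looks like in code, here is a minimal dependency-injection sketch. The class names are invented for illustration, not taken from the talk: the service is a plain Java object, the wiring lives in one small configuration class, and nothing depends on the container.

import org.springframework.context.annotation.AnnotationConfigApplicationContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// A plain Java object: no framework interfaces to implement.
class GreetingService {
    private final String greeting;
    GreetingService(String greeting) { this.greeting = greeting; }
    String greet(String name) { return greeting + ", " + name + "!"; }
}

// All the wiring in one place; swapping implementations means editing one method.
@Configuration
class AppConfig {
    @Bean
    GreetingService greetingService() {
        return new GreetingService("Hello");
    }
}

public class Main {
    public static void main(String[] args) {
        AnnotationConfigApplicationContext ctx =
                new AnnotationConfigApplicationContext(AppConfig.class);
        System.out.println(ctx.getBean(GreetingService.class).greet("Spring"));
        ctx.close();
    }
}

Because GreetingService is a POJO, a unit test can exercise it with a plain constructor call, no container required, which is exactly the ease-of-testing value in the list above.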

Join Us at Strata – Feb 26-28 in Santa Clara

The vFabric and Greenplum teams will be at Strata on Feb 26-28 at the Santa Clara Convention Center.

While the Pivotal Initiative is forming, the vFabric and Greenplum groups will be represented separately. Of course, you can also learn what’s going on by checking out Strata Greenplum or Strata VMware on Twitter.

If you aren’t familiar with Strata, it is a great conference for those building apps in the cloud, focused on the future of big data and how to use it successfully. Speakers include representatives from Google, VMware, Amazon, Microsoft, and many other software companies in the big data space. Topics include: Continue reading

5 Characteristics of a Modern Mainframe Cloud App – Avoid Tornado IT

No one likes being rushed into bad decisions.

Yet the pace of information technology often forces IT executives to do just that.

In today’s world, mainframe-to-cloud decisions need solid thinking or we risk a technology tornado. This article outlines some key lessons learned at the front-line of IT decision-making.

As previously discussed, it’s possible to “modernize” mainframe legacy applications to the cloud. You can get there with little to no modification by using a “lift-and-shift” strategy. Several of my clients have taken this approach to quickly satisfy a “cloud mandate,” and the results have been less than desirable:

  • Without the use of pooled resources, the applications do not scale well.
  • Timely user provisioning and access from any device is still a challenge because the apps do not provide on-demand, ubiquitous access.
  • In addition, utility-based pricing/costing is performed manually, with little fidelity to actual usage.
  • Most importantly, the applications continue to have monolithic, stove-piped architectures, which are difficult and expensive to maintain and enhance.

These “cloud” applications are more like funnel-cloud apps or tornado apps, waiting to wreak havoc on IT organizations. Assuming you want to avoid funnel clouds and IT tornadoes, consider applying the following five application architecture and design principles indicative of a true cloud application: Continue reading

The Best VMware vFabric Stories of 2012 & What’s In Store for 2013

As this year comes to a close, it’s time to reflect on what happened and start planning for the new year. The vFabric team has had some major achievements this year, introducing several new products to the market, including the innovative vFabric Application Director, the widely anticipated Project Serengeti to enable rapid cloud deployments for Hadoop, and a new tool for vFabric Suite users called vFabric Administration Server (VAS). We also announced the new VMware Cloud Applications Marketplace to help further accelerate application development with a professionally moderated library of enterprise-grade, ready-to-use application components that can run on any cloud.

Next year is going to be even bigger with the Pivotal Initiative, through which several of the products covered on this blog will follow the new venture. This is still in the planning stages, so we expect to share the plans for our products alongside the formal communications from each of the companies involved. (Sorry, no extra information is available right now.)

One thing we are going to do in early 2013 is bring the conversation about managing applications together with the conversation about managing virtual infrastructure. To that end, we will be moving all coverage of Application Performance Manager, AppInsight, Application Director, Hyperic, and Spring Insight to the VMware Management Blog as of January 1st. To keep up with the management topics, please follow us @vmwareappmgmt and @vmwaremgmt.

In the meantime, we’d like to share again the top 20 stories we had in 2012, and invite you to comment here on what stories you would like to see us cover on either blog in 2013.

Continue reading