VMware vFabric Blog

Tag Archives: big data

Breaking the Mindset: Why Hadoop Can and Should Move Past Bare-Metal Deployments to Virtualization

Whenever we’ve dealt with something for a while, our way of thinking about it becomes a habit. Hadoop deals with a lot of data: at the time of writing, the record is a 100-petabyte Facebook cluster that analyzes log data. Because Hadoop was designed for the data volumes and performance demands of companies like Google and Facebook, it was originally built to run on bare-metal servers. Since virtualization wasn’t an option from the start, the notion that you can’t safely run that much data on a movable virtual machine has largely gone unchallenged.

However, as time has gone on and technology has made persistent storage in the cloud possible, organizations have started to rethink this paradigm. In fact, several companies already run Hadoop on virtual infrastructure to gain a competitive advantage. They are not moving the data around; they are virtualizing for other advantages.

Joe Russell, VMware’s Big Data product line marketing manager, spoke with Roberto Zicari this week in an interview on ODBMS.org. The interview helps articulate not only why Hadoop can run on virtual infrastructure using Project Serengeti, but also why companies should consider doing so to save time and make Hadoop more usable. Continue reading

7 Myths on Big Data—Avoiding Bad Hadoop and Cloud Analytics Decisions

Hadoop is an open source legend built by software heroes.

Yet legends are often surrounded by myths, and these myths can lead IT executives down a path while wearing rose-colored glasses.

Data and data usage are growing at an alarming rate. Just look at the numbers from analysts: IDC predicts a 53.4% growth rate for storage this year, and AT&T claims 20,000% growth in its wireless data traffic over the past five years. If you look at your own communication channels, it’s guaranteed that the internet content, emails, app notifications, social messages, and automated reports you get every day have increased dramatically. This is why companies ranging from McKinsey to Facebook to Walmart are doing something about big data.

Just like we saw in the dot-com boom of the 90s and the web 2.0 boom of the 2000s, the big data trend will also lead companies to make some really bad assumptions and decisions.

Hadoop is certainly one major area of investment for companies looking to solve big data needs. Companies like Facebook that have famously handled large data volumes have publicly touted their successes with Hadoop, so it’s natural that companies approaching big data look first to the successes of others. A really smart MIT computer science grad once told me, “When all you have is a hammer, everything looks like a nail.” This functional fixedness is the cognitive bias to avoid amid the hype surrounding Hadoop. Hadoop is a multi-dimensional solution that can be deployed and used in different ways. Let’s look at some of the most common preconceived notions about Hadoop and big data that companies should know before committing to a Hadoop project: Continue reading

Q&A with Shay Banon: 10 “Bonsai Cool” Things About elasticsearch

We are very fortunate to post an interview with Shay Banon, the founder of elasticsearch. Elasticsearch is a technology that is very popular among some of the coolest companies on the web today, including SoundCloud, StumbleUpon, Mozilla, and Klout. These companies use elasticsearch to deploy powerful search capabilities in their applications that are easy to set up, scalable, and built for the cloud. In this interview, we get to learn all kinds of cool things:

  1. How Shay got into search
  2. How he came up with the idea for elasticsearch
  3. Why elasticsearch is different than other OSS search projects
  4. Example elasticsearch users like Foursquare, Brewster, GitHub, Sony, and Klout
  5. About the elasticsearch architecture for big data
  6. The strategy behind JSON over HTTP for search
  7. Connecting elasticsearch with RabbitMQ
  8. Connecting elasticsearch with Spring
  9. Connecting elasticsearch with GemFire
  10. Running elasticsearch on virtualized infrastructure

Without further ado, here is the interview.

Q1. So, how did you end up getting into search?
About 10 years ago, I moved from Israel to London because my wife was going to study to be a chef at the Cordon Bleu. I had no job. I was in a new country. I was unemployed. So, I started to get into the latest, cool, new technologies. Continue reading

vFabric @ SpringOne Next Week

The vFabric team is headed to SpringOne 2GX 2012 next week – from October 15-18 in Washington, DC. This is set to be a great event to learn the latest on Spring with over 100 sessions covering a wide variety of topics. For those of you looking to learn more about how vFabric is the best place to run Spring applications, here are the highlights you won’t want to miss:

1. Sessions:  There are a number of speakers from SpringSource, CloudFoundry, and the VMware vFabric team on the schedule, including:

Continue reading

Serengeti Helps Enterprise Respond to the Big Data Challenge

Enterprise Demands Analytic Platform

Big Data adoption in the enterprise has traditionally been hindered by the lack of usable enterprise-grade tools and the shortage of implementation skills.

Register for VMworld!

Register for Session TEX2183 – Highly Available, Elastic and Multi-Tenant Hadoop on vSphere

Follow all vFabric updates at VMworld on Twitter.

Enterprise IT is under immense pressure to deliver a Big Data analytic platform. Most of this demand is currently for pilot Hadoop implementations of fewer than 20 nodes, intended to prove Hadoop’s value in delivering new business insight. Gartner predicts that this demand will increase by a further 800 percent over the next five years.

The explosive growth of these requests in mid-to-large-size companies leaves IT departments unable to meet that demand. Furthermore, Hadoop and its ecosystem tools are often too complex for many of these organizations to deploy and manage.

As a result, enterprise users, frustrated by these delays, often opt to circumvent IT and go directly to online analytic service providers. While satisfied by the immediacy of access, they often compromise corporate data policies, proliferate data inefficiently, and accrue large costs due to unpredictable pricing models. Continue reading

VMware Expert Q&A: Big Data Analytics

One of the most popular topics at this year’s VMworld is Big Data Analytics. We had an opportunity to catch up with Karthik Kannan, VMware’s Senior Director of Big Data Analytics, to ask a few questions about this topic.

1. Why is big data analytics different than traditional analytics?
For one, the dimensions are much larger. For example, when you look at data from multiple sources, there are additional ways to both combine the data and filter it. This isn’t just volume. Data comes from a wider set of sources, such as devices. Data is created at faster speeds, such as a terabyte per day. Data can change rapidly, as in financial markets or on social networks. Traditional analytics are more about regular-interval reports on data that doesn’t change much. For example, a “Weekly Accounts Receivable” report is much more static in terms of its structure, schema, sources, and the data itself.

Continue reading

Top 5 Reasons App Teams Should Come to VMworld

VMware is excited about the upcoming VMworld 2012 in San Francisco, August 26 through 30. The VMworld team published the Top 5 Reasons Why You Should Come to VMworld, including getting inside information on what’s next, hands-on training, and meeting the industry luminaries that this event attracts. However, some readers may not realize from this list that VMworld is aimed not just at virtualization infrastructure teams but at app teams as well. While VMware is the leader in virtualization and cloud technologies, we are also squarely focused on helping customers build apps that are optimized to run on this infrastructure, and the vFabric team’s presence at VMworld this year underscores this fact in spades.

Why should app teams come to VMworld?

Learn More

Register for VMworld!

Register for Customer Roundtables & 1-on-1s
Current customers: contact Charles Lee

Follow all vFabric updates at VMworld on Twitter.

1. Knowledge. If you develop with Spring or intend to virtualize Java, you will gain enough insight into VMware’s vFabric application stack to impact your career for several years.

2. Customer roundtables. We are looking to bring together a variety of vFabric users with their peers and our product leaders to hear and provide feedback on our product roadmaps, use cases, and suggestions that will help define our upcoming products.

3. One-on-one interviews and testimonials. We want to hear from you! Meet with our product representatives and tell us your vFabric experience. Sharing your achievements adds to the community, builds respect with your peers, and can even earn you some special rewards from VMware itself.

Continue reading

VMware on How Big Data Meets Fast Data in the Cloud

Learn More

Register for VMworld!

Register for Session APP-CAP1250 – Fast Data Meets Big Data

Follow all vFabric updates at VMworld on Twitter.

Big Data allows you to find opportunities you didn’t know you had.

Fast Data allows you to respond to opportunities before they’re gone.

The combination of Big Data and Fast Data working together may enable new business models you never could have achieved before.

To elaborate, the idea is to analyze historical “Big Data” and look for trends or patterns that have led to good results in the past. Then you model those patterns so that you can detect them in real time as they unfold, based on incoming Fast Data. If we can analyze large amounts of historical as well as recent data quickly enough, we have an opportunity to influence the behavior of the actors in real time and a better chance of steering them toward the patterns that produce results.
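The batch-then-stream idea above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any vFabric product implements it: each historical session is a hypothetical list of action names paired with a good/bad outcome, a “pattern” is simply a pair of consecutive actions with a high historical success rate, and the real-time side only checks each actor’s last two actions against the learned set. The function names, thresholds, and sample data are all made up for illustration.

```python
from collections import defaultdict


def learn_patterns(history, min_rate=0.75, min_support=2):
    """Batch ("Big Data") step over hypothetical historical sessions.

    history: iterable of (actions, good_outcome), where actions is a list of
    action names and good_outcome is True if the session ended well.
    Returns the consecutive action pairs seen in at least min_support sessions
    that preceded a good outcome at least min_rate of the time.
    """
    seen = defaultdict(int)
    good = defaultdict(int)
    for actions, good_outcome in history:
        # Count each consecutive pair at most once per session.
        for pair in set(zip(actions, actions[1:])):
            seen[pair] += 1
            if good_outcome:
                good[pair] += 1
    return {
        pair
        for pair, count in seen.items()
        if count >= min_support and good[pair] / count >= min_rate
    }


class StreamDetector:
    """Real-time ("Fast Data") step: checks each new event against learned patterns."""

    def __init__(self, patterns):
        self.patterns = patterns
        self.last_action = {}  # actor id -> most recent action seen

    def on_event(self, actor, action):
        """Return True if this actor's previous and current actions form a learned pattern."""
        previous = self.last_action.get(actor)
        self.last_action[actor] = action
        return previous is not None and (previous, action) in self.patterns


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    history = [
        (["search", "compare", "add_to_cart"], True),
        (["search", "compare", "exit"], False),
        (["browse", "compare", "add_to_cart"], True),
        (["search", "exit"], False),
    ]
    patterns = learn_patterns(history)
    print(patterns)  # {('compare', 'add_to_cart')}
    detector = StreamDetector(patterns)
    print(detector.on_event("user-1", "compare"))  # False
    print(detector.on_event("user-1", "add_to_cart"))  # True
```

In a real system the batch step would run periodically over the historical store, and the detector would sit on the incoming event stream; a `True` result marks the moment when you might try to influence the actor.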

Some Examples of Big, Fast Data

For instance, look at location-based services.

Continue reading

vFabric @GigaOM Structure | SanFran | June 20-21 #structureconf

If you are attending GigaOM Structure in San Francisco, VMware’s CTO and the VP of Emerging Products are both speaking. Here are summaries:

Steve Herrod, CTO and SVP of R&D – Where Big Data Meets the Cloud

EMC’s reasoning for the acquisition of VMware has always been questioned. Was it because of the impact virtualization was having on storage, or was it something else? In this conversation with VMware’s technology leader, we will get a better understanding of the real impact of big data when it meets cloud computing. Some deep insights from customer deployments combined with industry announcements make this a must-attend panel.

Chris Keene, VP of Emerging Products at VMware and Tom Roloff, COO at EMC – Building Killer Apps with Big Data

Big data fast data architecture gives IT an opportunity to change the competitive landscape by helping companies make faster and better decisions. This architecture combines in-memory data management with business intelligence to develop big data applications. Join Chris Keene, VP of Emerging Products, and gain insight into the transformative value of big data fast data.

>> To get demonstrations, learn more about vFabric, meet experts, or even if you just want giveaways, come by the booth.

Comparing Traditional Databases to SQLFire for Cloud-Scale Applications

Learn More

Register now for Mark Chmarny’s Webcast.

Date: Wednesday, June 20, 2012 @ 9:00 AM PDT

Title: vFabric SQLFire – Fast Data that Spans the Globe


Despite what people tell you, managing online applications at cloud scale is hard. One of the main challenges is that as an application becomes more and more popular, the underlying database often becomes the bottleneck.

When demand spikes, organizations are comfortable scaling their Web and App Server layers. However, as they increase the number of application instances to accommodate the growing demand, their data layer is unable to keep up.

We all know that a solution’s overall performance is only as good as its weakest link. Increasingly, the weakest link in today’s online applications is the database.
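To make the weakest-link argument concrete, here is a back-of-the-envelope sketch in Python. It assumes a simple serial stack in which every request passes through each tier once and each tier’s capacity scales linearly with its instance count, so sustained throughput is the minimum of the per-tier capacities. The tier names and requests-per-second figures are hypothetical, not measurements.

```python
def end_to_end_capacity(tiers):
    """Estimate the requests/sec a serial stack of tiers can sustain.

    tiers maps a tier name to (instances, requests_per_sec_per_instance).
    Assumes every request passes through every tier once and that capacity
    scales linearly with instances, so the slowest tier caps the whole stack.
    Returns (capacity, bottleneck_tier_name).
    """
    capacity = {name: count * per_instance for name, (count, per_instance) in tiers.items()}
    bottleneck = min(capacity, key=capacity.get)
    return capacity[bottleneck], bottleneck


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    stack = {"web": (4, 500), "app": (4, 300), "database": (1, 1500)}
    print(end_to_end_capacity(stack))  # (1200, 'app')

    # Scale out the app tier: the database now caps throughput.
    stack["app"] = (8, 300)
    print(end_to_end_capacity(stack))  # (1500, 'database')

    # Adding even more app servers does not help; the database is the weakest link.
    stack["app"] = (16, 300)
    print(end_to_end_capacity(stack))  # (1500, 'database')
```

Once the database is the minimum, only raising the data layer’s own capacity increases end-to-end throughput, no matter how many application instances are added.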

A Customer Example

Recently, a large retail customer spoke to us about their experiences in dealing with demand spikes during holidays. Their virtualized infrastructure was more than capable of scaling horizontally to address the growing demand. However, their underlying, traditional database could not handle the large load increases. The database started to experience deadlocks, connection timeouts, and various other problems.

Continue reading