One of the most popular topics at this year’s VMworld is Big Data Analytics. We had an opportunity to catch up with VMware’s Senior Director of Big Data Analytics, Karthik Kannan, to ask a few questions about this topic.
1. Why is big data analytics different from traditional analytics?
For one, the dimensions are much larger. When you look at data from multiple sources, for example, there are additional ways to both combine and filter it. And it isn’t just volume: data comes from a wider set of sources, such as devices; it is created at faster speeds, like a terabyte per day; and it can change rapidly, as in financial markets or on social networks. Traditional analytics is more about regular-interval reports on data that doesn’t change much. A report like “Weekly Accounts Receivable,” for example, is much more static in terms of structure, schema, sources, and the data itself.
2. Why is real-time analytics different from traditional analytics?
Time makes things different. If you are processing nightly analytics for “daily sales volume” reports, the time factor isn’t an issue. But if you need an alert that notifies you in real time, based on an analytical process across a large volume of data, it’s a completely different requirement: you have to do a very large amount of processing all the time.
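The contrast between a nightly report and a real-time alert can be sketched in a few lines. The snippet below is a minimal illustration, not any VMware product code: it keeps a sliding window of recent events and fires whenever the windowed total crosses a threshold, so the analytical check runs on every event rather than once a night. The class name, window size, and threshold are all hypothetical.

```python
import time
from collections import deque


class SlidingWindowAlert:
    """Fire an alert when the sum of events in a rolling time
    window exceeds a threshold (window and threshold are illustrative)."""

    def __init__(self, window_seconds=60, threshold=100):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.events = deque()  # (timestamp, value) pairs

    def record(self, value, now=None):
        """Process one incoming event; return True if an alert should fire."""
        now = time.time() if now is None else now
        self.events.append((now, value))
        # Evict events that have aged out of the window.
        while self.events and self.events[0][0] < now - self.window_seconds:
            self.events.popleft()
        total = sum(v for _, v in self.events)
        return total > self.threshold
```

A nightly batch job would compute the same sum once per day; here the aggregate is re-evaluated on every event, which is why real-time analytics demands continuous processing over the data.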
3. Could you give one example of where real-time is such an important factor in a big data scenario?
Sure, you see typical use cases at online companies for ecommerce, gaming, mobile, and advertising. In advertising, there are real-time decisions surrounding what type of ad to place in front of a certain type of visitor or user. The amount of data to churn through is significant, and the decision is evaluated quite regularly.
4. What is a good example of where the power of Hadoop can come into play?
The power of Hadoop is that it can process a massive amount of information very quickly because it is built for parallel, distributed computing, so it can deal with a much larger volume of information in a much faster way. The classic scenario is crawling web pages to organize search results, but that example alone doesn’t really convey the power. Think of a web page as a database record with links to other pages (i.e., other database records). Now think about the amount of application power it would take to regularly look at hundreds of millions or billions of these records: processing them for meaning, processing them for relationships, processing them into groups and categories, into rankings, and into different formats, and then doing all of that on a regular basis to keep the information current.
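The web-page scenario above follows Hadoop’s map/reduce pattern, which can be sketched in miniature. The snippet below is an illustrative serial version, not Hadoop itself: a map step emits a `(target, 1)` pair for every link on a page, and a reduce step sums the pairs to get inbound-link counts. The page format and URLs are made up for the example.

```python
from collections import Counter
from itertools import chain


def map_links(page):
    """Map step: emit (target, 1) for each outbound link on one page."""
    return [(target, 1) for target in page["links"]]


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each target page."""
    totals = Counter()
    for target, count in pairs:
        totals[target] += count
    return totals


# A tiny corpus standing in for billions of crawled pages.
pages = [
    {"url": "a.com", "links": ["b.com", "c.com"]},
    {"url": "b.com", "links": ["c.com"]},
    {"url": "c.com", "links": ["a.com", "b.com"]},
]

# On a Hadoop cluster the map calls run in parallel across machines and
# the framework shuffles pairs to reducers; serially it collapses to:
inbound = reduce_counts(chain.from_iterable(map_links(p) for p in pages))
```

Because each map call touches only one record and each reduce call only one key, the work partitions cleanly across a cluster, which is what lets Hadoop churn through billions of records on a regular schedule.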
5. What is one capability you discuss in your VMworld session that always gets people’s attention?
When we explain that our platform gives end users the power to do custom analytics based on Hadoop while masking (or hiding) the complexities of the underlying technology from the end user, most people want to know more. You can learn more about this by looking for information about Cetas at VMworld.