4 Key Architecture Considerations for Big Data Analytics

Everyone seems to be talking about “Big Data” these days. We’re bombarded with information in online and print media about the explosion of machine-generated data, the petabytes of data that companies like Facebook and Twitter generate, and the billions of dollars in opportunities awaiting businesses that put big data to use. We also hear about what seems like an alphabet soup of new technologies to process and analyze big data: Hadoop for distributed data processing, R for analytics, Lucene for text indexing and search, Mahout for machine learning…the list goes on and on.

If you’re a business user, you’re thinking that big data could give you an edge over your competition. If you’re a developer, you’re excited about the many new technologies you can learn about. If you’re an architect, you’re trying to figure out how all these big data technologies fit within your existing and future infrastructure.

Architecture Questions on Big Data and Big Analytics

Because we are all asked to design approaches that are both technically sound and cost-appropriate, the traditional questions come up, and a significant amount of internal and external research begins:

  • What are the business goals and drivers?
  • What architectural principles and governance will be followed?
  • What is the scope?
  • What do current and future state systems look like?
  • What are the performance needs?
  • Where are the greatest cost implications?

In the Big Data arena, key architecture fundamentals change, and we hope the framework below provides two things for you:

  1. A simple model to start thinking through cloud architectures for Big Data and Big Analytics at your company.
  2. A prompt to identify what you might learn from our “Big Data and Big Analytics” panel at VMworld.

Beginning with a Simple Framework

At VMware, we’ve been using a simple framework to look at the key components of a Big Data system and help our customers work through many architectural decisions as they explore the world of big data. Big data often brings four newer and very different considerations in an enterprise architecture:

  1. Data sources have a different scale – this is the most obvious change, as many companies now work in the multi-terabyte and even petabyte range.
  2. Speed is critical – nightly ETL (extract-transform-load) batches are often insufficient, and real-time streaming with solutions like S4 and Storm becomes necessary.
  3. Storage models are changing – solutions like HDFS (Hadoop Distributed File System) and unstructured data stores like Amazon S3 provide new options.
  4. Multiple analytics paradigms and compute methods must be supported:
    • Real-time database and analytics: These are typically in-memory, scale-out engines that provide low-latency, cross-data center access to data, and enable distributed processing and event-generation capabilities.
    • Interactive analytics: Includes distributed MPP (massively parallel processing) data warehouses with embedded analytics, which enable business users to do interactive querying and visualization of big data.
    • Batch processing: Hadoop serves as a distributed processing engine that can analyze very large amounts of data and apply algorithms that range from the simple (e.g. aggregation) to the complex (e.g. machine learning).
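
To make the contrast between the batch and real-time paradigms above a little more concrete, here is a minimal Python sketch. It is a toy illustration only, not how Hadoop or Storm is actually programmed: the sample events and the names `batch_count` and `StreamingCounter` are hypothetical and introduced purely for this example. The batch function sees the whole data set at once and produces one answer, while the streaming counter updates a running answer as each event arrives.

```python
from collections import Counter, defaultdict

# Hypothetical sample events as (user_id, action) pairs.
# These are illustrative assumptions, not data from the article.
EVENTS = [
    ("alice", "click"),
    ("bob", "view"),
    ("alice", "view"),
    ("carol", "click"),
    ("alice", "click"),
]


def batch_count(events):
    """Batch style: map every event to a key, then reduce over the full set."""
    mapped = (action for _user, action in events)  # "map" step: extract the key
    return dict(Counter(mapped))                   # "reduce" step: sum per key


class StreamingCounter:
    """Streaming style: keep running totals and update them per event."""

    def __init__(self):
        self.totals = defaultdict(int)

    def on_event(self, event):
        _user, action = event
        self.totals[action] += 1
        # Return a snapshot so callers can act on the current state right away.
        return dict(self.totals)


# Batch: one result, available only after all events have been collected.
batch_result = batch_count(EVENTS)

# Streaming: an up-to-date result after every single event.
stream = StreamingCounter()
snapshots = [stream.on_event(event) for event in EVENTS]

print("batch result:      ", batch_result)
print("first snapshot:    ", snapshots[0])
print("final snapshot:    ", snapshots[-1])
```

Both approaches end with the same totals. The difference is when an answer becomes available: the streaming version can trigger an action after the first event, whereas the batch version answers only once the full data set has been processed, which is the trade-off behind the "speed is critical" consideration.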

The diagram below illustrates this framework and shows that some components, or potentially the entire big data system, can run on a cloud infrastructure, which can make the system elastic, highly available, and multi-tenant. With resource sharing, we can ultimately bring the benefits of cloud computing to big data and keep budgets in check.

A Panel focused on Big Data Architecture Approaches

Since everything around big data is evolving very fast, there are many different perspectives on architecture, technologies, and products. We hope to bring several of these perspectives to you at VMworld San Francisco. I will be moderating a Big Data and Big Analytics panel discussion with an incredible group of industry visionaries and practitioners, who are ready to share their insights and respond to questions from the audience. Here are some examples of the types of questions the panelists will address:

  • What are the best examples of big data applications that you’ve seen?
  • What are the best practices for big data systems architecture?
  • What is the role of virtualization and cloud computing in big data?
  • How do you see the world of big data evolving over the next few years?

Collectively, this group has several decades of experience building, using and managing Big Data and Big Analytics projects. Let me introduce the panel participants:

  • Amr Awadallah, CTO and Co-Founder at Cloudera, will talk about the impact Big Data and Hadoop are having on the industry, drawing upon his extensive experience at Cloudera as well as previously leading organizations like VivaSmart and Yahoo!.
  • Zubin Dowlaty, VP and Head of Innovation & Development at Mu Sigma, will share customer scenarios and analytical approaches from his 18 years of experience applying quantitative methods to corporate data assets.
  • Jim Kaskade, EIR at PARC (a Xerox company), will talk about the breakthrough ideas in Big Data and Customer Experience Management coming out of the research groups at PARC, where they use technologies like Hadoop, NoSQL databases, complex event processing, R and many more.
  • Richard McDougall, CTO of Application Infrastructure at VMware, will be ready to discuss architectural approaches for Big Data and Big Analytics solutions on a virtualized platform, drawing on his deep experience with scalability, availability and performance of distributed systems.
  • Stephen O’Sullivan, Senior Director at Walmart Labs, will draw upon his 20+ years of experience creating enterprise applications and data management solutions and his leadership experience at companies at the bleeding edge of technology like Walmart, LiveOps, Yahoo! and Sun.

If this sounds like the kind of session you will enjoy, we would love to see you at APP-CAP 1963 (Big Data and Big Analytics Panel).

  About the Author: Fausto Ibarra is Sr. Director of Product Management at VMware. He leads VMware’s product management efforts for data management, big data and analytics. Prior to VMware, Fausto had various product management and engineering leadership roles at Microsoft, in the database, business intelligence and application platform businesses. He has extensive prior experience in product management, marketing and IT strategy at BEA Systems, TIBCO Software and McKinsey & Company. Fausto holds a B.S. in Computer Science from the Instituto Tecnologico de Monterrey in Mexico, an MBA from the University of Pennsylvania Wharton School of Business, and an MA in International Studies from the University of Pennsylvania.
