

Why Every Database Must Be Broken Soon

Have you ever heard of a zettabyte? If you work in IT, you'll be hearing more and more about zettabytes, exabytes, and petabytes, while the terms we now think of as big, such as terabytes and gigabytes, fade from our vocabulary. Right now, we are growing our data stores by 50% year over year, and it's only accelerating.

In 2010, the world's online data crossed the one-zettabyte (ZB) barrier. This year, we will produce 4 ZB of data worldwide, and by 2016, global IP traffic will reach 1.3 ZB.

While data volumes are skyrocketing, the type of data is also becoming more difficult for traditional databases to handle. Over 80% of it will be unstructured, file-based data that does not fit the block-based storage typical of relational databases (RDBMS). So even if hardware innovations could keep up with the volume, the kinds of data we are now storing break traditional RDBMS at today's speeds.

The bottom line: the volume and variety of data being stored are too much for a single, monolithic, structured RDBMS data store. Databases need to be broken apart and re-architected to survive the Information Explosion we are experiencing today.

Why Traditional Databases Are Already Broken

To understand why traditional data stores will not scale in this era of explosive data growth, it's useful to remind ourselves how they work. Essentially, an RDBMS stores structured data in neat rows and columns. To retrieve the data, we rely on Structured Query Language (SQL), which describes exactly the data required for the next transaction or report and explicitly names where it lives. While data can be federated across different application sub-systems, such as a customer database and a provisioning system, architects usually try to keep data that is used together in a single store to increase performance and availability for end users.
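To make this concrete, here is a minimal sketch of the classic pattern: SQL over JDBC against a single relational store. The connection string, credentials, and customers table are illustrative assumptions, not details from any particular system.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class CustomerLookup {
    public static void main(String[] args) throws Exception {
        // Illustrative connection details; any JDBC-compliant RDBMS behaves the same.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/crm", "app", "secret");
             // The query names the exact columns, table, and predicate up front;
             // the schema must already exist and match.
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT id, name, region FROM customers WHERE region = ?")) {
            ps.setString(1, "EMEA");
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.printf("%d %s %s%n",
                            rs.getLong("id"), rs.getString("name"), rs.getString("region"));
                }
            }
        }
    }
}
```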

This data lives on disk, and the model works best when data is structured and easily categorized into those neat rows and columns. That world is gone.

The days when we had time to architect perfect database schemas, so all data could be organized into those neat rows and columns, have passed. In the new world, the volume, velocity, and unpredictability of data coming from modern sources overwhelm any pre-designed schema almost instantly. As uses of data change, there isn't time to design new schemas, migrate data, or endure any downtime.

Additionally, with the volume of data we collect and use every day, we are hitting the upper limits of single-machine storage and the performance tradeoffs of monolithic databases. Even if a database could perform competitively at that scale, a single large data store would likely be cost prohibitive.

The NoSQL Revolution

While many companies are still learning about new storage approaches for big, fast data, IT companies and open source innovators have been working on them for several years. It started with NoSQL distributed database models. This strategy eliminates much of the structure from the data and reduces queries to simple key-value lookups. It also supports partitioning data across several machines, as well as replication. These are all good things: they let us scale data horizontally across many commodity servers instead of a single expensive disk array, and they greatly speed up queries in general. But this solution still relies on data being stored on disk, and replication takes time to complete all the reads and writes. Because replication can lag, result sets can be out of date, so financial transactions and other operations that need strong consistency are not good candidates for this approach. So, while it worked, performance was inconsistent and it did not deal well with replicating data across large distances.
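A minimal sketch of the key-value idea, using a plain in-process map to stand in for a NoSQL store; the key format and JSON value are illustrative assumptions.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class KeyValueSketch {
    public static void main(String[] args) {
        // A key-value store reduces access to put/get on an opaque value;
        // there is no schema and no ad-hoc query language.
        Map<String, String> store = new ConcurrentHashMap<>();

        // Values are often serialized documents (JSON here) rather than typed columns.
        store.put("customer:42", "{\"name\":\"Acme\",\"region\":\"EMEA\"}");

        // Retrieval is by exact key: O(1) and trivially shardable by hashing the key,
        // but you cannot ask "all customers in EMEA" without a secondary index.
        String value = store.get("customer:42");
        System.out.println(value);
    }
}
```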

In short, we could do better.

The Advent of NewSQL

Looking at the performance gains of distributed systems, and the consistency limitations that come with simple data caching and a lack of schema, some companies embraced a middle ground of sorts. NewSQL, as the name suggests, keeps SQL as the way to access structured data, but introduces distributed data structures that help with performance. Because NewSQL still uses SQL, designing database queries is simpler, reducing complexity for developers to a degree, while distributed systems still improve performance. As with NoSQL, concepts like data partitioning and replication are brought to bear, but this time while maintaining data consistency.
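As an illustration, here is a hedged sketch of what that looks like in a NewSQL engine such as SQLFire, where ordinary DDL is extended with distribution hints. The URL, port, table, and exact clause syntax are assumptions to verify against the product documentation.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class NewSqlPartitioning {
    public static void main(String[] args) throws Exception {
        // SQLFire-style sketch; connection URL and port are assumptions.
        try (Connection conn = DriverManager.getConnection("jdbc:sqlfire://localhost:1527/");
             Statement st = conn.createStatement()) {
            // Plain SQL for the developer; PARTITION BY and REDUNDANCY tell the
            // engine how to spread rows across members and keep backup copies.
            st.execute("CREATE TABLE customers ("
                    + "  id BIGINT PRIMARY KEY,"
                    + "  name VARCHAR(100),"
                    + "  region VARCHAR(10))"
                    + " PARTITION BY COLUMN (id)"
                    + " REDUNDANCY 1");
        }
    }
}
```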

This solution is a great compromise for many organizations. Structured schemas combined with distributed data serve applications doing transaction-based processing very well, and at scale. Where it can fall short is analytical processing, because the schema is not necessarily set up for the desired queries, and because of the way these databases handle data distribution, they do not scale linearly.

In-Memory Data Stores Are Faster, More Consistent

We know that the NoSQL architectures got several things right: they handled unstructured data well, and they partitioned data so we could scale horizontally on cheap commodity servers. But as humans, we always hunger to do better. We want this solution to be faster, more consistent, and able to scale even further. Much of what holds NoSQL back can be blamed on its use of disk. So why not eliminate it?

Over the past few years, memory has gotten cheap and is easily commoditized in the cloud, so moving your data strategy to keep everything in memory just plain makes sense. It eliminates the extra hop to read and write data on disk, making performance inherently faster and more consistent. It also simplifies the internal optimization algorithms and reduces the number of instructions sent to the CPU, making better use of the hardware.

An in-memory approach also supports both NoSQL- and NewSQL-style solutions, affording both better performance and giving developers the freedom to choose the right modern solution for their data needs.
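For a concrete feel, here is a minimal in-memory client sketch in the style of GemFire's Java API. The locator host, port, and region name are assumptions; treat this as a sketch, not a reference implementation.

```java
import com.gemstone.gemfire.cache.Region;
import com.gemstone.gemfire.cache.client.ClientCache;
import com.gemstone.gemfire.cache.client.ClientCacheFactory;
import com.gemstone.gemfire.cache.client.ClientRegionShortcut;

public class InMemoryRegionSketch {
    public static void main(String[] args) {
        // Connect to a running distributed system via a locator (host/port assumed).
        ClientCache cache = new ClientCacheFactory()
                .addPoolLocator("localhost", 10334)
                .create();

        // The region's data lives in the memory of the server members, so reads
        // and writes never touch a local disk on the hot path.
        Region<String, String> customers = cache
                .<String, String>createClientRegionFactory(ClientRegionShortcut.PROXY)
                .create("customers");

        customers.put("customer:42", "{\"name\":\"Acme\",\"region\":\"EMEA\"}");
        System.out.println(customers.get("customer:42"));

        cache.close();
    }
}
```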

Breaking Databases Apart

Big, fast data is coming. To survive this data revolution, smart businesses need to ensure their data strategies are prepared to harness the power of their data and maintain complete histories of their data. They’ll need to partition their data across multiple data stores, and relearn how to architect, query and store data.
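What does partitioning across multiple stores look like at its simplest? A hash-based router like the sketch below; the store names are placeholders, and real systems add rebalancing, redundancy, and consistent hashing on top of this idea.

```java
import java.util.List;

public class PartitionRouter {
    // Map a key to one of N stores by hashing: the simplest form of the
    // horizontal partitioning described above.
    static int partitionFor(String key, int partitions) {
        return Math.floorMod(key.hashCode(), partitions);
    }

    public static void main(String[] args) {
        List<String> stores = List.of("store-a", "store-b", "store-c"); // placeholder names
        for (String key : new String[] {"customer:42", "order:7", "event:9001"}) {
            System.out.println(key + " -> " + stores.get(partitionFor(key, stores.size())));
        }
    }
}
```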

For more resources on how to employ these new data patterns, explore the VMware vFabric products that can help you in this new information age, including GemFire and SQLFire.


About Stacey Schneider

Stacey Schneider has over 15 years of experience working with technology, with a focus on sales and marketing automation as well as internationalization. Schneider has held roles in services, engineering, and products, and was the former head of marketing and community for Hyperic before it was acquired by SpringSource and VMware. She is now working as a product marketing manager across the vFabric products at VMware, including supporting Hyperic. Prior to Hyperic, Schneider held various positions at CRM software pioneer Siebel Systems, including Group Director of Technology Product Marketing, a role in which her contributions earned her a patent. Schneider received her BS in Economics with a focus in International Business from the Pennsylvania State University.
