Have you ever heard of a zettabyte? If you work in IT, you’ll be hearing more and more about zettabytes, exabytes, and petabytes, while the data terms we think of as big, such as terabytes and gigabytes, fade from our vocabulary. Right now, we are growing our data stores by 50% year-over-year, and the pace is only accelerating.
While data volumes are skyrocketing, the type of data is also becoming more difficult for traditional databases to handle. Over 80% of it will be unstructured, file-based data that does not work well with the block-based storage typical of relational databases (RDBMS). So, even if hardware innovations could keep up with the greater volume, the kinds of data we are now storing break traditional RDBMS at today’s speeds.
The bottom line is that the volume and types of data being stored are unrealistic for a single, monolithic, structured RDBMS data store. These data stores need to be broken apart and re-architected to survive the Information Explosion we are experiencing today.
In this article, we’ll talk about how to integrate the Lucene text searching solution using Spring Data and GemFire to provide a flexible, parallel, fast search engine. By combining the two independent products, we can leverage each to its fullest capability. The end result is an elastic search capability with the in-memory data speeds and high availability of a distributed cache platform.
Motivation—Why Combine These?
The motivation of the project was to provide an alternative search capability for GemFire while giving users a natural way to define searchable domain object attributes. Performance was also a key driver: search performance should stay constant irrespective of scale. The solution outlined below provides a baseline approach for developers to build upon. Continue reading →
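To make the idea of "searchable domain object attributes" concrete, here is a minimal plain-Java sketch. It does not use the Lucene, Spring Data, or GemFire APIs. The `@Searchable` annotation, the `Customer` class, and the `InvertedIndex` class are all hypothetical names invented for illustration, and the toy in-memory index only stands in for a real text search engine.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical marker annotation: fields carrying it are indexed for search.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Searchable {}

// Hypothetical domain object; only the annotated fields become searchable.
class Customer {
    final String id;
    @Searchable final String name;
    @Searchable final String city;
    final String internalNotes; // not annotated, so never indexed

    Customer(String id, String name, String city, String internalNotes) {
        this.id = id;
        this.name = name;
        this.city = city;
        this.internalNotes = internalNotes;
    }
}

// Toy inverted index (term -> ids), standing in for a real search engine.
class InvertedIndex {
    private final Map<String, Set<String>> idsByTerm = new HashMap<>();

    // Reads every @Searchable field via reflection and indexes its lower-cased terms.
    void index(String id, Object domainObject) {
        for (Field field : domainObject.getClass().getDeclaredFields()) {
            if (!field.isAnnotationPresent(Searchable.class)) {
                continue;
            }
            field.setAccessible(true);
            Object value;
            try {
                value = field.get(domainObject);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
            if (value == null) {
                continue;
            }
            for (String term : value.toString().toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!term.isEmpty()) {
                    idsByTerm.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
                }
            }
        }
    }

    // Returns the ids of all objects whose searchable fields contain the term.
    Set<String> search(String term) {
        return idsByTerm.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }
}
```

In a sketch like this, the domain class alone declares what is searchable, so adding a field to the search surface is a one-line change. A real integration would hand the annotated values to the search engine rather than to a hand-rolled map.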
In our last post, we 1) covered how geographic data can release value in mobile and machine-based applications, 2) explained how technology is used to overcome barriers to these types of big data scenarios, and 3) detailed the architecture for a data fabric or grid (like vFabric GemFire) that works with geographic data and specialized or alternative indexes. There were also code examples to explain the object model, the spatial index, and data changes.
Now, we will continue the examples, show you how to make the index highly available, and use a function to access the data via the index.
The Scenario for a Highly Available Index
In some cases, a piece of data may be added to a node, or become primary on a node, without a clean method call. This happens during both failover and rebalancing. In the case of failover, a bucket held on a node as a redundant copy may suddenly become the primary copy if the node that held the primary fails.
In the case of rebalancing, an entire bucket can be moved to a new node that was added to the system without the benefit of capturing the “put” call on each piece of data. Continue reading →
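The scenario above can be sketched in plain Java. This is a simulation under stated assumptions, not the GemFire API: `BucketListener`, `SecondaryIndex`, `Node`, and the "cell" values are hypothetical names invented for illustration. It shows an index that is normally maintained on each "put" and is also rebuilt from a bucket's contents when that bucket arrives as primary without any "put" calls.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical callback fired when a bucket becomes primary on this node.
// It is NOT a GemFire interface; it only illustrates the idea.
interface BucketListener {
    void becamePrimary(int bucketId, Map<String, String> bucketData);
}

// Toy secondary index: maps an indexed value (e.g. a grid cell) to the keys that hold it.
class SecondaryIndex implements BucketListener {
    private final Map<String, Set<String>> keysByValue = new HashMap<>();

    // Normal path: invoked for every "put" the node sees.
    void onPut(String key, String value) {
        keysByValue.computeIfAbsent(value, v -> new TreeSet<>()).add(key);
    }

    // Recovery path: no "put" was observed, so index the whole bucket at once.
    // Adding to a set is idempotent, so re-indexing a known entry is harmless.
    @Override
    public void becamePrimary(int bucketId, Map<String, String> bucketData) {
        bucketData.forEach(this::onPut);
    }

    Set<String> lookup(String value) {
        return keysByValue.getOrDefault(value, new TreeSet<>());
    }
}

// Simulated cluster member holding buckets and a local index.
class Node {
    final SecondaryIndex index = new SecondaryIndex();
    private final Map<Integer, Map<String, String>> buckets = new HashMap<>();

    void put(int bucketId, String key, String value) {
        buckets.computeIfAbsent(bucketId, b -> new HashMap<>()).put(key, value);
        index.onPut(key, value);
    }

    // Simulates failover or rebalancing: the bucket shows up wholesale, with no per-entry "put".
    void receiveBucketAsPrimary(int bucketId, Map<String, String> data) {
        buckets.put(bucketId, new HashMap<>(data));
        index.becamePrimary(bucketId, data);
    }
}

class HighlyAvailableIndexDemo {
    public static void main(String[] args) {
        Node node = new Node();
        node.put(1, "truck-7", "cell-A");

        Map<String, String> moved = new HashMap<>();
        moved.put("truck-9", "cell-A");
        moved.put("truck-3", "cell-B");
        node.receiveBucketAsPrimary(2, moved);

        System.out.println(node.index.lookup("cell-A")); // prints [truck-7, truck-9]
    }
}
```

This sketch only adds entries. A fuller version would also remove index entries when a bucket leaves the node or a value is overwritten.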
Application developers and data management teams continue to look for ways to modernize legacy apps, manage costs more effectively, build new apps on robust application platforms, and solve big data problems. These are some of the key reasons why vFabric is on the CIO (or CTO) agenda. With several new product releases in the vFabric Suite, VMware continues to provide a best-in-class application platform and help customers solve their top application development and data management problems.
Mobile applications are one thing, but mobile apps WITH fast data requirements are another.
The combination of mobile apps and fast data requirements can cause major data scale issues. Whether you are trying to update an existing application or build a new application, mobile apps with personalization, pricing, location, or gaming functionality must consider data architecture differently from the outset.
An AT&T Senior EVP recently wrote, “Over the past five years, AT&T’s wireless data traffic has grown 20,000%. The growth is now primarily driven by smartphones.” In fact, many say that mobile use will cause a spectrum deficit in the U.S. According to the Telegraph, smartphones are mostly used for internet (24 minutes and 29 seconds per day) and social media (17 minutes and 29 seconds per day), while phone calls rank 5th (12 minutes and 6 seconds per day). Similarly, mobile commerce is projected to rise from 1% of all e-commerce sales in 2010 to 7% in 2016 (i.e. from $3 billion to $31 billion over a six-year period). Apps are also accounting for more minutes of usage. So, it is no wonder business groups are clamoring for mobile-centric programs and applications.
The bottom line is that mobile applications are growing data differently than traditional database applications.
Application and operations teams sometimes reach a point where they must upgrade the database. Whether it’s due to data growth, lack of throughput, too much downtime, the need to share data globally, adding ETLs, or otherwise, it’s never a small project. Since these projects are expensive, any recommendation requires a solid justification. This article a) describes 3 signs that a traditional database has hit a wall, b) explains how vFabric SQLFire provides an advantage over traditional databases in each case, and c) should help you make a case for moving towards an in-memory, distributed data grid based on SQL.
Those of us tasked with upgrading (or architecting) the data layer all go through similar steps. We build a project plan, make projections and sizing estimates, perform architecture and code reviews, create configuration checklists, provide hardware budgets and plans, talk to vendors about options, and more. Then, we work to plan the deployment with the least downtime, procure hardware and software, test different data load times, evaluate project risks, develop back-up plans, prepare communications to users about downtime, etc. You know the drill. These projects can take months and consume a fair amount of internal resources or consulting dollars. If you are starting or working on one of these types of projects with a traditional database architecture in mind, are you taking these 3 signs into account as you weigh your options? Continue reading →
As we’ve previously covered, data growth is quite unbelievable, and this means traditional database models are being stretched. On Tuesday, November 13, 2012 at 9:00 AM PST, VMware’s Joe Russell will be presenting on several topics related to Big, Fast, Flexible Data and how VMware’s key data management technologies help companies overcome some of the key challenges with traditional RDBMS.
Attend to learn:
How Hadoop and new analytics technologies are allowing companies to use Big Data in new ways to gain meaningful business insights
What’s new with Project Serengeti, a VMware initiative to help you deploy and manage elastic Hadoop clusters in minutes
How Fast Data is bringing data logic in-memory, allowing for dramatic scale, reduced costs, and improved performance
How Flexible Data, including NoSQL and open source relational data technologies, can improve your data model
How virtualizing the database layer enables a new Cloud Delivery Model, allowing enterprise IT departments to offer self-service data services elastically on demand, maintain centralized control, and operate within regulatory guidelines
How do you plan a roadmap for moving from a legacy data architecture to a cloud-enabled data grid? In this article, we will offer a pragmatic, three-stage approach. At SpringOne-2012, the “Effective design patterns with NewSQL” session (see presentation embedded below) generated a lot of interest. (Thank you to everyone who joined us!) Jags Ramnarayan and I discussed problems with legacy RDBMS systems, NewSQL driving principles, SQLFire architecture, application design patterns as well as data consistency and reliability.
We went deep into vFabric SQLFire, a pragmatic solution that addresses these data challenges:
How do I architect my data tier for very high concurrent workloads?
How do I achieve predictability both for data access response time and availability?
How do I distribute data efficiently and in real time to multiple data centers (and to external clouds)?
How do I process these large quantities of data in an efficient manner to allow for better real-time decision-making?
If you haven’t heard, the SpringTrader reference architecture is used to help Java-based application architects, developers, infrastructure, and operations teams advance their application roadmaps and provide reusable patterns. Some might also consider how vFabric Application Director can be used with the SpringTrader app to enable continuous deployment or automatically provision and scale the app in a completely virtual data center (i.e. a software-defined data center). In addition, vFabric Application Performance Manager can be used to monitor the entire stack and trigger automated scaling events like adding a new JVM and tc Server to the SpringTrader app’s production environment. Continue reading →