The new database is opening up significant career opportunities for data modelers, admins, architects, and data scientists. In parallel, it’s transforming how businesses use data. It’s also making the traditional RDBMS look like a T-REX.
Our web-centric, social media, and internet-of-things are acting as a sea-change to break traditional data design and management approaches. Data is coming in at increasing speeds, and 80% of it cannot be easily organized into the neat little rows and columns associated with the traditional RDBMS.
Have you ever heard of a zettabyte? If you work in IT, you’ll be hearing more and more about zettabytes, exabytes, and petabytes while the data terms we think are big, such as terabytes and gigabytes wane away from our vocabulary. Right now, we are growing our data stores by 50% year-over-year, and its only accelerating.
While data volumes are skyrocketing, the type of data is also becoming more difficult for traditional databases to handle. Over 80% of it will be unstructured file based data that does not work well with block-based data storage typical of your typical relational databases (RDBMS). So, even if hardware innovations could keep up to support greater volume, the kinds of data we are now storing break traditional RDBMS at today’s speeds.
The bottom line is the volume and types of data being stored is unrealistic for a single, monolithic, structured RDBMS data store. They need to be broken apart and re-architected to survive the Information Explosion we are experiencing today.
Day 2 of the O’Reilly Strata Conference is starting here in Santa Clara, California and the focus is very much on data. In 2005, Tim O’Reilly predicted: “Data is the Next Intel Inside.” At VMware, big, fast data has never been so critical for our customers and innovations are transforming the cloud applications landscape at an unprecedented rate. This conference comes at the perfect time to reset what everyone knows about big, fast data.
The conference kicked off yesterday with several brief 20 minute keynotes. They were all succinct and to the point. Greenplum‘s Scott Yara reflected on how the big data market has grown tremendously over the past few years and mentioned several key data scientist practitioners. Scott also mentioned the increased investment in open source Hadoop. Of course, Strata comes on the heels of the Greenplum Pivotal HD announcement on Monday which launched their distribution of Hadoop which can improve performance 50X to 500X when compared to existing SQL-like services on top of Hadoop.
Another great keynote presentation was from Yael Garten, a Senior Data Scientist from LinkedIn. Yael leads the mobile data analytics team. She began by polling the audience and noting that many in the audience had already been on 3 different devices that morning and it wasn’t even 9:30 am yet. She noted we’re constantly connected, and we need to use data to personalize the experience for users no matter what device we’re on. She had an interesting graph highlighting device use and laptop use during our morning time of “coffee to couch”. And those uses are different in the US compared to places like India. Continue reading →
Apache Derby is used for its RDBMS components, JDBC driver, query engine, and network server.
The partitioning technology of GemFire is used to implement horizontal partitioning features of vFabric SQLFire.
vFabric SQLFire specifically enhances the Apache Derby components, such as the query engine, the SQL interface, data persistence, and data eviction, as well as adding additional components like SQL commands, stored procedures, system tables, functions, persistence disk stores, listeners, and locators, to operate a highly distributed and fault tolerant data management cluster.
If you don’t know about Spring Insight Developer, this post may save you tons of time and potentially headache.
Imagine that you need to update some code behind a button, but you didn’t write the code. What if you could press the to-be-coded button and then see what code was invoked (including methods and arguments), the SQL invoked, and the time it took to execute?
This is what Spring Insight Developer allows you to do, and more.
It’s also free, and it uses AspectJ and AOP to load-time weave your application, you do not have to make any changes to your application code to use it.
Let’s take a look at a simple example of tracing your app, viewing the details, and seeing the code in action.
This week we are excited to have a guest post on vFabric RabbitMQ from Mike Hadlow, enterprise Microsoft.NET developer and architect with 15below.com. Mike covers:
Their Architecture Before RabbitMQ
Why they went with RabbitMQ
Their Infrastructure and Development Environment
How RabbitMQ fits in their Software Architecture
In this post, I want to share our experiences of using RabbitMQ at 15below.
15below is based in Brighton, UK. We provide messaging and integration services for the travel industry. Our clients include Ryan Air, Qantas, JetBlue, Thomas Cook, and around 30 other airline and rail customers. 15below sends hundreds of millions of transactional notifications every year to its customer’s passengers over a wide range of channels including email, SMS, push, and voice.
RabbitMQ has helped us to significantly simplify and stabilise our software. It’s one of those solutions that you install, configure, and then really don’t have to worry about. In over a year of production, we’ve found it to be extremely stable without a single production issue. Continue reading →
Big, fast data is powering some of the most interesting computing opportunities in today’s market. But in order to get there, we need to change our approach to the data tier. Enterprises are trying to move from costly mainframe architectures to virtualized datacenters and utilize commodity hardware more efficiently. With the data tier, this means an architecture that scales horizontally by adding more commodity-based computing and storage at runtime.
To scale the data tier horizontally, companies use systems like vFabric GemFire, a distributed data system that is designed to specifically accommodate large data sets across commodity hardware nodes. In GemFire, data is spread across members of a cluster with members referred to as “nodes,” and the distribution of data across those nodes is called “partitioning.” vFabric GemFire then allows developers to query the data that resides across many nodes while retaining core values of very high performance at scale. How? In short, the answer is “Data Aware Querying” – a query API that allows a query to execute on selective nodes instead of all nodes (i.e. execute in a map-reduce style).
Application and operations teams sometimes reach a point where they must upgrade the database. Whether it’s due to data growth, lack of throughput, too much downtime, the need to share data globally, adding ETLs, or otherwise, it’s never a small project. Since these projects are expensive, any recommendation requires a solid justification. This article a) characterizes 3 signs where traditional databases hit a wall, b) explains how vFabric SQLFire provides an advantage over traditional databases in each case, and c) should help you make a case for moving towards an in-memory, distributed data grid based on SQL.
For those of us tasked with upgrading (or architecting) the data layer, we all go through similar steps. We build a project plan, make projections and sizing estimates, perform architecture and code reviews, create configuration checklists, provide hardware budgets and plans, talk to vendors about options, and more. Then, we work to plan the deployment with the least downtime, procure hardware and software, test different data load times, evaluate project risks, develop back-up plans, prepare communications to users about downtime, etc. You know the drill. These projects can take months and consume a fair amount of internal resources or consulting dollars. If you are starting or working on one of these types of projects with a traditional database architecture in mind, are you considering these 3 signs as you consider your options? Continue reading →
How do you plan a roadmap for moving from a legacy data architecture to a cloud-enabled data grid? In this article, we will offer a pragmatic, three-stage approach. At SpringOne-2012, the “Effective design patterns with NewSQL” session (see presentation embedded below) generated a lot of interest. (Thank you to everyone who joined us!) Jags Ramnarayan and I discussed problems with legacy RDBMS systems, NewSQL driving principles, SQLFire architecture, application design patterns as well as data consistency and reliability.
We went deep into vFabric SQLFire which is a pragmatic solution that addresses these data challenges:
How do I architect my data tier for very high concurrent workloads?
How do I achieve predictability both for data access response time and availability?
How do I distribute data efficiently and real time to multiple data centers (and to external clouds)?
How do I process these large quantities of data in an efficient manner to allow for better real-time decision-making?
Virtualization continues to be one of the top priorities for CIOs. As the share of virtualized workloads approaches 60%, the enterprise is looking at database and big data workloads as the next target. Their goal is to realize the virtualization benefits with the plethora of relational database sprawling in their data centers. With the increasing popularity of analytic workloads on Hadoop, virtualization presents a fast and efficient way to get started with existing infrastructure, and scale the data dynamically as needed.
VMware’s vFabric Data Director 2.5 now extends the benefits of virtualization to both traditional relational databases like Oracle, SQL Server and Postgres as well as Big Data, multi-node data solutions like Hadoop. SQL Server and Oracle represent the majority of databases in enterprises, and, Hadoop is the one of the fastest growing data technologies in the enterprise.
vFabric Data Director enables the most common databases found in the enterprise to be delivered as a service with the agility of public cloud and enterprise-grade security and control.
The key new features in vFabric Data Director 2.5 are:
Support for SQL Server – Currently supported versions of SQL Server are 2008 R2 and 2012.
Support for Apache Hadoop 1.0-based distributions: Apache Hadoop 1.0, Cloudera CDH3, Greenplum HD 1.1, 1.2 and Hortonworks HDP-1. Data Director leverages VMware’s open source Project Serengeti to deliver this capability.
Streamlined Data Director Setup – Complete setup in in less than an hour
One-click template creation for Oracle and SQL Server through ISO based database and OS installation
Oracle database ingestion enhancements – Now includes Point In Time Refresh (PITR)
Data Director’s self-provisioning enables a whole new level of operational efficiencies that greatly accelerates application development. With this new release, Data Director now delivers these efficiencies in a heterogeneous database environment.