How do you plan a roadmap for moving from a legacy data architecture to a cloud-enabled data grid? In this article, we offer a pragmatic, three-stage approach. At SpringOne 2012, the "Effective design patterns with NewSQL" session (see presentation embedded below) generated a lot of interest. (Thank you to everyone who joined us!) Jags Ramnarayan and I discussed problems with legacy RDBMSs, NewSQL driving principles, SQLFire architecture, application design patterns, and data consistency and reliability.
We went deep into vFabric SQLFire, a pragmatic solution to these data challenges:
- How do I architect my data tier for very high concurrent workloads?
- How do I achieve predictability both for data access response time and availability?
- How do I distribute data efficiently and in real time to multiple data centers (and to external clouds)?
- How do I process these large quantities of data in an efficient manner to allow for better real-time decision-making?
At VMware, we understand that adopting a new technology can be a daunting task. With SQLFire, you can take an evolutionary approach: 1) start with an embedded or distributed cache, 2) move to or add a full-fledged OLTP data store, and 3) enable a global, cloud database or distributed compute grid (even for map-reduce-style computing), all at your own pace. This approach is popular with customers and grounded in years of experience working in this space with hundreds of mission-critical systems and their respective companies.
Step 1: Add an Embedded or Distributed Cache
As an Embedded, Clustered Java Database
To get started, you can embed SQLFire into Java applications by including the required SQLFire libraries. When the application initiates a connection to SQLFire, it starts a peer server that joins other peers in the same cluster. Unlike other embedded databases such as H2 or Derby, SQLFire allows several servers to store replicated and partitioned tables, persist data to disk, communicate directly with other servers, and participate in distributed queries. In this diagram, we show how the SQLFire database is embedded in a tc Server application server and remains part of the overall data fabric.
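As a rough sketch of what this looks like in code, the fragment below builds an embedded peer connection URL. The `mcast-port` value and the class itself are illustrative; with the SQLFire jar on the classpath, the commented `getConnection` call would boot the peer and join the cluster.

```java
import java.sql.Connection;
import java.sql.DriverManager;

// Sketch: starting an embedded SQLFire peer from a Java application.
// The URL attributes (mcast-port, host-data) follow SQLFire's connection
// properties; the port number here is an arbitrary example.
public class EmbeddedPeer {

    // Build the peer connection URL; host-data=true makes this JVM
    // store table data rather than act as a pure accessor.
    static String connectionUrl(int mcastPort) {
        return "jdbc:sqlfire:;mcast-port=" + mcastPort + ";host-data=true";
    }

    public static void main(String[] args) {
        String url = connectionUrl(33666);
        System.out.println(url);
        // With sqlfire.jar on the classpath, this call would start the
        // peer and join the cluster multicasting on the given port:
        // Connection conn = DriverManager.getConnection(url);
    }
}
```

Because the database runs in the application's own process, queries against local data avoid a network hop entirely.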
With Write-Through Distributed Caching
When you need to boost application performance and reduce network traffic, you can use SQLFire as a distributed cache to offload your existing DBs. Applications can use a familiar SQL syntax to read from and write to the distributed cache.
As depicted in the diagram below, whenever your application looks for a record that is not in the cache, SQLFire uses a RowLoader, the callback component associated with the table you are querying. The RowLoader transparently reads the record from the external database using its primary key, updates the cache, and returns the record to your application.
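The read-through flow can be sketched in plain Java. This is a simplified stand-in for illustration, not the actual SQLFire RowLoader API; the `RowLoader` and `CachingTable` names here are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of read-through caching: on a cache miss, a loader
// callback fetches the row from the external database, the cache is
// updated, and the row is returned to the caller.
public class ReadThroughCache {

    interface RowLoader {
        String getRow(Object primaryKey); // fetch from the external DB
    }

    static class CachingTable {
        private final Map<Object, String> rows = new HashMap<>();
        private final RowLoader loader;

        CachingTable(RowLoader loader) { this.loader = loader; }

        String select(Object pk) {
            String row = rows.get(pk);
            if (row == null) {                      // cache miss
                row = loader.getRow(pk);            // transparent read-through
                if (row != null) rows.put(pk, row); // warm the cache
            }
            return row;                             // hits skip the loader
        }
    }

    public static void main(String[] args) {
        CachingTable flights = new CachingTable(pk -> "flight-" + pk);
        System.out.println(flights.select(747)); // miss: loads and caches
        System.out.println(flights.select(747)); // hit: served from cache
    }
}
```

The key property is that the application only ever issues a `select`; whether the row came from the cache or the external database is invisible to it.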
With the write-through pattern, all SQLFire changes can be synchronously written, via a Database Writer, to the external database before the cache is updated. Because SQLFire can participate in the container transaction, the data becomes available in the cache only if the write-through succeeds. In this diagram, an InsertWriter is registered on the Flights table; an 'Insert flight 747' statement triggers the InsertWriter's callback event handler, which synchronizes with the external database first and then inserts into the SQLFire Flights data partition.
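A minimal sketch of the write-through contract, in plain Java rather than the actual SQLFire callback API (the `DbWriter` and `Table` names are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the write-through pattern: the writer callback synchronizes
// with the external database first; only if that succeeds is the row
// inserted into the cache, keeping cache and database consistent.
public class WriteThrough {

    interface DbWriter {
        void beforeInsert(Object pk, String row); // write to the external DB
    }

    static class Table {
        final Map<Object, String> rows = new HashMap<>();
        final DbWriter writer;

        Table(DbWriter writer) { this.writer = writer; }

        // If the external write throws, the cache is never touched.
        void insert(Object pk, String row) {
            writer.beforeInsert(pk, row); // synchronous external write first
            rows.put(pk, row);            // only now update the cache
        }
    }

    public static void main(String[] args) {
        Table flights = new Table((pk, row) -> System.out.println("DB write: " + pk));
        flights.insert(747, "Boeing 747");
        System.out.println("cached: " + flights.rows.get(747));
    }
}
```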
Distributed Caching with Asynchronous Writes to DB
If synchronous writes to the backend are too costly, your application can write to the external database asynchronously using a cache listener. For high availability, you can designate a primary listener with one or more standby instances so the solution keeps running even under failure conditions. Whenever the external database is unavailable, your application continues to run while SQLFire queues up events; once the database comes back up, the listener forwards the queued changes to bring it up to date. In this diagram, we show SQLFire's persistent, redundant queues with the primary listener writing asynchronous batches to a legacy database.
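The queue-and-forward behavior can be sketched generically. This models the pattern, not SQLFire's actual listener implementation; the `ExternalDb` interface and class names are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Sketch of asynchronous write-behind: changes are queued and a listener
// flushes them to the external database in batches. If the database is
// down, events simply stay queued until a later flush succeeds.
public class WriteBehindQueue {

    interface ExternalDb {
        boolean isUp();
        void write(List<String> batch);
    }

    private final ConcurrentLinkedQueue<String> events = new ConcurrentLinkedQueue<>();
    private final ExternalDb db;
    private final int batchSize;

    WriteBehindQueue(ExternalDb db, int batchSize) {
        this.db = db;
        this.batchSize = batchSize;
    }

    void enqueue(String event) { events.add(event); }

    int pending() { return events.size(); }

    // Called periodically by the listener thread; returns the number of
    // events forwarded on this pass. During an outage nothing is lost:
    // the queue just keeps growing until the database is back.
    int flush() {
        if (!db.isUp()) return 0;
        List<String> batch = new ArrayList<>();
        String e;
        while (batch.size() < batchSize && (e = events.poll()) != null) {
            batch.add(e);
        }
        if (!batch.isEmpty()) db.write(batch);
        return batch.size();
    }

    public static void main(String[] args) {
        WriteBehindQueue q = new WriteBehindQueue(new ExternalDb() {
            public boolean isUp() { return true; }
            public void write(List<String> batch) { System.out.println("batch: " + batch); }
        }, 10);
        q.enqueue("insert flight 747");
        q.flush();
    }
}
```

Batching amortizes the round-trip cost to the legacy database, which is exactly why the asynchronous variant outperforms per-statement write-through.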
Step 2: Full-Fledged, Scalable OLTP Data Store (or In-Memory Data Grid)
Given SQLFire’s performance, availability, and reliability guarantees, you may no longer need legacy databases behind SQLFire. And because SQLFire relies on standard SQL, JDBC, and ADO.NET, applications designed for traditional databases can migrate to SQLFire easily. This means you can add a sophisticated distributed data platform to applications that require thousands of transactions per second, sub-millisecond response times, and linear scalability. This pattern can greatly simplify your data architecture and reduce maintenance costs as well as operational overhead.
In this diagram, the colored boxes represent replicated or partitioned data across SQLFire servers. SQLFire’s shared-nothing architecture is designed to prevent any single point of failure, maintaining data availability and consistency even when servers go offline unexpectedly.
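A toy model of why partitioning with redundancy removes the single point of failure: each row gets a primary copy on one server and a redundant copy on a different one, so losing any single server never loses data. This is a deliberately simplified illustration, not SQLFire's actual bucket-placement algorithm.

```java
// Simplified placement model for a partitioned table with redundancy 1:
// the primary copy of a row hashes to one server, and the redundant
// copy always lands on a different server in the cluster.
public class PartitionPlacement {

    // Primary owner: hash of the primary key, modulo the cluster size.
    static int primaryServer(Object pk, int servers) {
        return Math.floorMod(pk.hashCode(), servers);
    }

    // Redundant copy: the next server in the ring, guaranteed distinct
    // from the primary whenever servers >= 2.
    static int redundantServer(Object pk, int servers) {
        return (primaryServer(pk, servers) + 1) % servers;
    }

    public static void main(String[] args) {
        Integer pk = 747;
        System.out.println("primary=" + primaryServer(pk, 3)
                + " redundant=" + redundantServer(pk, 3));
    }
}
```

Because the two copies never share a host, any one server can fail while every row remains readable from its surviving copy, which is the availability property the diagram depicts.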
Step 3: Global, Cloud Database or Distributed Compute Grid
SQLFire as Global, Cloud Database
SQLFire supports global WAN connectivity and gives you the option of replicating data across data centers and cloud providers. This multi-site topology makes geographically distributed clusters appear as one global system. If one of the SQLFire clusters fails or is taken offline for maintenance, clients in that geographical region can fail over to the next available cluster and continue to operate.
This makes SQLFire the ideal solution for disaster recovery and business continuity purposes.
The figure below shows three globally distributed sites (New York, Tokyo, and London), each hosting a SQLFire cluster. Within each site, a gateway is configured to distribute data between sites, whether for failover or for consistent global views of data around the world.
SQLFire as Distributed Compute Grid (Real-time, in-database Map-Reduce)
Using SQLFire stored procedures, you can implement business logic at the server level that runs in the same process space as your data and in parallel on multiple SQLFire servers, significantly improving application performance and scalability. In effect, this capability is a real-time, in-database map-reduce.
Because a SQLFire cluster consists of multiple servers, stored procedure execution is parallelized to run on those servers concurrently, and SQLFire makes the whole map-reduce process real-time and transparent to your application. The ability to write functions and procedures in Java also brings the complete set of Java APIs, and the power of Spring, into your SQL environment as server-side logic. There is tremendous power, speed, and scale in this architecture pattern, as we have seen in Hadoop examples. In this diagram, we show how an analysis job can be executed in parallel across SQLFire servers as Java stored procedures.
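The map-reduce analogy can be sketched with plain Java executors standing in for SQLFire's parallel procedure execution: each task scans only its local partition (the map step) and the caller sums the partial results (the reduce step). All names here are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of "real-time, in-database map-reduce": the same procedure
// body runs concurrently against each data partition, and the partial
// results are combined by the coordinator. A thread pool stands in
// for the SQLFire servers.
public class ParallelProcedure {

    // Map step: each "server" counts matches in its own partition only.
    static long localCount(List<Integer> partition, int threshold) {
        return partition.stream().filter(v -> v > threshold).count();
    }

    // Reduce step: run the map step on every partition in parallel,
    // then sum the per-partition counts into one answer.
    static long countAbove(List<List<Integer>> partitions, int threshold) {
        ExecutorService pool = Executors.newFixedThreadPool(partitions.size());
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (List<Integer> p : partitions) {
                futures.add(pool.submit(() -> localCount(p, threshold)));
            }
            long total = 0;
            for (Future<Long> f : futures) total += f.get();
            return total;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<Integer>> partitions = Arrays.asList(
                Arrays.asList(1, 5, 9),
                Arrays.asList(2, 8),
                Arrays.asList(10));
        System.out.println(countAbove(partitions, 4)); // prints 4
    }
}
```

Because each task touches only local data, the work scales out with the number of servers rather than funneling every row through a single node.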
One thing is clear: if data is locked up in heavyweight, legacy RDBMSs, organizations can't realize the benefits promised by cloud computing.
You now have a good idea of how to incorporate SQLFire into your architecture at your own pace and continue your journey toward cloud computing. Go ahead and test-drive SQLFire; it will be worth your time.
Thank you for reading!
For more on SQLFire:
SpringOne 2012 Presentation - Effective design patterns with NewSQL
About the Author: Guillermo is an award-winning Enterprise Architect with 17+ years of progressive experience in different industries. As a Regional Senior Systems Engineer for VMware's Cloud Application Platform division, Guillermo works with customers to understand their business needs and challenges and helps them seize new opportunities by leveraging vFabric to modernize their IT architecture. Guillermo is passionate about his family, business, technology, and soccer.