gemfire products scalability spring_integration

Adding Years to Your RDBMS by Scaling with Spring and NoSQL

Add Time To RDBMS with Spring Integration and GemFireRelational databases are becoming the legacy systems of today.

Of course, they are costly to replace completely. However, a memory-based, NoSQL cache can add miles and years to traditional RDBMS models like Microsoft SQL Server, Oracle, and even MySQL.

We often run into companies looking at a cache in the R area of CRUD—reads. This can be caused when data generated by one application is used across additional applications or analytics. Companies need a memory cache to provide a READ-only view of the transactional data as close to applications as possible while receiving data changes from the underlying RDBMS as they occur in real time.

In this post, we will walk through a technical solution where Microsoft SQL Server 2008 is extended with Pivotal GemFire and Spring Integration to scale reads and provide a high availability, READ-only view of data and support additional applications. Since every company has its own requirements and constraints, this approach may not be the best for every situation. The solution propagates CRUD events from MS SQL to GemFire and keeps them in sync instead of pursuing more traditional methods like data extraction.

Background on the Architecture and Components

In a nutshell, we start with a MS SQL Server database of record. Two MS SQL Server features help us send creates, updates, and deletes into a change tracking table. A .NET server broker and application on the database push the data into GemFire. Within GemFire, a Spring Integration flow processes the message into the cache.

If you aren’t familiar, Pivotal GemFire is a real time, NoSQL, in-memory, distributed data store that delivers speed and scale for a variety of cache and data store architectures. Alongside it’s NewSQL sister, Pivotal SQLFire, it provides high performance to applications that are data rich. These data grids are continuously available regardless of the number of end-users being served. Importantly, data can be shared between Java, C# and C++ clients easily.

As well, Spring Integration provides lightweight messaging within Spring-based applications and supports integration with external systems via declarative adapters. With support for Enterprise Integration Patterns, complex integration and ETL flows can be created using the already familiar Spring separation of concerns that is essential for producing maintainable, testable code.

Approaching SQL Server and Identifying Data Changes

Most enterprise level databases provide a mechanism to notify when changes have occurred on the data. When using SQL Server 2008 and above, we can take advantage of a two features known as “Query Notification” as well as “Change Tracking.” Using these two features, one can track data changes to individual rows within a table and query for the changes as they occur. Although this functionality is easily accessible, it is limited to Microsoft .NET clients only.

By tracking data changes to individual rows across all tables, the create, read, and delete changes are then stored in a change tracking table. Since the approach is built generically, it works to capture data across rows for all tables (not a subset of tables) and even works when underlying table changes occur.

Since the changed data can only be accessed via .NET clients, we use a .NET application—via a service broker on the SQL Server database server—to perform a few functions. First, each registered change for a given table in SQL Server is pushed to the C# client (service broker) application. This information includes:

• Change Tracking Number
• Table Name
• Operation Type
• Database Primary Key for changed entry

Next, the .NET application (the same C# service broker client) runs a separate query to retrieve the data for the row in question and pushes the data via the native GemFire client C# API:

• Change Tracking Number
• Table Name
• Operation Type
• Database Primary Key for changed entry
• Entry row data

While the service broker and .NET client are single points of failure, the solution was acceptable because the company knew the service could be restarted and pick up the missed changes. As well, data updates are not terribly frequent. So, changes don’t occur very often.

SQL_Server

Processing Data inside GemFire

Upon receiving the message, the Pivotal GemFire cluster dynamically starts processing the message from the “Command Region,” an abstraction of the command pattern. This region holds the message in memory for a Spring Integration flow to process. Since Pivotal GemFire is a Java-based application, the Spring Integration runs on the same JVM, not a separate application server.

The Spring Integration Adaptor for GemFire triggers the message processing as soon as it is received in the Pivotal GemFire cluster. The flow goes through four distinct steps in the processing of the received message.

Spring_Integration

  • Command Processor—This will process the received message. For this use-case, the Command Processor is very simple and, at this stage, can only process CRUD messages for data entries.
  • Domain Factory—This process will convert the message row data into a Domain object.
  • Persistence Processor—This process will handle all persistence of the Domain object. For an Insert /Update operation the Domain object will be persisted to memory. On a Delete operation the entry will be removed from memory. GemFire also has the capacity to support various disk persistence approaches, an unnecessary step for this architecture.
  • Message Tracking—This process will store the last successfully processed tracking number. This tracking number is then sent back to the .NET client, to ensure that it does not unnecessarily process and duplicate change notification messages.

Once the message has been completed, the resultant Pivotal GemFire cluster data changes will be available for all connected clients hitting the in memory cluster or even across a WAN with parallel, asynchronous replication.

While adding a cache like this to an existing database is a considerable decision and change tracking has it’s pros and cons, there is tremendous value in being able to freely add other applications and systems into the mix to listen for changes once the data is in GemFire. For many, recognizing the contrast between the scale of traditional RDBMS and the scale of GemFire presents a significant opportunity.

Further reading:

UKohlmeyer1 About the Author: Udo Kohlmeyer is a Senior Consultant for Pivotal. He has 12+ years in software engineering, with the majority of his time spent in low-latency, high-volume environments of the financial and telecoms sector. His interests lie in distributed, concurrent, low-latency, near real-time systems. Udo has a BSc (Hons) Computer Systems from the University of Pretoria.