According to one of our partners, vFabric SQLFire is a product he wishes more customers would use.
“SQLFire is a game-changer. I think many companies underestimate the value of scaling the data later horizontally. Every project I propose has a business case, and I see a tremendous amount of value being unlocked with this product—not just for the CIO or CTO’s agenda, but for the CFO and CEO. Then, you add the fact that the whole application stack is virtualized and has solid integrations. It’s a simple story, the product allows you to add a lot of value in a really cost effective way.”
What makes SQLFire such a game-changer?
In this article, we’ll talk more about three game-changing capabilities: server groups, partitioning, and redundancy.
If you haven't been following our stories on SQLFire, see the end of this article for a list of posts and key capabilities that help explain how transformative SQLFire can be to your data management strategies.
SQLFire Topology Review
The main three topologies that are available as background for this article:
- Client/Server. This is the traditional “client/server” database model, but with the servers forming distributed data grid. Each SQLFire member (node) runs it’s own JVM to provide network distribution, in-memory storage, disk persistence management, and more. These nodes are presented to the client as a single data management cluster (i.e. clients don’t participate in distributed queries).
- Embedded Peer-to-Peer. In this model, Java application servers and SQLFire peers share the same process heap for things like session state, web content, and back office data.
- Global, Multisite. You can configure a multisite topology that spans data centers (even with across long distances) for active-active disaster recovery.
SQLFire Server Groups
Server Groups truly begin to enable the concept of a data grid or fabric. With server groups, you can specify which grid members (i.e. nodes) host data for a given table, and members can participate in one or more server groups. There are several very cool things about this:
- You could put different logical database schema in different server groups—you could put CRM in one group, web profile data in another, orders in a third group, inventory in a fourth group, and social profiles in a fifth group. Even with multiple databases on the grid, developers can still look at a server group as a single, logical database server but gain access to much more data.
- SQLFire can colocate related data on to a single node (for performance reasons) and dynamically change membership, add/remove capacity, rebalance tables, or run stored procedures in a parallel fashion across these data sets. So, if you wanted to, you could do things like dynamically a) put a set of CRM, web, order, inventory, and social data on a node, b) increase capacity temporarily, c) run a map-reduce style job, and d) make the resulting data-set available to consuming applications at high scale.
- Server groups can be aligned to certain business units and service levels for management purposes.
- As well, we can do things like allocate more static data or reference data on a few nodes while rapidly changing or high-volume data could be populated on many nodes to be cost effective.
As you can see, Server Groups really enable the data store layer to act as more than one traditional database, it’s a fabric where multiple, traditional data-stores can come together.
SQLFire’s Horizontal Partitioning
Many companies seek the ability to scale the data layer horizontally. This means rows are held separately on different tables. Many have looked at sharding approaches to place fewer rows in a table, reduce index size of that table, and distribute portions of that table across many machines to increase performance.
With SQLFire, sharding is basically built-in as part of the underlying service. SQLFire uses the concept of buckets as the unit for storage of data, redundancy, and migration/rebalancing. This allows a frequently manipulated table to be placed across nodes in more manageable partitions. You can hash-partition a table based on primary key, internally generated unique row ID, columns other than primary key, range-partitioning, and list-partitioning.
Let’s look at an example. In the figure below, the example shows the Flights table partitioned by
FLIGHT_ID (primary key for Flights table) into three buckets, where the first bucket for rows 1 to 3 resides on vFabric SQLFire server 1, the second bucket for rows 4 to 6 resides on vFabric SQLFire server 2, and the third bucket for rows 7 to 9 resides on vFabric SQLFire server 3. vFabric SQLFire directs that all access to flights data by the
FLIGHT_ID primary key for rows 1 to 3 are executed on vFabric SQLFire Server 1, and similarly for the other rows. vFabric SQLFire automatically manages all partitioning through this bucketing system so long as the designer has provided the correct
PARTITION BY COLUMN(FLIGHT_ID) clause in the table definition.
To extend the example, let’s place it in the context of airline flights where we can see how redundancy and partitioning is used across both actively changing data and reference data with various data relationships and expected joins.
The schema below is typical of the master-detail design pattern seen in most RDBMS schemas with
FLIGHTAVAILABILITY have a one-to-many relationship.
AIRLINES, however has a many-to-many relationship to
FLIGHTAVAILABILITY. The Flights table is partitioned by its primary key
FLIGHT_ID and has a
REDUNDANCY of 1. This means that there is one backup copy of each FLIGHT on a redundant vFabric SQLFire Server somewhere in the cluster. On the other hand, the
FLIGHTAVILABILITY is partitioned by
FLIGHTS, and means that vFabric SQLFire manages the data partitions so that when there is a qualifying join between the
FLIGHTAVILABILITY tables, the query executes within the same vFabric SQLFire member/memory space, optimizing performance. The
AIRLINES table is reference data that does not change frequently and has a many-to-many relationship to
FLIGHTAVAILABILITY. To configure this relationship in vFabric SQLFire, you can use the REPLICATE keyword to indicate to vFabric SQLFire to place a complete copy of the
AIRLINES table on every vFabric SQLFire member in the distributed system.
vFabric SQLFire let’s you keep a continuous, fault tolerant model. With SQLFire, you can choose how many redundant copies of specific data to keep. SQLFire then manages the synchronization of all data changes across the primary copy and all backup copies. When a server fails, vFabric SQLFire automatically reroutes any operations that were trying to read or write from the failed member to the surviving members. There are tradeoffs in performance when more copies are kept, but most production systems find that one redundant copy with asynchronous disk persistence provides a solid combination of performance, scalability, and reliability.
>> For More Info: A significant portion of this article was sourced from the SQLFire Best Practices guide led by Emad Benjamin. This article is the second in a series, following last week's post on How SQLFire Performance Benchmark Blows the Traditional RDBMS Away.
- For background, you can read how the SQLFire chief architect, Jags, explains the design.
- You can achieve horizontal scale out using Java or .NET connections via ANSI SQL-92. For example, it only requires changing two lines to connect Grails apps to SQLFire.
- High availability is basically built in. Beyond the in-memory grid-oriented architecture, connecting clients interact through a transparent failover at the protocol level that uses multiple “service locators.” This means there is a fully redundant connection path—there is no change to the connection string. The locators also balance requests.
- Since SQLFire is written in Java, you can use Java stored procedures in the same memory and processing space that database operations are performed with NO additional overhead.
- SQLFire can act as a read-only, distributed cache for MySQL. In fact, it can be used many ways:
- It can be embedded into Java applications with replication, partitioning, persistence to disk, and distributed query participation.
- It can act as a front-end to databases. It uses write-thru distributed caching to load rows that aren’t in cache. It has a write-through pattern to synchronously write to external databases. It can also write asynchronously to external data stores.
- It can be used as a high-scale OLTP data store and supports WAN connectivity to make geographically distributed clusters look like one global system. This builds in disaster recovery and business continuity.
- It provides methods to make stored procedures run in parallel (basically, in-database map reduce types of processing).
- It can load 8 million rows in 88 seconds (40GB in 1 hour) in a smaller environment (2 VMs with 18GB of memory each).
- There are sample code bits out there like integrating SQLFire with Spring Petstore on GitHub.
- It’s built into the SpringTrader Reference Architecture, which has an HTML5-drive presentation layer interfacing through REST/JSON to a middle-tier on Spring with an AMQP message broker, and a JDBC connection to SQLFire.