Home > Blogs > VMware vFabric Blog > Monthly Archives: January 2012

Monthly Archives: January 2012

Love data? The SQLFire and GemFire teams are hiring.

HelpwantedThere's a lot of change underway in data. It's become clear that traditional database systems are not going to survive the transitionto cloud and fundamental changes are needed. Databases need to offer tunable consistency to have any hope of dealing with web scale.Databases need to handle geographic distribution effortlessly. Databases need to offer more flexible data models that support agile development. Databases can't be tied to purely physical consumption models.

VMware sees these new data programming and consumption models as very important to VMware's overall mission of simplifying IT.

VMware's got a lot going on in data, you might have heard of vFabric Data Director or vFabric Postgres. These are just a few of the many data projects VMware has in the works.

This blog focuses on SQLFire, a memory-optimized distributed SQL database, which today embodies some of these needed changes withmore on the way. VMware also has GemFire, a memory-optimized distributed key/value store. For many years GemFire has been the solution to some of the world's most challenging data problems, particularly when it comes to speed and low latency. (If you peek under the covers you'll see that SQLFire is built using GemFire technology and shares a lot of its great qualities.)

When it comes to GemFire and SQLFire, we're focused on high-performance transactional databases that are linearly scalable and globally distributed. We're looking for top talent to join us on our SQLFire / GemFire teams and join us on our journey to data in the cloud.

Engineering

We need talented developers, particularly those with deep Java, database or distributed systems backgrounds. See our job site for more details on GemFire jobs or SQLFire jobs. We're looking for people in a number of areas, including Beaverton/Portland, The Bay Area, and in Pune, India. It's your opportunity to work with a great team who builds amazing and disruptive technology, and in a great company on top of that.

Technical Evangelism

Love building demos? Can you write a great blog? Love travelling the world, meeting interesting people and talking about querying and horizontal scaling with them? Let's talk. See our job description for more details. If this sounds like a perfect fit for you, email me at cshanklin@vmware.com or our friendly recruiter bronsone@vmware.com.

Product Management

VMware's commitment level is high and our slate is pretty clean. It's a perfect opportunity for a product manager with great ideas about how data ought to work. Check out our job description and if that sounds like you, email me at cshanklin@vmware.com or bronsone@vmware.com.

Hoping to hear from you!

Using SQLFire as a read-only cache for MySQL.

A question came up in our forums about using SQLFire to cache data from a database using lazy loading. I wanted to provide a fairly concrete example that uses MySQL that should give you all the info you need to try it out yourself.

Why do this?

SQLFire can play a number of roles. For one thing it has its own persistence right out-of-the-box so it can be your database of record unassociated with any other database and brings big advantages in terms of speed, eliminating single-points-of-failure and so on. But chances are you have a database already and would like to get a feel for SQLFire without a huge commitment. In this case there are a couple of options, SQLFire can act as a read/write layer atop the database, persisting database to a more traditional RDBMS, or SQLFire can act as a read-only cache that loads data in as you need it. Many times this latter case is the least commitment way to try SQLFire out, after all your database may have stored procedures or other things that make it difficult to port to another database.

Putting a read-only cache atop an RDBMS helps by offloading some reads to another server. Let's think about a database that is constantly getting reads and writes for a lot of different users (high concurrency). This sort of situation easily leads to disk thrashing which leads to extremely bad performance unless you're willing to throw lots of money into storage. Serving a portion of reads in-memory cuts down on this thrash, making your write throughput go up on the underlying database. In addition, an in-memory architecture like SQLFire handles high concurrency extremely well since it doesn't suffer from thrashing problems the way disk-based databases do. Another way caching brings read throughput up is by not forcing every read to be consistent with every write. I'll talk about this more in the next section.

Before you begin, understand these.

  1. When it comes to read-only caching, SQLFire 1.0 supports accessing database rows based on primary key. This sort of access pattern is pretty similar to what you might do with a memcache or a Redis, except that you get RDBMS rows instead of objects, which has some advantages in terms of reusing code you already have.
  2. Second, SQLFire caches the data. That might sound silly to say, after all that's the goal, but it also means that when you use SQLFire to cache, the data in SQLFire will be stale to some extent. If SQLFire were to reach out to the database on every access there would be no performance benefit, probably just the opposite. Your application and use case need to be ok with data that is stale. By adopting this "eventually consistent" approach the system as a whole does less work and higher throughput is possible. As more and more applications are exposed to the big-I Internet people have been forced to realize that there are quite a few places in their application where eventual consistency is perfectly fine. How stale the data should be is configurable as we'll see below.

Step 1: Install and configure MySQL, including the MySQL JDBC Connector

I'm not going to give detailed instructions on this one, I'm going to assume you have a database up and running already. You should also get the MySQL JDBC connector. If you're on Ubuntu for instance just "apt-get install libmysql-java" and it will appear in /usr/share/java/mysql-connector-java.jar. Similar if you're on Fedora using yum. If you're on another platform I wish you luck in your adventure to track that file down, this blog will still be here when you return.

By the way, if you're using some database other than MySQL, very little changes in these instructions, all you need to do is find the appropriate JDBC driver and change the connection strings.

Step 2: Create a table and some data in MySQL.

Again I'm going to mostly assume that you know how to do this but I'll create a table called names and insert a few rows. It's important that the table use a primary key.

Step 3: Compile the JDBCRowLoader sample (and put it in a JAR file).

So it turns out there is some assembly required if you want to use SQLFire as a read-only cache. Luckily it's not too painful, everything you need is included in the SQLFire kit, when you install you'll have a directory called examples with some Java code in it. One of these files is JDBCRowLoader.java, which is a sample class that implements the RowLoader interface. There are a number of ways to approach the RowLoader interface but this example is quite general-purpose. To get things ready to use, run these three commands from within the examples directory:

  1. javac -classpath ../lib/sqlfire.jar JDBCRowLoader.java
  2. cd ..
  3. jar -cf examples.jar examples

Now you have a JAR file you can use as a row loader. This generic row loader even handles cases where you want fewer columns in SQLFire than in MySQL using the query-columns parameter. You can implement your own row loader if you need to. Why would you do that? Most likely you would want to apply some transformation to the data as you're reading it in. For one example you may have noticed that SQLFire 1.0 doesn't support the boolean datatype yet. If your MySQL table had boolean data, you could use your RowLoader to transform it into an integer value of 0 or 1 as appropriate. For this demo the generic RowLoader is more than enough.

Step 4: Start SQLFire with an extended classpath.

We need SQLFire to understand where the MySQL JDBC driver is, as well as our JDBCRowLoader code is. To do this we will need to use -classpath when starting SQLFire. Here's an example:

  • sqlf server start -classpath=/usr/share/java/mysql-connector-java.jar:/home/carter/sqlf/examples.jar

You will need to substitute the paths to your examples.jar and MySQL connector. After a successful start you can connect as you normally would.

Step 5: Re-create the table in SQLFire and call SYS.ATTACH_LOADER to hook the two together.

First we create a table as in MySQL except we don't populate any data. Next the special ATTACH_LOADER call is invoked to hook SQLFire to MySQL.

Here's what this should look like in your session.

  Linking SQLFire and MySQL

Now when you query SQLFire by primary key, rows will be loaded in on demand. You should also understand that if you don't specify a where clause when querying the table, SQLFire will report whatever keys have been cached thus far rather than everything in the underlying table. Chances are this is not what you want, so be careful of it.

Select from SQLFire caching MySQL

Step 6: Controlling cache size and data freshness.

The table created above is not the sort of thing you really want when you're caching with SQLFire for two reasons:

  1. First of all, data is never updated once it's placed into the cache. Ever. No matter how much the underlying data changes in MySQL, those changes will never make their way into SQLFire. Not likely what you want.
  2. The table will grow indefinitely large. Chances are you'd rather cache a subset of your data since you're using SQLFire as an in-memory layer.

Overcoming both these problems is very easy to do in SQLFire, using CREATE TABLE semantics. Here's an example:

Let's take a quick look at what this does. The first difference between this statement and the one above is the EVICTION BY LRUCOUNT 50000 clause. This means that SQLFire will maintain up to 50,000 entries in the cache before taking action. The action is configurable but this setting causes SQLFire to simply discard the data, perfectly appropriate for a cache. The second line is more interesting to most people, EXPIRE ENTRY WITH TIMETOLIVE 300. This says that an entry should be kept in the cache no more than 5 minutes. The net effect is to guarantee that data is never more than 5 minutes stale. After 5 minutes the data is discarded and must be re-fetched from the underlying database. If you have a fairly short TTL you might not even want to set a maximum cache size.

Variations.

Here are a few simple variations on this demo:

  1. Change your cache policy by changing the LRUCOUND or TIMETOLIVE parameters in the CREATE TABLE statement.
  2. If you want to act as a cache on top of Oracle or some other database, just change the url parameter to mention the appropriate driver and URL. Note you will need the respective JDBC JAR for that database to be in the classpath when you start SQLFire.

Questions? Comments? Join us in the SQLFire Community.