Category Archives: Web/Tech

Putting the ‘Single’ Back in Single Sign-On (SSO)

Modern companies and IT organizations have many applications, both internal and customer-facing. With so many applications, your users face the challenge of not only managing multiple sets of credentials, but also logging in to each and every application separately. This creates a bad experience for your users.

To improve the user experience, IT created a concept called Single Sign-On (SSO). The idea was that users could sign on once, and the SSO software would automatically authenticate them to all of their applications. This not only helped the user experience, but also helped IT by cutting down on the number of ‘forgot password’ tickets opened, and it made de-authenticating users really easy when they left the organization. The idea is great, but in practice it frequently stopped short at authentication. Continue reading

Q&A with Brian Dussault

1. You’ve been involved with SpringTrader for a while. Could you tell us a bit about your background and how you’ve been involved?
I’ve been involved in the technology industry for the past 14 years. I worked in both IT (High Tech Manufacturing, Financial Industries) and R&D positions during this time. My experience spans multiple disciplines including web applications, enterprise integration, SOA, open source, and system design. I have been a Spring Framework user since 2006 and joined VMware in 2011 as a Staff Engineer on the vFabric Commercial Engineering team.

Upon joining VMware, I led a small, highly talented team of engineers in the creation of the vFabric Reference Application, previously referred to as NanoTrader and now known as SpringTrader. The goal of the reference application was to provide an end-to-end sample application that included the Spring Framework, vFabric and our virtualization products.
Continue reading

Powering Mobile Architecture with vFabric

Mobile is driving a lot of new application development. But, how can vFabric help?

We’ve all heard the incredible growth trends of mobile. As an example, “The Future of Mobile,” a 2012 presentation by Henry Blodget, CEO of Business Insider, shows how mobile apps are now a $10B market growing at 100% per year.

This has led a number of companies, including Google, to adopt “mobile first” development strategies: build first for smartphones, then for laptops and desktops. At other companies, such as Urbanspoon, mobile growth is outpacing their desktop web traffic. It’s said that if Facebook were built today, it would be a mobile app.

The Mobile UI Development Dilemma

The challenge facing mobile application developers is that there are three major mobile UI technology stacks – iOS, Android (Java), and HTML5 (mobile web) – each with its own pros and cons. Continue reading

How to integrate SQLFire into your Maven project

Just a quick post to address a potential issue regarding integration of the latest SQLFire release (1.0.2) into your Maven project. You may not know this, but the VMware GemStone team maintains its own Maven repository.

For the latest versions of the SQLFire and GemFire dependencies, simply add the GemStone repository to your pom.xml:

<repository>
   <id>gemstone</id>
   <name>Release bundles for SQLFire and GemFire</name>
   <url>http://dist.gemstone.com.s3.amazonaws.com/maven/release</url>
</repository>

and define the necessary dependency, in this case SQLFire:

<dependency>
   <groupId>com.vmware.sqlfire</groupId>
   <artifactId>sqlfireclient</artifactId>
   <version>${com.vmware.sqlfireclient.version}</version>
</dependency>
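
The version above is referenced through a Maven property; assuming you take that approach, define the property in your POM as well, for example with the 1.0.2 release mentioned above:

<properties>
   <com.vmware.sqlfireclient.version>1.0.2</com.vmware.sqlfireclient.version>
</properties>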

Hope this helps.

Using SQLFire as a read-only cache for MySQL

A question came up in our forums about using SQLFire to cache data from a database using lazy loading. I wanted to provide a fairly concrete example using MySQL that should give you all the info you need to try it out yourself.

Why do this?

SQLFire can play a number of roles. For one thing, it has its own persistence right out of the box, so it can be your database of record, unassociated with any other database, bringing big advantages in terms of speed, eliminating single points of failure, and so on. But chances are you already have a database and would like to get a feel for SQLFire without a huge commitment. In this case there are a couple of options: SQLFire can act as a read/write layer atop the database, persisting data through to a more traditional RDBMS, or SQLFire can act as a read-only cache that loads data in as you need it. This latter case is often the lowest-commitment way to try SQLFire out; after all, your database may have stored procedures or other features that make it difficult to port to another database.

Putting a read-only cache atop an RDBMS helps by offloading some reads to another server. Think about a database that is constantly getting reads and writes from a lot of different users (high concurrency). This sort of situation easily leads to disk thrashing, which leads to extremely bad performance unless you're willing to throw lots of money at storage. Serving a portion of reads in-memory cuts down on this thrashing, increasing write throughput on the underlying database. In addition, an in-memory architecture like SQLFire handles high concurrency extremely well, since it doesn't suffer from thrashing problems the way disk-based databases do. Another way caching increases read throughput is by not forcing every read to be consistent with every write. I'll talk about this more in the next section.

Before you begin, understand these two points.

  1. When it comes to read-only caching, SQLFire 1.0 supports accessing database rows by primary key. This access pattern is pretty similar to what you might do with memcached or Redis, except that you get RDBMS rows instead of opaque objects, which has some advantages in terms of reusing code you already have.
  2. SQLFire caches the data. That might sound silly to say, after all that's the goal, but it also means that when you use SQLFire to cache, the data in SQLFire will be stale to some extent. If SQLFire were to reach out to the database on every access there would be no performance benefit; probably just the opposite. Your application and use case need to be OK with data that is somewhat stale. By adopting this "eventually consistent" approach, the system as a whole does less work and higher throughput is possible. As more and more applications are exposed to the big-I Internet, people have been forced to realize that there are quite a few places in their applications where eventual consistency is perfectly fine. How stale the data can be is configurable, as we'll see below.

Step 1: Install and configure MySQL, including the MySQL JDBC Connector

I'm not going to give detailed instructions on this one; I'm going to assume you have a database up and running already. You should also get the MySQL JDBC connector. If you're on Ubuntu, for instance, just "apt-get install libmysql-java" and it will appear at /usr/share/java/mysql-connector-java.jar. It's similar on Fedora using yum. If you're on another platform, I wish you luck in your adventure to track that file down; this blog will still be here when you return.

By the way, if you're using a database other than MySQL, very little changes in these instructions: all you need to do is find the appropriate JDBC driver and change the connection strings.

Step 2: Create a table and some data in MySQL.

Again, I'm mostly going to assume that you know how to do this, but I'll create a table called names and insert a few rows. It's important that the table use a primary key.
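
A minimal sketch in MySQL (the names table and its rows here are hypothetical; any table with a primary key will do):

CREATE TABLE names (
   id INT NOT NULL,
   name VARCHAR(100),
   PRIMARY KEY (id)
);

INSERT INTO names VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');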

Step 3: Compile the JDBCRowLoader sample (and put it in a JAR file).

So it turns out there is some assembly required if you want to use SQLFire as a read-only cache. Luckily it's not too painful; everything you need is included in the SQLFire kit. When you install, you'll have a directory called examples with some Java code in it. One of these files is JDBCRowLoader.java, a sample class that implements the RowLoader interface. There are a number of ways to approach the RowLoader interface, but this example is quite general-purpose. To get things ready to use, run these three commands from within the examples directory:

  1. javac -classpath ../lib/sqlfire.jar JDBCRowLoader.java
  2. cd ..
  3. jar -cf examples.jar examples

Now you have a JAR file you can use as a row loader. This generic row loader even handles cases where you want fewer columns in SQLFire than in MySQL, using the query-columns parameter. You can also implement your own row loader if you need to. Why would you do that? Most likely to apply some transformation to the data as you read it in. For example, you may have noticed that SQLFire 1.0 doesn't support the boolean datatype yet. If your MySQL table had boolean data, you could use your RowLoader to transform it into an integer value of 0 or 1 as appropriate. For this demo the generic RowLoader is more than enough.

Step 4: Start SQLFire with an extended classpath.

We need SQLFire to know where the MySQL JDBC driver is, as well as where our JDBCRowLoader code is. To do this, we use -classpath when starting SQLFire. Here's an example:

  • sqlf server start -classpath=/usr/share/java/mysql-connector-java.jar:/home/carter/sqlf/examples.jar

You will need to substitute the paths to your examples.jar and MySQL connector. After a successful start you can connect as you normally would.

Step 5: Re-create the table in SQLFire and call SYS.ATTACH_LOADER to hook the two together.

First we create the same table as in MySQL, except we don't populate any data. Next, the special ATTACH_LOADER call is invoked to hook SQLFire to MySQL.

Here's what this should look like in your session.

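A minimal sketch, assuming the names table from Step 2; the init-param string is an assumption here (the exact parameters the sample JDBCRowLoader accepts are documented in its source), and the connection string is illustrative:

CREATE TABLE names (
   id INT NOT NULL,
   name VARCHAR(100),
   PRIMARY KEY (id)
);

CALL SYS.ATTACH_LOADER('APP', 'NAMES', 'examples.JDBCRowLoader',
   '|url=jdbc:mysql://localhost:3306/test?user=myuser&password=mypass');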

Now when you query SQLFire by primary key, rows will be loaded in on demand. You should also understand that if you don't specify a WHERE clause when querying the table, SQLFire will report only whatever rows have been cached thus far, rather than everything in the underlying table. Chances are this is not what you want, so be careful.

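For example, assuming the names table above:

SELECT * FROM names WHERE id = 2;   -- cache miss: the RowLoader fetches the row from MySQL
SELECT * FROM names;                -- returns only rows cached so far, not everything in MySQL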

Step 6: Control cache size and data freshness.

The table created above is not the sort of thing you really want when you're caching with SQLFire for two reasons:

  1. Data is never updated once it's placed into the cache. Ever. No matter how much the underlying data changes in MySQL, those changes will never make their way into SQLFire. Not likely what you want.
  2. The table will grow indefinitely large. Chances are you'd rather cache a subset of your data since you're using SQLFire as an in-memory layer.

Overcoming both of these problems is very easy to do in SQLFire, using its extended CREATE TABLE semantics. Here's an example:
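
A sketch of the DDL, reusing the names table from above; the eviction and expiration clauses are SQLFire's CREATE TABLE extensions as explained below (treat the exact spelling as an assumption and check the docs):

CREATE TABLE names (
   id INT NOT NULL,
   name VARCHAR(100),
   PRIMARY KEY (id)
)
EVICTION BY LRUCOUNT 50000 EVICTACTION DESTROY
EXPIRE ENTRY WITH TIMETOLIVE 300 ACTION DESTROY;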

Let's take a quick look at what this does. The first difference between this statement and the one above is the EVICTION BY LRUCOUNT 50000 clause. This means that SQLFire will maintain up to 50,000 entries in the cache before taking action. The action is configurable, but this setting causes SQLFire to simply discard the data, which is perfectly appropriate for a cache. More interesting to most people is the second clause, EXPIRE ENTRY WITH TIMETOLIVE 300. This says that an entry should be kept in the cache for no more than 5 minutes (300 seconds). The net effect is to guarantee that data is never more than 5 minutes stale. After 5 minutes the data is discarded and must be re-fetched from the underlying database. If you have a fairly short TTL, you might not even want to set a maximum cache size.

Variations.

Here are a few simple variations on this demo:

  1. Change your cache policy by changing the LRUCOUNT or TIMETOLIVE parameters in the CREATE TABLE statement.
  2. If you want to act as a cache on top of Oracle or some other database, just change the url parameter to mention the appropriate driver and URL. Note that you will need the respective JDBC JAR for that database on the classpath when you start SQLFire.

Questions? Comments? Join us in the SQLFire Community.

 

SQLFire 1.0 – the Data Fabric Sequel

(Note: this blog is cross-posted here and on Jags Ramnarayan's blog. Jags is our Chief Architect and his blog is well worth subscribing to!)

This week we finally reached GA status for VMware vFabric SQLFire – a memory-optimized distributed SQL database delivering dynamic scalability and high performance for data-intensive modern applications.

In this post, I will highlight some important elements in our design and draw out some of our core values.

The current breed of popular NoSQL stores promotes different approaches to data modelling, storage architecture, and consistency models to solve the scalability and performance problems of relational databases. The overarching message in all of them seems to be that the core of the problem with traditional relational databases is SQL.

But, ironically, the core of the scalability problem has little to do with SQL itself: the challenge is the manner in which the traditional DB manages disk buffers, locks, and latches through a centralized architecture to preserve strict ACID properties. Here is a slide from research at MIT and Brown University on where the time is spent in OLTP databases.

[Figure: OLTP – where does the time go?]

Design Center

With SQLFire we change the design center in a few interesting ways:

1) Optimize for main memory: we assume memory is abundant across a cluster of servers and optimize the design through highly concurrent data structures that are all resident in memory. The design is not concerned with buffering contiguous disk blocks in memory; rather, it manages application rows in in-memory hashmaps in a form that can be directly consumed by clients. Changes are synchronously propagated to redundant copies in the cluster for HA.

2) Rethink ACID transactions: there is no support for strictly serializable transactions; we assume that most applications can get by with simpler "read committed" and "repeatable read" semantics. Instead of worrying about write-ahead transaction logs on disk, all transactional state resides in distributed memory and uses a non-2PC commit algorithm optimized for short-duration, non-overlapping transactions. The central theme is to avoid any single point of contention, such as a distributed lock service. Our documentation has more details on how transactions work in SQLFire.

3) "Partition aware DB design": Almost every single high scale DB solution offers a way to linearly scale by hashing keys to a set of partitions. But, how do you make SQL queries and DML scale when they involve joins or complex conditions? Given that distributed joins inherently don't scale we promote the idea that the designer should think about common data access patterns and choose the partitioning strategy accordingly. To make things relatively simple for the designer, we extended the DDL (Data definition language in SQL) so the designer can specify how related data should be colocated ( for instance 'create table Orders (…) colocate with Customer' tells us that the order records for a customer should always be colocated onto the same partition). The colocation now makes join processing and query optimization a local partition problem (avoids large transfers of intermediate data sets). The design assumes classic OLTP workload patterns where vast majority of individual requests can be pruned to a few nodes and that the concurrent workload from all users is spread across the entire data set (and, hence across all the partitions). Look here for some details.

4) Shared-nothing logs on disk: disk stores are merely "append only" logs, designed so that application writes are never exposed to disk seek latencies. Writes are synchronously streamed to disk on all replicas. A lot of the disk store design looks similar to other NoSQL systems: rolling logs, background/offline compression, memory tables pointing to disk offsets, etc. But the one aspect that represents core IP is managing consistent copies on disk in the face of failures. Given that distributed members can come and go, how do we make sure the disk state a member is working with is the one it should be working with? I cover our "shared nothing disk architecture" in a lot more detail in this blog post on GemFire 6.5.

5) Parallelize data access and application behavior: we extend the classic stored procedure model by allowing applications to parallelize a procedure across the cluster, or just a subset of nodes, by hinting at the data the procedure depends on. This application hinting is done by supplying a "where clause" that is used to determine where to route and parallelize the execution. Unlike traditional databases, procedures can be arbitrary application Java code (you can in fact embed the cluster members in your Spring container) and run colocated with the data. Yes, literally in the same process space where the data is stored. Controversial, yes, but now your application code can do a scan as efficiently as the database engine (see the sketch after this list).

6) Dynamic rebalancing of data and behavior: this is the act of figuring out which data buckets should be migrated when capacity is added (the cluster grows) or removed, and how to do this without causing consistency issues or introducing contention points for concurrent readers and writers. Here is the patent that describes some aspects of the design.
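
To make points 3 and 5 concrete, here is a minimal sketch. The customers/orders schema and the procedure name are hypothetical; the PARTITION BY / COLOCATE WITH and data-aware CALL syntax follow the SQLFire documentation, but treat this as a sketch rather than a definitive reference:

-- Partition customers by their primary key, and colocate each
-- customer's orders on the same partition.
CREATE TABLE customers (
   cust_id INT NOT NULL PRIMARY KEY,
   name VARCHAR(100)
) PARTITION BY COLUMN (cust_id);

CREATE TABLE orders (
   order_id INT NOT NULL PRIMARY KEY,
   cust_id INT NOT NULL,
   amount DECIMAL(10,2)
) PARTITION BY COLUMN (cust_id)
  COLOCATE WITH (customers);

-- Data-aware procedure call (hypothetical procedure): the WHERE
-- clause hints which partitions hold the data, so execution is
-- routed to, and parallelized on, just those members.
CALL order_analysis() ON TABLE orders WHERE cust_id = 42;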

Embedded or a client-server topology

SQLFire supports switching from the classic client-server topology (your DB runs in its own processes) to an embedded mode where the DB cluster and the application cluster are one and the same (for Java apps).

We believe the embedded model will be very useful in scenarios where the data sets are relatively small. It simplifies deployment concerns and at the same time provides a significant boost in performance when replicated tables are in use.

All you do is change the DB URL from 'jdbc:sqlfire://server_Host:port' to 'jdbc:sqlfire:;mcast-port=portNum', and all of your application processes that use the same DB URL become part of a single distributed system. Essentially, the mcast-port identifies a broadcast channel for membership gossiping. New servers automatically join the cluster once authenticated. Any replicated tables will automatically be hosted in the new process, and partitioned tables may get rebalanced to share some of their data with the new process. All of this is abstracted away from the developer.

As far as the application is concerned, you just create connections and execute SQL like with any other DB.

[Figure: Topology 1]

[Figure: Topology 2]

How well does it perform and scale?

Here are the results of a simple benchmark done internally using commodity (2-CPU) machines, showcasing linear scaling with concurrent user load. I will soon augment this with a more interesting workload characterization. We have more details on the SQLFire community site.

[Figure: SQLFire linearly scaling queries]

[Figure: SQLFire linear scale, low latency]

Comparing SQLFire and GemFire

Here is a high-level view of how the two products compare. I hope to add a blog post that provides specific details on the differences, and the use cases where one might apply better than the other.

[Figure: GemFire versus SQLFire]

SQLFire benefits from years of commercially deployed production code found in GemFire. SQLFire adds a rich SQL engine, with the idea that folks can now manage operational data primarily in memory, partitioned across any number of nodes, and with a disk architecture that avoids disk seeks. Note that the two offerings, SQLFire and GemFire, are distinct products and are deployed separately.

As always, I would love to get your candid feedback (link to our forum). I assure you that trying it out is very simple – just like using Apache Derby or H2.

Get the download, docs, and quickstart all from here. The developer license is perpetual and works on up to 3 server nodes.