Home > Blogs > VMware vFabric Blog > Author Archives: Michael Lazar

Author Archives: Michael Lazar

Java Stored Procedure Performance — Myths vs. Facts

“Java Stored Procedures will never be as fast as SQL Stored Procedures” is a common refrain from database administrators based on their experience with stored procedures in “normal” relational databases. Most databases are written in a low level language such as C. SQL is managed as a byte-code compiled language with the equivalent of a virtual machine optimized for SQL to retrieve data and execute the other instructions in the SQL language and the extensions for stored procedures. To have Java Stored Procedures, the database has to spin up a Java Virtual Machine (JVM) or use an existing one. Then, the database must communicate with it across the JNI boundary or use some interprocess communication mechanism. Similarly, the code running in the JVM must send data back to the main database processes. Likely there is a lot of bi-directional processing as the database engine has to hand off data such as pointers to rows on disk, or requests to look up data. The alternative is to duplicate the indexes and other meta data used in the database in the Java memory space. All of this overhead guarantees that Java Stored Procedures will generally be much slower than SQL Stored Procedures for main stream databases.

    SQLFire is different. It is written in Java. All of the memory is in the Java VM space. SQL statements are byte compiled into Java structures. Java Stored Procedures run in the same memory and processing space that database operations are performed. There is NO overhead for using them. It is likely that writing a non-Java stored procedure language to run in SQLFire would be slower than using Java as the stored procedure language, because these other instructions would have to be mapped into Java operation for the JVM.

    SQLFire provides other significant advantages for performance of stored procedures. First like other data bases written to be primarily in-memory databases, primary key based data access or index lookups, result in direct pointers to rows in memory. No pulling data off pages of memory or disk. Row data is stored in a byte format for rows. It consists of fixed length fields followed by variable length fields, etc to keep footprint to a minimum. When operating on a ResultSet locally (or otherwise) you never get direct access to this byte. There is no deserialization. The overhead comes down to how you access the individual fields in the row. Note: JDBC provides a lot of flexibility with implicit type conversions.

    Another key advantage for store procedures in SQLFire is that unlike other memory oriented databases, SQLFire is a distributed database so that stored procedures and query operations are parallelized across the grid. This means that if you have 10 SQLFire instances, stored procedures are run in parallel across the subsets of the data on each member resulting in ten times as many CPU’s being involved in a single stored procedure invocation. There is some overhead on collating results (depending on what you are doing) so that the system may only be 8 or 9 times faster than doing the operation on a single SQLFire instance. Still this is a noticeable performance improvement.

    One of the concerns for traditional DBAs is that they are not Java programmers. SQLFire preserves the standard SQL language model for DBA’s to perform all of their normal tasks, while enabling developers to move their logic to the database where their code can get enhanced performance through parallelization and direct access to the data in memory. In this way the SQLFire stored procedure model is biased towards Java application developers who are used to running their code in Application server JVMs (i.e. my code runs alongside the container in the same process). SQLFire provides them the mechanism to avoid moving data across the wire and allows their code to run in-situ with the data. Doing this you get impressive access latency especially when iterating over sets. Using this sort of model, we have seen application performance gains of one or two orders of magnitude. i.e 10x to 100x performance speed up.

vFabric now has a common language (Java) that can be used in both the app server and the database. This means that app developers can write stored procedures in a language they already understand, with a tool chain they already have.