TL;DR: Troubleshooting Oracle performance issues is non-trivial. Pay attention to Oracle’s memory management settings when experiencing local slowness, particularly if running against Oracle XE (express edition), where memory thresholds are quite conservative by default.
In circuit racing, there’s an old mantra,
Slow in, Fast out.
The principle is pretty simple: the driver comes in fast towards a corner, completes the braking before the apex, and gets on the gas nice and early to power out of the corner, maintaining maximum velocity throughout. So how does all this relate to Oracle? Well, as with all things, this process requires a little tuning from time to time, vehicle to vehicle. Your database may be performing well one day; then you make a change, and something that functioned quite well no longer does so.
The other day, we encountered this very situation on a project, and I’d like to take you through what happened, our findings, and how we resolved it.
The application I’m currently working on is a mathematically intensive product that provides financial insight into potential future sales. As the application was quite data-heavy, we decided fairly early on in development that it would be appropriate to leverage the database as a compute engine, and perform many of the summation and aggregation calculations on the data at source, thereby saving us from loading a bunch of stuff into memory, iterating over it, and performing the relevant calculations ourselves. This had been serving our needs very well.
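As a trivial illustration of what “compute at source” means here (the table and column names below are invented for the example, not our real schema), an aggregation pushed down to the database looks something like this:
-- Hypothetical tables; the point is that the summation happens inside
-- Oracle rather than in application memory.
SELECT r.name, SUM(s.amount) AS total_sales
FROM sales s
JOIN regions r ON r.id = s.region_id
GROUP BY r.name;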
We had a database view which was reasonably straightforward, performing a number of inner joins on indexed ID columns, across 7 tables. Involved, but not rocket science. The view’s performance had always been in the order of 0.1 to 0.2 seconds.
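Structurally it resembled the sketch below, although the names are invented and the real view joins seven tables rather than four:
-- Illustrative shape only: inner joins on indexed ID columns.
CREATE OR REPLACE VIEW sales_summary AS
SELECT o.id, c.name, p.description, l.amount
FROM orders o
JOIN customers c ON c.id = o.customer_id
JOIN order_lines l ON l.order_id = o.id
JOIN products p ON p.id = l.product_id;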
Slow
Running our RSpec suite one afternoon, we encountered an Oracle out-of-memory error:
COULD NOT ALLOCATE 32 BYTES OF SHARED MEMORY
We re-ran the test suite and noticed it was considerably slower. So slow, in fact, that a number of tests were simply timing out. Assuming a simple fix, we restarted the local Oracle instance, and then the virtual machine on which it was running. Same result: painfully slow tests, with random timeouts. More investigation required.
Having verified that no changes to the database view DDL (schema) had been made, our attention turned to the data itself. We took a look at the explain plan generated for the SQL statement that creates our view. Explain plans aren’t the most user-friendly things to decode without prior experience, but it was fairly evident that something was awry. Our plan looked something like this:
NESTED LOOPS
NESTED LOOPS
NESTED LOOPS
NESTED LOOPS
NESTED LOOPS
NESTED LOOPS
NESTED LOOPS
It took a whopping 11.5 seconds to return 240 rows.
As a means of comparison, we executed the same query on our integration environment and saw the expected performance (0.2sec) for query execution. The explain plan looked like this:
NESTED LOOPS
HASH JOIN
HASH JOIN
RESULT
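If you want to inspect a plan yourself, the standard route is EXPLAIN PLAN followed by DBMS_XPLAN (the view name here is a placeholder for whichever statement you’re investigating):
-- Generate and then display the plan for a statement.
EXPLAIN PLAN FOR SELECT * FROM my_view;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);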
Clearly something was wrong. We started considering the differences between the two environments. Both environments were running Oracle 11g, but locally we were using the express edition (XE). Suspecting performance may be bound on available memory, we looked into how to configure Oracle’s memory allocation.
PGA/SGA/WTF
A couple of searches later, we determined that Oracle allocates a block of memory on startup referred to as SGA (System Global Area). The size of this can be obtained from a running instance by running:
SHOW PARAMETER SGA_TARGET;
Oracle also allows configuration of memory on a per-process basis. This is achieved by setting the PGA (Program Global Area) target. Again, this value can be obtained by running:
SHOW PARAMETER PGA_TARGET;
at the Oracle console. On Oracle XE, this defaults to a reasonably conservative 200MB. We discovered that Oracle XE only supports allocation of up to 1GB of RAM, so we took a look at how we could increase our measly 200MB allocation.
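If your account can read V$PARAMETER, the relevant settings can also be pulled back in a single query:
-- Current memory settings, including the overall memory_target.
SELECT name, value
FROM v$parameter
WHERE name IN ('sga_target', 'pga_aggregate_target', 'memory_target');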
PFILE/SPFILE
Oracle’s configuration and initialization files are stored in a binary format, which wasn’t particularly helpful when attempting to troubleshoot our issue. After some digging, we discovered that the recommended way to interact with Oracle’s binary configuration is to generate a text version of the config file, test your changes, and then re-generate the binary version from your new text file.
You can locate the SPFILE currently in use by running:
SHOW PARAMETER SPFILE;
# NAME TYPE VALUE
# ----------- -------- --------
# spfile string /u01/oracle/admin/conf/spfileMYDB.ora
The first step was to generate a text version of the binary config file:
CREATE PFILE='/home/oracle/mypfile.ora' FROM SPFILE;
Opening up the newly generated pfile, we saw something like this:
XE.__db_cache_size=201326592
XE.__java_pool_size=4194304
XE.__large_pool_size=4194304
XE.__oracle_base='/u01/app/oracle'#ORACLE_BASE set from environment
XE.__pga_aggregate_target=101018323
XE.__sga_target=202482304
XE.__shared_io_pool_size=0
XE.__shared_pool_size=310378496
XE.__streams_pool_size=0
*.audit_file_dest='/u01/app/oracle/admin/XE/adump'
*.compatible='11.2.0.0.0'
*.control_files='/u01/app/oracle/oradata/XE/control.dbf'
*.db_name='XE'
*.DB_RECOVERY_FILE_DEST='/u01/app/oracle/fast_recovery_area'
*.DB_RECOVERY_FILE_DEST_SIZE=10G
*.diagnostic_dest='/u01/app/oracle'
*.dispatchers='(PROTOCOL=TCP) (SERVICE=XEXDB)'
*.job_queue_processes=4
*.memory_target=206520896
*.open_cursors=300
*.remote_login_passwordfile='EXCLUSIVE'
*.sessions=201326592
*.shared_servers=4
*.undo_management='AUTO'
*.undo_tablespace='UNDOTBS1'
Of particular interest were __pga_aggregate_target, __sga_target and memory_target. As described above, the PGA and SGA target parameters specify how much memory Oracle allocates for its processes.
We altered these parameters so they read:
...
XE.__pga_aggregate_target=281018323
XE.__sga_target=602482304
...
*.memory_target=806520896
...
To test our changes, we restarted our local Oracle instance using the pfile we’d just created.
STARTUP PFILE=/home/oracle/mypfile.ora
The database reported the new values had been loaded correctly.
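For completeness, once you’re happy with the edited values, the recommended workflow described above finishes by writing the text file back into a binary spfile, so the settings survive a normal startup:
-- Regenerate the binary spfile from the edited pfile, then restart
-- without the PFILE argument so the new spfile is picked up.
CREATE SPFILE FROM PFILE='/home/oracle/mypfile.ora';
SHUTDOWN IMMEDIATE
STARTUP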
We restarted our instance and … it was still painfully slow. Frustration growing, we did some more digging. We suspected that either our changes had made no difference, or there was some other factor in the query plan. At this point we made some useful discoveries as to how the query optimizer works.
Paraphrasing Oracle’s documentation, the Query Optimizer in the DB engine generates statistics about the tables within your schema, covering the data distribution and storage characteristics of tables, columns, indexes, and partitions. It then uses these stats to calculate the most efficient way to execute a given query. This is actually quite cool for large-scale deployments, insofar as these statistics can be exported and used in another instance of the database, such as a test environment where you can replicate a production environment’s performance characteristics. Further details here.
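The export mechanism lives in DBMS_STATS; a minimal sketch (the schema and staging-table names are placeholders, not from our project) looks like this:
-- Create a staging table, export the schema's stats into it, then
-- import them on the target instance after moving the table across.
EXEC DBMS_STATS.create_stat_table('MYSCHEMA', 'STATS_STAGE');
EXEC DBMS_STATS.export_schema_stats('MYSCHEMA', 'STATS_STAGE');
EXEC DBMS_STATS.import_schema_stats('MYSCHEMA', 'STATS_STAGE');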
With the suspicion that these stats may have been optimized to cope with the relatively low previous memory footprint, we looked into how to clear or regenerate these stats.
Regenerate for Joy
We regenerated table statistics for our tables:
EXEC DBMS_STATS.gather_table_stats('MYSCHEMA', 'BIG_TABLE', estimate_percent => DBMS_STATS.auto_sample_size);
Happy days. Performance was restored, brows were swept, and the world was good.
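If you have more than a handful of tables, the same package can regenerate statistics for a whole schema in one call (the schema name is a placeholder):
-- Gather statistics for every table in the schema in one go.
EXEC DBMS_STATS.gather_schema_stats('MYSCHEMA', estimate_percent => DBMS_STATS.auto_sample_size);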