Oracle On VSAN 6

When VMware introduced VSAN 6, we were pretty clear: it’s a great fit for most anything that runs in a virtual machine, including critical databases.

Whereas much of our previously published performance testing focused on a large number of VMs pounding the storage subsystem (as that’s the norm for how most people use clusters), databases usually have a different performance profile: typically you have a small number of larger VMs that are doing all the heavy lifting.

Not long ago, one of our engineering teams completed a performance profile using both Oracle 11g and Oracle RAC 11 against a modest, 4-node all-flash VSAN cluster.  All-flash makes great sense when you want predictably fast performance, regardless of the IO profile.

 

TL;DR: wicked fast and predictable performance, Oracle RAC scaled almost linearly as more instances were added, and all of the tested Oracle RAC availability features worked exactly as expected.

 

Note: if you’re planning to use Oracle RAC with VSAN 6, there’s a KB you’ll need to read about configuring VSAN for multi-writer.

And a big thanks to our friends at Intel for all the help with providing an environment for these tests!

 

The Test Bed

Four Intel S2600WT2 systems were used, each with 2 x E5-2699 v3 processors @ 2.3 GHz (18 cores per socket) and 128 GB of RAM.

A 400 GB Intel DCS3700 PCIe flash card was used for caching in each server, complemented by three 480 GB Intel DCS3500 SATA SSDs used for capacity.

Oracle 11.2.0 RAC was used, running on Oracle Linux Server guests under vSphere 6.
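To put the hardware list in perspective, here is a minimal sketch that adds up the cluster-wide resources from the per-node figures above. The `Node` class and `cluster_totals` function are illustrative names made up for this example, not part of any VMware tool, and the capacity number is raw flash before any VSAN overhead or mirroring.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Per-node hardware, taken from the test bed description."""
    sockets: int = 2
    cores_per_socket: int = 18
    ram_gb: int = 128
    cache_gb: int = 400            # one PCIe flash card for caching
    capacity_devices: int = 3      # SATA SSDs for capacity
    capacity_device_gb: int = 480


def cluster_totals(node: Node, node_count: int) -> dict:
    """Sum raw resources across identical nodes (no VSAN overhead applied)."""
    return {
        "cores": node_count * node.sockets * node.cores_per_socket,
        "ram_gb": node_count * node.ram_gb,
        "cache_gb": node_count * node.cache_gb,
        "capacity_devices": node_count * node.capacity_devices,
        "raw_capacity_gb": (
            node_count * node.capacity_devices * node.capacity_device_gb
        ),
    }


totals = cluster_totals(Node(), node_count=4)
for name, value in totals.items():
    print(f"{name}: {value}")
```

That works out to 144 physical cores, 512 GB of RAM and 12 capacity SSDs across the four nodes.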

 

Oracle 11g Single Instance

In our first test, we created a single large VM (18 vCPUs, 40 GB RAM) running Oracle, and pointed it at a single 100GB VMDK.

We did no optimizations other than setting policy to FTT=1 and stripe=6 for that VMDK. The stripe spreads a VMDK’s data across at least 6 capacity devices.
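As a rough illustration of what that policy means on disk, the sketch below estimates the layout of one VMDK. It assumes FTT is implemented by mirroring (one full replica per tolerated failure, plus the original), and it deliberately ignores witness components and any extra splitting VSAN may apply to large objects. The function name `policy_footprint` is hypothetical, not a VSAN API.

```python
def policy_footprint(vmdk_gb: int, ftt: int, stripe_width: int) -> dict:
    """Estimate replicas, data components and raw capacity for one VMDK.

    Assumption: mirroring, so replicas = ftt + 1. Witness components
    and large-object splitting are not modeled.
    """
    replicas = ftt + 1
    return {
        "replicas": replicas,
        # Each replica is striped across `stripe_width` components.
        "data_components": replicas * stripe_width,
        # Every replica stores a full copy of the data.
        "raw_gb": vmdk_gb * replicas,
    }


single_instance = policy_footprint(vmdk_gb=100, ftt=1, stripe_width=6)
print(single_instance)
```

Under these assumptions the 100GB VMDK becomes two replicas of six stripes each, so 12 data components consuming about 200GB of raw capacity.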

The test was a single instance of SwingBench running the Order Entry profile at 100 users — a classic OLTP mix.

The single-instance result was an average of 1296 transactions per second, or ~66K transactions per minute (TPM).  Note the very tight range of results over time: predictable performance.

It’s important to note that the cluster’s resources weren’t even beginning to sweat: plenty of CPU, memory and IO performance left for other tasks.  It’s also worth noting that this performance was achieved “out of the box”: no tuning or other optimizations.

 

Oracle RAC Performance

Here, we want to understand how Oracle RAC performance scales as more instances are added.  We went for the big numbers first: four instances, one on each of the four nodes.

Using the same test bed as before, we created four 150GB VMDKs, for a total of 600GB to be used as Oracle ASM disk groups. Again, the same FTT=1, stripe=6 policy was used for these VMDKs.  400 users were configured with the same SwingBench Order Entry profile as before.

 

Performance was stellar, as you might imagine.  We’re now at 414,310 transactions per minute, or around 7K transactions per second on our modest four-node cluster, with each transaction experiencing an average of 12 milliseconds of response time. Remember, that’s measured from the application perspective, not storage.

Once again, significant resources are left in the cluster for other work.  And, as before, these results were achieved “out of the box” without optimization or tuning.

The last thing we wanted to investigate was scaling — how does performance increase as we go from one to four Oracle RAC instances?

No surprise, we saw near-linear performance scaling as more instances were added. Four instances delivered 414,310 TPM, or 93% of four times the single-instance result (111,378 TPM).

Not bad at all.
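For anyone who wants to check the math, here is a small sketch that reproduces the scaling figure from the two TPM numbers quoted above. The function name `scaling_efficiency` is just an illustrative choice.

```python
def scaling_efficiency(
    measured_tpm: float, single_instance_tpm: float, instances: int
) -> float:
    """Measured throughput as a fraction of perfect linear scaling."""
    ideal_tpm = single_instance_tpm * instances
    return measured_tpm / ideal_tpm


FOUR_INSTANCE_TPM = 414_310
SINGLE_INSTANCE_TPM = 111_378

efficiency = scaling_efficiency(FOUR_INSTANCE_TPM, SINGLE_INSTANCE_TPM, 4)
tps = FOUR_INSTANCE_TPM / 60  # transactions per minute -> per second

print(f"ideal (4 x single): {4 * SINGLE_INSTANCE_TPM:,} TPM")
print(f"scaling efficiency: {efficiency:.1%}")
print(f"throughput:         {tps:,.0f} TPS")
```

Perfect scaling would be 445,512 TPM, so the measured 414,310 TPM lands at roughly 93% of ideal, and a bit over 6,900 transactions per second.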

 

Using vMotion with Oracle RAC

One of the more popular features of vSphere is vMotion, which enables administrators to dynamically move running VMs within the cluster without application impact.

This could be for load balancing, or perhaps taking a server down for planned maintenance. It’s a neat feature.

For this test, we used the same 4-instance, 4-node setup as previously, but pushed a slightly more moderate workload to get a feel for what a customer might experience.

The vMotion itself took 19 seconds.  Eyeballing the chart, it looked like performance only took a minor ~10% hit during that period. Not bad at all.

Note: certain hyperconverged storage products emphasize the fact that they use data locality in an attempt to put VMs and their storage on the same server.

While the performance benefits of doing so are questionable, one clear impact is that substandard performance can result during vMotion and DRS usage, as the data now has to be copied over to follow the VM to its new location, and copied back when the VM is moved back.

 

Oracle RAC and vSphere HA

Yet another useful feature in vSphere is High Availability, or HA. If a server unexpectedly fails, vSphere automatically restarts the VM elsewhere in the cluster.

In this test, we ran 4 Oracle RAC instances as before.  We then powered down a server to simulate a drastic failure.

In our testing, it took HA 132 seconds to detect the failure and restart the (large) VM.  Once running, it took Oracle an additional 119 seconds to recover the database and restore service.

End to end, a little over four minutes for the entire sequence.  Of course, all data was available to all database instances during this time.
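The end-to-end figure is simply the sum of the two phases measured above. A tiny sketch of that arithmetic, with constant and function names chosen only for illustration:

```python
HA_RESTART_S = 132       # HA detects the failure and restarts the VM
ORACLE_RECOVERY_S = 119  # Oracle recovers the database and restores service


def format_duration(seconds: int) -> str:
    """Render a duration in seconds as minutes and seconds."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes} min {secs} s"


total_s = HA_RESTART_S + ORACLE_RECOVERY_S
print(f"total outage for the failed instance: {format_duration(total_s)}")
```

That gives 251 seconds, or 4 minutes 11 seconds, with the HA restart accounting for slightly more than half of it.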

 

Final Thoughts

Even a modest, four-node VSAN cluster can deliver truly impressive Oracle performance at modest cost and with extreme simplicity.  It wasn’t that long ago when we were all using really big iron, and proud of the fact we could achieve hundreds of transactions per second.

Now, a humble four-node server cluster running VSAN can easily achieve ~7,000 Oracle SwingBench transactions per second — or more — without any optimization.

Better yet, even under these very demanding workloads, there are plenty of resources available to do other work: more database instances, or whatever you need to do.