Pulling a rabbit out of a hat when you thought the magician had whisked it away forever to a faraway land – we have all seen that trick.
What if we told you that data in memory persists across server reboots? That's our rabbit magically reappearing every time the magician waves his wand and says 'abracadabra'.
Enter Persistent Memory technology. Persistent Memory (PMEM) sits between DRAM and disk storage in the data storage hierarchy; it enables byte-addressable updates and does not lose data when power is lost.
PMEM is a byte-addressable form of computer memory that has the following characteristics:
- DRAM-like latency and bandwidth
- Regular load/store CPU instructions
- Paged/mapped by operating system just like DRAM
- Data is persistent across reboots
Business-critical Oracle databases can benefit from Persistent Memory technology, which delivers DRAM-like latency and bandwidth along with the resilience and SLA guarantees that critical workloads require.
This blog demonstrates how the performance of Oracle Redo Logs can be accelerated by placing them on an Oracle 21c Persistent Memory Filestore backed by Intel Optane DC PMM in App Direct mode on the VMware vSphere platform.
Yes, you can have your cake and eat it too!
VMware vSphere Persistent Memory
VMs can be configured to use PMem on a standalone host or in a cluster. In ESXi you can create VMs configured with PMem, and applications inside these VMs can take advantage of this increased speed. Once a VM is first powered on, PMem is reserved for it regardless of whether the VM is powered on or off, and it stays reserved until the VM is migrated or removed.
Persistent memory can be consumed by virtual machines in two different modes; legacy guest OSes can still take advantage of the Virtual Persistent Memory Disk feature.
- Virtual Persistent Memory (vPMem) – Using vPMem, the memory is exposed to a guest OS as a virtual NVDIMM. This enables the guest OS to use PMem in byte addressable random mode.
- Virtual Persistent Memory Disk (vPMemDisk) – Using vPMemDisk, the memory can be accessed by the guest OS as a virtual SCSI device, but the virtual disk is stored in a PMem datastore.
More information on VMware vSphere Persistent Memory can be found here.
Intel Optane DC Persistent Memory
Intel® Optane DC Persistent Memory (DCPMM) is an emerging technology in which non-volatile media is placed on a Dual In-Line Memory Module (DIMM) and installed on the memory bus, which has traditionally been used only for volatile memory. Intel's DCPMM is an implementation of 3D XPoint that coexists with DRAM on the memory bus. It can be used as a DRAM alternative, but also as a high-performance, low-latency storage device.
Intel Optane DC Persistent Memory has three operating modes: Memory Mode, App Direct Mode, and Mixed Memory Mode.
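For illustration, on a bare-metal Linux host an App Direct region is typically provisioned with Intel's ipmctl utility. This is a hedged sketch only – in the vSphere setup used here, PMEM provisioning is handled by the ESXi host, not inside the guest:

# Provision all Optane DCPMM capacity as an App Direct region
# (the new goal takes effect after a reboot)
ipmctl create -goal PersistentMemoryType=AppDirect
reboot
# After reboot, verify the App Direct region
ipmctl show -region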
More information on Intel Optane DC Persistent Memory can be found here.
8-Byte Atomicity
Persistent Memory (such as Intel Optane DC Persistent Memory) natively operates byte by byte rather than in blocks of data like conventional storage. By default, data is persisted in chunks of at most 8 bytes.
For applications (like databases) that are built on a BLOCK construct, the 8-byte atomicity of Persistent Memory can be an issue. Writing 8,192 bytes of data (an 8K block) gets persisted in 1,024 chunks of 8 bytes each. A power failure or other abnormal condition can leave blocks of data "fractured" or "torn" into pieces, with portions of the block containing old data and other portions containing new data.
Applications (such as databases) need to be changed to tolerate this sort of fracturing or tearing of blocks. Otherwise, these are effectively corrupted blocks of data.
More information on the 8-byte atomicity can be found here.
Oracle Persistent Memory Filestore
The Persistent Memory Database feature enables you to place database files in non-volatile memory. This feature supports a single-instance Oracle Database on a PMEM Filestore.
PMEM Filestore is a pointer-switching PMEM file system that supports atomic updates of Oracle Database data blocks. PMEM Filestore implements the Filesystem in Userspace (FUSE) protocol, enabling Oracle DBAs to perform normal file-level maintenance. FUSE allows non-privileged (non-root) users, such as the Oracle Database software owner, to create and manage filesystems as well as the directories and files contained within them.
Some of the benefits of using Persistent Memory Database include:
- The PMEM Filestore provides atomic writes to full Oracle database blocks. This eliminates the need for media recovery due to partially written blocks after a power outage.
- Persistent Memory Database performs I/O to PMEM storage via memory copy. This is much faster than performing I/O via traditional operating system calls.
- Database queries avoid the traditional read from storage and the memory copy into the buffer cache, because the Oracle Database server accesses data directly from persistent memory.
More information on Oracle 21c Persistent Memory Filestore can be found here.
Test Use Case
As stated above, this use case accelerates Oracle Redo Logs by placing them on an Oracle 21c Persistent Memory Filestore backed by Intel Optane DC PMM in App Direct mode on the VMware vSphere platform, and compares the results against Redo Logs on an All-Flash array.
Test Bed
Details of the VMware ESXi server are shown below. An ESXi server running version 7.0.3 (build 18644231) was deployed with 4 sockets, 24 cores per socket, 1.5 TB DRAM, and 3 TB Persistent Memory.
VM 'Oracle21C-OL8-Customer' was created with 24 vCPUs and 256 GB memory, with storage on an All-Flash array and the Oracle SGA and PGA set to 96G and 20G respectively.
A single-instance database 'ORA21C' with the multitenant option was provisioned with Oracle Grid Infrastructure (ASM) and Database version 21.5 (Database Release Update 21.5.0.0.220118) on OEL 8.5 UEK.
Oracle ASM was the storage platform with Oracle ASMLIB for device persistence.
The vmdks for the VM 'Oracle21C-OL8-Customer' are shown below:
- Hard Disk 1 (SCSI 0:0) – 80G for OS (/)
- Hard Disk 2 (SCSI 0:1) – 80G for Oracle Grid Infrastructure and RDBMS binaries
- Hard Disk 3 (SCSI 1:0) – 100G for Grid ASM (GRID_DG ASM Diskgroup)
- Hard Disk 4 (SCSI 1:1) – 200G for Oracle Database (DATA_DG ASM Diskgroup)
- Hard Disk 5 (SCSI 2:0) – 1TB for SLOB tablespace (SLOB_DG ASM Diskgroup)
- Hard Disk 6 (SCSI 3:0) – 100G for Redo Logs (REDO_DG ASM Diskgroup)
Details of the vmdks are shown below.
The ASM diskgroups are shown below.
[root@oracle21c-ol8 ~]# oracleasm scandisks ; oracleasm listdisks
Reloading disk partitions ; done
Cleaning any stale ASM disks…
Scanning system for ASM disks…
DATA_DISK01
GRID_DISK01
REDO_DISK01
SLOB_DISK01
[root@oracle21c-ol8 ~]#
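For reference, ASM disks like those listed above are typically stamped with oracleasm createdisk before they appear in the listing. A minimal sketch – the /dev/sd* partition names below are assumptions and will differ in your environment:

# Stamp each partitioned vmdk for use by Oracle ASM
# (device names are illustrative only)
oracleasm createdisk GRID_DISK01 /dev/sdc1
oracleasm createdisk DATA_DISK01 /dev/sdd1
oracleasm createdisk SLOB_DISK01 /dev/sde1
oracleasm createdisk REDO_DISK01 /dev/sdf1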
In addition, one NVDIMM of 100G capacity was added to the VM for use as PMEM in fsdax mode in the guest OS (GOS), as shown below. NVDIMM 2, of 120G capacity, was not used in this use case.
The ndctl command listing shows the PMEM device in fsdax mode:
[root@oracle21c-ol8 ~]# ndctl list
[
  {
    "dev":"namespace0.0",
    "mode":"fsdax",
    "map":"dev",
    "size":105694363648,
    "uuid":"8ac8e82a-e1ff-48c9-8d35-51afc8642c39",
    "sector_size":512,
    "align":2097152,
    "blockdev":"pmem0"
  }
]
[root@oracle21c-ol8 ~]#
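Had the namespace not already been in fsdax mode, it could have been created or reconfigured with ndctl. A minimal sketch – the region name region0 is an assumption; use 'ndctl list -R' to identify the region in your setup:

# Create a new namespace in fsdax mode on region0
ndctl create-namespace --mode fsdax --region region0
# Or reconfigure the existing namespace shown above
ndctl create-namespace --mode fsdax --force --reconfig namespace0.0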
A partition was created on the PMEM device (pmem0), as shown below, on which an EXT4 DAX-mounted filesystem was then created.
[root@oracle21c-ol8 ~]# fdisk -lu /dev/pmem0
Disk /dev/pmem0: 98.4 GiB, 105694363648 bytes, 206434304 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0xead4960e
Device Boot Start End Sectors Size Id Type
/dev/pmem0p1 4096 206434303 206430208 98.4G 83 Linux
[root@oracle21c-ol8 ~]#
[root@oracle21c-ol8 ~]# mount | grep -i pmem0
/dev/pmem0p1 on /redo_pmem_dax type ext4 (rw,noatime,dax)
[root@oracle21c-ol8 ~]#
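For completeness, here is a hedged sketch of the commands that would produce the partition and DAX mount shown above. The 4096-sector start offset and the mount options are taken from the listings; your alignment requirements may differ:

# Partition the PMEM block device with a 2 MiB-aligned first sector
parted -s /dev/pmem0 mklabel msdos mkpart primary ext4 4096s 100%
# Create an EXT4 filesystem and mount it with the dax option
mkfs.ext4 /dev/pmem0p1
mkdir -p /redo_pmem_dax
mount -o dax,noatime /dev/pmem0p1 /redo_pmem_dax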
For the Oracle Persistent Memory Filestore, the prerequisite is to create a mount point for the filestore; the final subdirectory name must be the same as the PMEM filestore name.
In this example, we created a directory 'orapmem' within a directory 'oracle_pmem', both owned by the Oracle user.
[root@oracle21c-ol8 /]# ls -l oracle_pmem
drwxrwxrwx 2 oracle dba 6 Jan 10 10:26 oracle_pmem
oracle@oracle21c-ol8:ORA21C:/home/oracle> ls -l /oracle_pmem
drwxrwxrwx 2 oracle dba 6 Jan 10 10:53 orapmem
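With the mount point in place, the PMEM Filestore itself is created from SQL*Plus. The sketch below is illustrative, not the exact commands used in this exercise – the backing file location on the DAX-mounted filesystem, the filestore size, and the redo log group numbers and sizes are all assumptions; note that the final subdirectory of the mount point must match the filestore name:

sqlplus / as sysdba <<'EOF'
-- Create the PMEM Filestore (size and backing file path are assumptions)
CREATE PMEM FILESTORE orapmem
  MOUNTPOINT '/oracle_pmem/orapmem'
  BACKINGFILE '/redo_pmem_dax/orapmem.bf'
  SIZE 80G;
-- Redo log groups can then be placed on the filestore, for example:
ALTER DATABASE ADD LOGFILE GROUP 5 ('/oracle_pmem/orapmem/redo05.log') SIZE 10G;
ALTER DATABASE ADD LOGFILE GROUP 6 ('/oracle_pmem/orapmem/redo06.log') SIZE 10G;
EOF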
More details on Oracle Persistent Memory Filestore can be found here.
Test Case
SLOB 2.5.4.0 was chosen as the load generator for this exercise, with the following SLOB parameters set:
UPDATE_PCT=100
SCALE=90G
WORK_UNIT=3
REDO_STRESS=HEAVY
We deliberately chose the minimum work unit size to drive the greatest amount of IO, with heavy stress on redo, in order to study the performance improvement from using PMEM technology.
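With those parameters set in slob.conf, a run is started with SLOB's runit.sh. A minimal sketch – the session count of 24 (matching the VM's vCPU count) is an assumption, as the original run's session count is not stated:

# Kick off a SLOB run against the pre-loaded SLOB schemas
# (session count is illustrative)
cd ~/SLOB
./runit.sh 24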
Test Results
Remember, any performance data is the result of a combination of:
- hardware configuration
- software configuration
- test methodology & test tool
- workload profile used in the testing
So, the performance metrics and improvements I got with my workload in my lab are in no way representative of any real production workload; results for real-world workloads will differ. Also, this was by no means a benchmarking exercise.
We ran the SLOB load generator with Redo Logs on the All-Flash array versus Redo Logs on the Oracle Persistent Memory Filestore and compared the database metrics from the two runs.
We can see that, with Redo Logs on the Oracle PMEM Filestore versus Redo Logs on the AFA:
- Read IO Requests increased from 8.6 per second (Redo on AFA) to 11.2 per second (Redo on Oracle PMEM Filestore)
- Write IO Requests increased from 61,901.6 per second (Redo on AFA) to 75,801.3 per second (Redo on Oracle PMEM Filestore)
- Executes (SQL) increased from 29,114.9 per second (Redo on AFA) to 36,353.9 per second (Redo on Oracle PMEM Filestore)
- Transactions increased from 28,935.4 per second (Redo on AFA) to 36,165.4 per second (Redo on Oracle PMEM Filestore)
From a wait event perspective:
- 'log file switch completion' waits reduced from 24.37% (Redo on AFA) to 22.34% (Redo on Oracle PMEM Filestore)
- The average wait time reduced from 477.70ms (Redo on AFA) to 333.82ms (Redo on Oracle PMEM Filestore)
While this may seem marginal, remember that the performance improvement I got with my workload in my lab is in no way representative of any real production workload; results for real-world workloads will differ.
From a redo log performance perspective:
- 'redo writes' increased from 26,974 (Redo on AFA) to 82,605 (Redo on Oracle PMEM Filestore) – indicative of the IO load
From a physical read/write perspective:
- 'physical write total IO requests' increased from 19,865,229 (Redo on AFA) to 24,699,454 (Redo on Oracle PMEM Filestore)
From an OS perspective:
- overall database '%IO wait time' reduced from 20.67% (Redo on AFA) to 12.8% (Redo on Oracle PMEM Filestore)
And yes, we could have tuned further, reduced wait events, and achieved better performance, but as mentioned earlier, this was by no means a benchmarking exercise; the goal was to demonstrate the advantage of using Intel Optane DC PMM to back the Oracle PMEM Filestore on the VMware platform.
Summary
In summary, we can see:
- Read IO Requests increased from 8.6 per second (Redo on AFA) to 11.2 per second (Redo on Oracle PMEM Filestore)
- Write IO Requests increased from 61,901.6 per second (Redo on AFA) to 75,801.3 per second (Redo on Oracle PMEM Filestore)
- Executes (SQL) increased from 29,114.9 per second (Redo on AFA) to 36,353.9 per second (Redo on Oracle PMEM Filestore)
- Transactions increased from 28,935.4 per second (Redo on AFA) to 36,165.4 per second (Redo on Oracle PMEM Filestore)
- 'log file switch completion' waits reduced from 24.37% (Redo on AFA) to 22.34% (Redo on Oracle PMEM Filestore)
- Average wait time reduced from 477.70ms (Redo on AFA) to 333.82ms (Redo on Oracle PMEM Filestore)
- While this may seem marginal, the performance improvement I got with my workload in my lab is in no way representative of any real production workload; results for real-world workloads will differ
- 'redo writes' increased from 26,974 (Redo on AFA) to 82,605 (Redo on Oracle PMEM Filestore)
- 'physical write total IO requests' increased from 19,865,229 (Redo on AFA) to 24,699,454 (Redo on Oracle PMEM Filestore)
- overall database '%IO wait time' reduced from 20.67% (Redo on AFA) to 12.8% (Redo on Oracle PMEM Filestore)
Conclusion
- This blog is meant to raise awareness of how the performance of Oracle Redo Logs can be accelerated by placing them on an Oracle 21c Persistent Memory Filestore backed by Intel Optane DC PMM in App Direct mode on the VMware vSphere platform
- This blog contains results that I got in my lab running the SLOB load generator against my workload, which will differ greatly from any real-world customer workload; your mileage may vary
- Remember, any performance data is a result of the combination of hardware configuration, software configuration, test methodology & test tool and workload profile used in the testing.
- Also, this was by no means a benchmarking exercise.
Acknowledgements
I would like to thank the following people for their timely help with guidance and lab equipment in this exercise:
- Oracle – Anil Nair
- Intel – Vinodhkumar Raghunathan and Sridhar Kayathi
All Oracle on vSphere white papers, including Oracle licensing on vSphere/vSAN, Oracle best practices, RAC deployment guides, and the workload characterization guide, can be found at the URL below:
Oracle on VMware Collateral – One Stop Shop
https://blogs.vmware.com/apps/2017/01/oracle-vmware-collateral-one-stop-shop.html