In this article we investigate the performance of an SAP NetWeaver system with Persistent Memory (PMEM) on vSphere 6.7 (SAP NetWeaver is supported on vSphere 6.7). vSphere 6.7 introduces support for PMEM, a storage class that sits between DRAM and disk storage in the data storage hierarchy. It combines DRAM-like latency and bandwidth with persistence of data across power cycles. PMEM is based on NVDIMMs (non-volatile dual in-line memory modules), which are installed in DDR4 DIMM sockets on NVDIMM-enabled servers. For background on PMEM please read this blog and this performance whitepaper.
To demonstrate the performance impact of PMEM on an SAP NetWeaver system, we conducted experiments based on:
- SAP NetWeaver system running on an Oracle database with a Linux guest OS
- vSphere 6.7.0, build 8169922
- Micron NVDIMMs configured on NVDIMM-enabled servers
- Oracle redo logs deployed on PMEM; Oracle data files on traditional SAN.
PMEM is exposed to the virtual machine (VM) in two ways:
- vPMEMDisk – vSphere presents PMEM as a regular disk attached to the VM. No guest OS or application change is needed to leverage this mode.
- vPMEM – vSphere presents PMEM as an NVDIMM device to the VM, which requires a guest OS version recent enough to support NVDIMM. The NVDIMM can in turn be exposed to applications as a block device or as a byte-addressable device. In the tests here, we use vPMEM as block-based storage through the thin layer of a direct-access (DAX) file system.
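To confirm from inside the guest that a vPMEM-backed file system (for example, one holding the redo logs) is actually mounted with DAX, one option is to inspect its mount options. The following is a minimal sketch, assuming a Linux guest that lists mounts in /proc/mounts format and reports DAX as a `dax` or `dax=<mode>` mount option; the exact spelling depends on the kernel and file system version. The function name is hypothetical.

```python
from typing import List, Tuple


def find_dax_mounts(mounts_text: str) -> List[Tuple[str, str, str]]:
    """Return (device, mount_point, fs_type) for entries mounted with DAX enabled.

    mounts_text is expected in /proc/mounts format:
    device mount_point fs_type options dump pass
    """
    results = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, fs_type, options = fields[:4]
        for opt in options.split(","):
            # Assumption: DAX appears as "dax" (older kernels) or "dax=<mode>"
            # (newer kernels); "dax=never" means DAX is explicitly disabled.
            if opt == "dax" or (opt.startswith("dax=") and opt != "dax=never"):
                results.append((device, mount_point, fs_type))
                break
    return results
```

In a guest, the input would typically be the contents of /proc/mounts. A PMEM namespace device (commonly named like /dev/pmem0 on Linux) mounted with `dax` would be returned, while ordinary SAN-backed mounts and entries marked `dax=never` are skipped.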
The following vSphere client screenshots show PMEM available in both modes, vPMEMDisk and vPMEM, in the “Edit Settings” dialog of a VM.
[Screenshots: vPMEMDisk | vPMEM]
Next, we cover some background on the database and the SAP workload used in the PMEM tests.
Oracle Redo Logs
Some background on the Oracle redo log files will help to explain why we prioritize this component on PMEM. In the Oracle RDBMS environment, the redo logs are files in a proprietary format that record a history of all changes made to the database. They are critical for database operation and for recovery after a crash. The log writer process (LGWR) writes the redo log changes to disk, and the I/O speed of this component is important for overall database performance. For more details on Oracle I/O performance in SAP systems see SAP note 793113 – FAQ: Oracle I/O configuration (user / password required).
The redo log files need only a fraction of the storage space of the complete database, which keeps the demand on PMEM-backed storage low.
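The arithmetic behind that capacity claim is simple: the redo log space is the number of log groups times the members per group times the member size. The sketch below uses purely hypothetical sizes (not taken from the tested system) to compare that footprint with the size of the whole database; the function names are illustrative.

```python
def redo_log_capacity_gib(groups: int, members_per_group: int, member_size_mib: int) -> float:
    """Total redo log space in GiB: groups x members per group x member size."""
    if min(groups, members_per_group, member_size_mib) <= 0:
        raise ValueError("all inputs must be positive")
    return groups * members_per_group * member_size_mib / 1024


def share_of_database(redo_gib: float, database_gib: float) -> float:
    """Fraction of the total database size taken up by the redo logs."""
    if database_gib <= 0:
        raise ValueError("database size must be positive")
    return redo_gib / database_gib
```

For example, a hypothetical layout of 4 groups with 2 members of 1024 MiB each needs 8 GiB, which is well under 1% of a hypothetical 2 TiB database.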
SAP Workload
For this investigation we used an SAP workload that was easy to set up and run repeatedly and that puts considerable load on the database. It consists of two SAP administration tasks: Client Copy and Client Delete. Client Copy is typically run in non-production systems to create different business environments inside the same database schema, and Client Delete removes a client again. These tasks generate a lot of table row changes (inserts and deletes), which in turn generate redo log activity. The amount of activity depends on the amount of business data stored in the client. For information on SAP clients see here.
The Client Copy and Delete were run in local mode (on the same SAP system) and scheduled as batch jobs. During baseline testing we analyzed the workload of the Client Copy using the SAP analysis tools available in SAP transaction STAD.
The following SAP screenshot shows the workload analysis of the Client Copy task.
The above analysis shows that the overall response time (duration) of the Client Copy task consists of about 34% processing time in the application server tier and about 66% in the database. Within the database portion, part of the time is attributable to writing the redo logs to disk. We show this to set expectations: increasing the I/O performance of the storage system with PMEM will have some positive impact on this SAP workload, but large parts of the processing time, for example in the application tier, depend little on storage I/O latency.
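This expectation follows Amdahl's law: speeding up one component only shrinks that component's share of the total runtime. The sketch below takes the 34% / 66% split from the STAD analysis above; the share of database time spent on redo writes is a hypothetical input, because it is not reported here. Function and parameter names are illustrative.

```python
import math


def relative_runtime(app_share: float, db_share: float,
                     redo_share_of_db: float, redo_speedup: float) -> float:
    """Runtime relative to the baseline (1.0) when only redo writes get faster.

    app_share, db_share: fractions of the baseline runtime (must sum to 1.0).
    redo_share_of_db: hypothetical fraction of database time spent writing redo.
    redo_speedup: factor by which redo writes get faster (>= 1, may be inf).
    """
    if not math.isclose(app_share + db_share, 1.0):
        raise ValueError("app_share and db_share must add up to 1.0")
    if not 0.0 <= redo_share_of_db <= 1.0:
        raise ValueError("redo_share_of_db must be between 0 and 1")
    if redo_speedup < 1.0:
        raise ValueError("redo_speedup must be >= 1")
    redo = db_share * redo_share_of_db   # part of the runtime PMEM can accelerate
    other_db = db_share - redo           # SQL processing etc., unaffected by PMEM
    return app_share + other_db + redo / redo_speedup
```

With a hypothetical redo share of 20% of database time, even an infinitely fast redo device (`redo_speedup=float("inf")`) yields a relative runtime of 0.868, i.e. at most a 13.2% reduction, because the application tier and the remaining database work are untouched.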
During the tests the following performance data was captured:
- Duration of Client Copy and Delete run in batch – available in SAP transaction SM37.
- Average Oracle Redo Write Time – available in SAP transaction ST04.
- Oracle Wait Events – available in SAP transaction ST04.
Oracle Wait Events are a set of statistics that help analyze Oracle performance and are an important part of Oracle tuning; the subject is covered in SAP Note 619188 – FAQ: Oracle wait events.
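Wait event overviews such as the one in ST04 essentially rank events by total time waited and show the average wait per occurrence. The following sketch illustrates that calculation on hypothetical numbers; it is not how ST04 is implemented, and the class and function names are illustrative.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class WaitEvent:
    name: str
    waits: int              # number of times the event occurred
    time_waited_ms: float   # total time spent waiting on the event

    @property
    def avg_wait_ms(self) -> float:
        """Average wait per occurrence, e.g. the average "log file sync" time."""
        return self.time_waited_ms / self.waits if self.waits else 0.0


def top_wait_events(events: List[WaitEvent], n: int = 5) -> List[Tuple[str, float, float]]:
    """Return (name, % of total wait time, average wait in ms) for the n largest events."""
    total = sum(e.time_waited_ms for e in events)
    ranked = sorted(events, key=lambda e: e.time_waited_ms, reverse=True)
    return [
        (e.name, 100.0 * e.time_waited_ms / total if total else 0.0, e.avg_wait_ms)
        for e in ranked[:n]
    ]
```

Comparing the percentage and average-wait columns before and after a change (for example, moving the redo logs) shows whether a given event has shrunk in absolute terms or only in relative terms.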
Results
The following table shows results for an SAP on Oracle system installed on an older Linux OS version which does not support NVDIMM. In the first baseline run the database data and redo files were deployed on SAN. In the next set of runs the redo logs were deployed on vPMEMDisk. The test was repeated with two different client sizes. The larger client had more business data/transactions.
The following table shows results for an SAP on Oracle system installed on a newer Linux OS version which supports NVDIMM. In the first baseline run the database data and redo files were deployed on SAN. In the next set of runs the redo logs were deployed on vPMEMDisk and then on vPMEM.
The results above show a 7-10% improvement in performance (lower run time) of the SAP batch jobs with PMEM. The largest gains are in the average redo write time and the “log file sync” wait event. “log file sync” is the time spent waiting until the log writer process has written all data from the redo buffer to the redo log files. Increased values for this event typically indicate I/O problems. As the results show, this wait event was significantly reduced with vPMEM.
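A percentage improvement can be read either as a reduction in run time or as an increase in throughput, and the two numbers differ. The sketch below computes both from a baseline and a new batch job duration (such as the durations shown in SM37); the function names and the example durations in the note are hypothetical.

```python
def runtime_reduction_pct(baseline_s: float, new_s: float) -> float:
    """Percentage by which the run time went down (positive means faster)."""
    if baseline_s <= 0:
        raise ValueError("baseline duration must be positive")
    return 100.0 * (baseline_s - new_s) / baseline_s


def throughput_gain_pct(baseline_s: float, new_s: float) -> float:
    """Percentage by which the work done per unit of time went up for the same job."""
    if new_s <= 0:
        raise ValueError("new duration must be positive")
    return 100.0 * (baseline_s / new_s - 1.0)
```

For a hypothetical job that drops from 1000 s to 900 s, `runtime_reduction_pct` returns 10.0 while `throughput_gain_pct` returns about 11.1, so it helps to state which definition a reported gain uses.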
The following results show the top Oracle wait events before and after deploying the redo logs on vPMEM (source: SAP transaction ST04).
[Screenshots: Data + Redo on SAN | Data on SAN, Redo on vPMEM]
The results show that the highest wait event in Oracle is “CPU used by this session”. This indicates that Oracle spent most of its time processing SQL statements (the VM CPU utilization was not saturated, i.e. there was no CPU bottleneck). Deploying the redo logs on vPMEM has no impact on this event, as it depends on the SQL operations being executed; hence there is a limit to how much PMEM can reduce the overall duration of the Client Copy and Delete tasks, as mentioned earlier. There may be other opportunities in this example to tune Oracle for further performance gains; this would follow typical Oracle and SAP on Oracle tuning practices and is independent of PMEM.
Summary
- Deploying the Oracle database redo logs on PMEM-backed storage on VMware considerably improves the performance of the redo log service by reducing its I/O latency.
- The faster redo log service also improves the overall performance of the SAP batch jobs (SAP Client Copy and Delete in this example).
- The degree of performance gain for the SAP application depends on the workload, so results will vary.
- The performance gain relative to the SAN baseline depends on how much Oracle tuning has already been applied to the SAP on Oracle system. In the example here, further Oracle tuning that would improve performance is possible independent of PMEM.
- In this test the maximum redo log I/O performance was achieved with the vPMEM mode, i.e. a DAX-mounted file system backed by an NVDIMM exposed directly to the guest OS as a block device. Even faster access would be possible with the NVDIMM exposed as a byte-addressable device, but this option requires changes to the database application. For an example of the impact of byte-addressable device access on a database, see this performance whitepaper.
- For more details of Oracle Database performance with VMware and PMEM see this whitepaper.
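The byte-addressable mode mentioned in the summary differs from block access in that the application maps persistent memory into its address space and updates it with ordinary memory stores instead of read/write I/O calls. The following is a minimal sketch of that access pattern using Python's standard `mmap` module. The function name and any file path are hypothetical, and it assumes the file lives on a DAX-mounted file system for the stores to reach PMEM directly; on any other file system it still runs, but goes through the page cache. Production PMEM-aware software typically relies on libraries such as the Persistent Memory Development Kit (PMDK) for CPU cache flushing and failure atomicity, which this sketch does not attempt.

```python
import mmap
import os


def write_bytes_mapped(path: str, offset: int, payload: bytes, size: int) -> None:
    """Store payload at offset in a file through a memory mapping, not write() calls.

    Assumption: on a file on a DAX-mounted, PMEM-backed file system the mapping
    refers to persistent memory directly, so the slice assignment below is a
    plain memory copy.
    """
    if offset < 0 or offset + len(payload) > size:
        raise ValueError("payload does not fit in the mapped region")
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, size)  # size the file to exactly `size` bytes so it can be mapped
        with mmap.mmap(fd, size) as mm:
            mm[offset:offset + len(payload)] = payload  # byte-granular store, no I/O call
            mm.flush()  # msync(): ask the kernel to persist the mapped range
    finally:
        os.close(fd)
```

The design point is that the application addresses individual bytes of the file through memory, which is why this mode needs application changes, whereas the block mode used in these tests works with an unmodified database.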
Thanks to my colleagues for their reviews and inputs: Sudhir Balasubramanian; Todd Muirhead; Oleg Ulyanov; Mohan Potheri.