vSphere Oracle

vMotion of Oracle RAC on VMware with Change Block Tracking (CBT)

Business Critical Oracle Workloads have stringent IO requirements, and enabling, sustaining, and ensuring the highest possible performance along with continued application availability is a major goal for all mission critical Oracle applications in order to meet demanding business SLA’s, all the way from on-premises to VMware Hybrid Clouds.

Oracle RAC provides high availability and scalability by having multiple instances access a single database which prevents the server from being a single point of failure. Oracle RAC enables you to combine smaller commodity servers into a cluster to create scalable environments that support mission critical business applications.


Key points to take away from this blog

 

This blog shows that vMotion of an Oracle RAC cluster on the VMware vSphere platform can be achieved seamlessly, without any issues, with the appropriate settings in place:

  • with Change Block Tracking (CBT) enabled for the non-shared disks
  • with the multi-writer attribute set for the shared vmdk’s


Oracle Real Application Clusters (RAC) on VMware vSphere

 

Oracle Clusterware is portable cluster software that provides comprehensive multi-tiered high availability and resource management for consolidated environments. It supports clustering of independent servers so that they cooperate as a single system.

Oracle Clusterware is the integrated foundation for Oracle Real Application Clusters (RAC), and the high-availability and resource-management framework for all applications on any major platform.

There are two key requirements for Oracle RAC:
• Shared storage
• Multicast Layer 2 networking

These requirements are fully addressed when running Oracle RAC on VMware vSphere, as both shared storage and Layer 2 networking are natively supported.

VMware vSphere HA clusters enable a collection of VMware ESXi hosts to work together so that, as a group, they provide higher infrastructure-level availability for VMs than each ESXi host can provide individually. VMware vSphere HA provides high availability for VMs by pooling the VMs and the hosts on which they reside into a cluster. Hosts in the cluster are monitored, and in the event of a failure, the VMs on a failed host are restarted on alternate hosts.

Oracle RAC and VMware HA solutions are complementary to each other. Running Oracle RAC on a VMware platform provides the application-level HA enabled by Oracle RAC, in addition to the infrastructure-level HA enabled by VMware vSphere.

More information on Oracle RAC on VMware vSphere can be found at Oracle VMware Hybrid Cloud High Availability Guide – REFERENCE ARCHITECTURE


VMware multi-writer attribute for shared vmdk’s

 

VMFS is a clustered file system that, by default, prevents multiple VMs from opening and writing to the same virtual disk (.vmdk file). This protects against more than one VM inadvertently accessing the same .vmdk file. The multi-writer option allows VMFS-backed disks to be shared and written to by multiple VMs. An Oracle RAC cluster using shared storage is a typical use case.

By default, this simultaneous-write “protection” is enabled for all .vmdk files, i.e. each VM has exclusive access to its own .vmdk files. So, in order for all of the VM’s to access the shared vmdk’s simultaneously, the multi-writer protection needs to be disabled for those disks.

KB 1034165 provides more details on how to set the multi-writer option to allow VM’s to share vmdk’s.

In the case of VMware vSphere on VMFS (non-vSAN storage), vVols (beginning with ESXi 6.5), and NFS datastores, using the multi-writer attribute to share the VMDKs for Oracle RAC requires:

  • SCSI bus sharing set to None
  • VMDKs provisioned as Eager Zero Thick (EZT); thick provision lazy zeroed and thin-provisioned formats are not allowed
  • VMDK’s do not need to be set to Independent persistent
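As an illustrative sketch only (these exact entries are not taken from the post, and the disk label and file name are hypothetical), the per-disk .vmx entries for a shared multi-writer disk described in KB 1034165 look like the following:

```
scsi1:0.present = "TRUE"
scsi1:0.fileName = "orac19c_shared_1.vmdk"
scsi1:0.sharing = "multi-writer"
```

The scsi1:0.sharing = "multi-writer" line is the attribute itself; the vmdk it points to must be provisioned Eager Zero Thick, as noted above.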

 

The table below describes the various Virtual Machine Disk Modes.

 

 

 

While Independent-Persistent disk mode is not a hard requirement for enabling the Multi-writer option, the default Dependent disk mode causes a “cannot snapshot shared disk” error when a VM snapshot is taken. Using Independent-Persistent disk mode allows taking a snapshot of the OS disk, while the shared disk needs to be backed up separately, e.g. with Oracle RMAN or third-party backup software.
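As a further sketch (again a hypothetical entry, assuming the shared disk sits at SCSI 1:0), the Independent-Persistent disk mode is expressed in the .vmx file as:

```
scsi1:0.mode = "independent-persistent"
```

With this set, a VM snapshot captures only the OS disk; the shared disk is excluded from the snapshot and must be protected at the database level.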


Supported and Unsupported Actions or Features with Multi-Writer Flag


*** Important ***

  • SCSI bus sharing is left at its default (None) and is not changed when using shared vmdk’s – leave it alone for RAC with shared vmdk’s.
  • SCSI bus sharing is only used for RAC clusters with RDMs (Raw Device Mappings) as shared disks.

VMware recommends using shared VMDK(s) with the Multi-writer setting for provisioning shared storage for ALL Oracle RAC environments (KB 1034165).

More information on Oracle RAC on VMware vSphere can be found at Oracle VMware Hybrid Cloud High Availability Guide – REFERENCE ARCHITECTURE


Test Use case

 

This test case shows that vMotion of an Oracle RAC cluster on the VMware vSphere platform can be achieved seamlessly, without any issues, with the appropriate settings in place:

  • with Change Block Tracking (CBT) enabled for the non-shared disks
  • with the multi-writer attribute set for the shared vmdk’s

The validation was performed on the infrastructure setup below:

  • VMware ESXi, 8.0.0 Build 20513097
  • Oracle RAC 19.15 with Grid Infrastructure, ASM Storage and ASMLIB on Oracle Enterprise Linux (OEL) 8.5
  • Pure X50 FlashArray


Test Bed

 

The Test bed is a 2-node vSphere cluster with 2 ESXi servers, both running ESXi 8.0.0 Build 20513097.

The server names are ‘sc2esx64.vslab.local’ and ‘sc2esx65.vslab.local’.

The 2 ESXi servers are Super Micro SYS-2049U-TR4 servers; each server has 4 sockets, 24 cores per socket, Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz, and 1.5 TB RAM. The vCenter version was 8.0.0 build 20519528.

 

 

Storage for the Oracle RAC VM’s is provided by the Pure X50 FlashArray (Purity //F 6.3.3). Details of the VMFS datastore backed by Pure Storage are shown below.

 

 

Oracle RAC VM’s ‘orac19c1’ and ‘orac19c2’ details are shown below. Each VM has 8 vCPU’s and 64GB vRAM. RAC database ‘orac19c’ was created with the multi-tenant option and provisioned with Oracle Grid Infrastructure (ASM) and Database version 19.15 on O/S OEL 8.5 UEK.

Oracle ASM was the storage platform, with Oracle ASMLIB for device persistence. Oracle SGA & PGA were set to 32G and 6G, respectively. All Oracle on VMware best practices were followed.

The Public and Private Adapter IP address information is shown as below.

 

 

The vmdk’s for VM ‘orac19c1’ & ‘orac19c2’ are shown below –

  • Hard Disk 1 (SCSI 0:0) – 80G for OS (/)
  • Hard Disk 2 (SCSI 0:1) – 80G for Oracle Grid Infrastructure and RDBMS binaries
  • Hard Disk 3 (SCSI 1:0) – 500G for Oracle RAC (GIMR, CRS, VOTE, DATA)

 

Note – We used 1 VMDK for the entire RAC cluster, as the intention here was to show that vMotion of an Oracle RAC cluster on the VMware vSphere platform, with Change Block Tracking (CBT) enabled for the non-shared disks, can be achieved seamlessly without any issues, with the appropriate settings in place.


The recommendation is to follow the RAC deployment guide on VMware for best practices with respect to the RAC layout – Oracle VMware Hybrid Cloud High Availability Guide – REFERENCE ARCHITECTURE.

The Change Block Tracking (CBT) specific settings for the RAC VM’s are shown below.

 

The CBT settings for the vmdk’s for VM ‘orac19c1’ & ‘orac19c2’ are shown below –

  • Hard Disk 1 (SCSI 0:0) – 80G for OS (/) – CBT enabled
  • Hard Disk 2 (SCSI 0:1) – 80G for Oracle Grid Infrastructure and RDBMS binaries – CBT enabled
  • Hard Disk 3 (SCSI 1:0) – 500G for Oracle RAC (GIMR, CRS, VOTE, DATA) – CBT disabled

 

In addition, CBT was disabled at the global level (ctkEnabled=FALSE) in both RAC VM’s.

The .vmx file entries for CBT are shown below for RAC VM’s ‘orac19c1’ and ‘orac19c2’ –

 

[root@sc2esx65:/vmfs/volumes/60ea3b49-f557ea8e-7a1d-e4434b2d2ca8/orac19c1] cat orac19c1.vmx | grep -i ctk
scsi1:0.ctkEnabled = "FALSE"
scsi0:1.ctkEnabled = "TRUE"
scsi0:0.ctkEnabled = "TRUE"
ctkEnabled = "FALSE"
[root@sc2esx65:/vmfs/volumes/60ea3b49-f557ea8e-7a1d-e4434b2d2ca8/orac19c1]

 

[root@sc2esx65:/vmfs/volumes/60ea3b49-f557ea8e-7a1d-e4434b2d2ca8/orac19c2] cat orac19c2.vmx | grep -i ctk
scsi1:0.ctkEnabled = "FALSE"
scsi0:1.ctkEnabled = "TRUE"
scsi0:0.ctkEnabled = "TRUE"
ctkEnabled = "FALSE"
[root@sc2esx65:/vmfs/volumes/60ea3b49-f557ea8e-7a1d-e4434b2d2ca8/orac19c2]
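The grep output above can also be checked programmatically. The short Python sketch below (a hypothetical helper, not part of the original setup) parses the ctkEnabled entries out of .vmx text and reports the effective CBT flag per entry:

```python
def cbt_settings(vmx_text: str) -> dict:
    """Return a mapping of .vmx key -> True/False for every ctkEnabled entry."""
    settings = {}
    for line in vmx_text.splitlines():
        line = line.strip()
        if "ctkEnabled" not in line or "=" not in line:
            continue  # skip unrelated .vmx entries
        key, _, value = line.partition("=")
        # Values are quoted strings like "TRUE"/"FALSE" in the .vmx file.
        settings[key.strip()] = value.strip().strip('"').upper() == "TRUE"
    return settings

sample = '''
scsi1:0.ctkEnabled = "FALSE"
scsi0:1.ctkEnabled = "TRUE"
scsi0:0.ctkEnabled = "TRUE"
ctkEnabled = "FALSE"
'''
print(cbt_settings(sample))
```

Run against the entries above, it confirms CBT is on for the two non-shared disks (SCSI 0:0 and SCSI 0:1) and off for the shared disk (SCSI 1:0) and at the global level.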

 

 

Test Steps

 

The above section shows the configuration details of the Oracle RAC 19.15 with Grid Infrastructure, ASM Storage and ASMLIB on VMware ESXi, 8.0.0 Build 20513097 using Pure X50 FlashArray.

The O/S details are shown as below

[root@orac19c1 ~]# cat /etc/oracle-release
Oracle Linux Server release 8.5
[root@orac19c1 ~]#

 

[root@orac19c1 ~]# uname -a
Linux orac19c1.vslab.local 5.4.17-2136.302.7.2.1.el8uek.x86_64 #2 SMP Tue Jan 18 12:11:34 PST 2022 x86_64 x86_64 x86_64 GNU/Linux
[root@orac19c1 ~]#

 

The Cluster services are up as shown below.

 

 

We will perform the validation of the vMotion steps for the 2 test cases as shown below.

 

Test Case 1 – with Global ‘ctkEnabled=FALSE’ and keeping the individual disks’ CBT entries the same, as shown below

  • vMotion VM ‘orac19c1’ from host ‘sc2esx64’ to host ‘sc2esx65’
  • vMotion VM ‘orac19c2’ from host ‘sc2esx65’ to host ‘sc2esx64’
    • scsi1:0.ctkEnabled = “FALSE”
    • scsi0:1.ctkEnabled = “TRUE”
    • scsi0:0.ctkEnabled = “TRUE”
    • ctkEnabled = “FALSE”

 

 

Test Case 2 – without setting Global ‘ctkEnabled=FALSE’ and keeping the individual disks’ CBT entries the same, as shown below

  • vMotion VM ‘orac19c1’ from host ‘sc2esx64’ to host ‘sc2esx65’
  • vMotion VM ‘orac19c2’ from host ‘sc2esx65’ to host ‘sc2esx64’
    • scsi1:0.ctkEnabled = “FALSE”
    • scsi0:1.ctkEnabled = “TRUE”
    • scsi0:0.ctkEnabled = “TRUE”

 

Baseline (before vMotion) – The RAC VM’s and their ESXi hosts are shown as below

  • VM ‘orac19c1’ is on ESXi Host ‘sc2esx64’
  • VM ‘orac19c2’ is on ESXi Host ‘sc2esx65’


Test Case 1 – with Global ‘ctkEnabled=FALSE’ and keeping the individual disks’ CBT entries the same, as shown below

  • vMotion VM ‘orac19c1’ from host ‘sc2esx64’ to host ‘sc2esx65’
  • vMotion VM ‘orac19c2’ from host ‘sc2esx65’ to host ‘sc2esx64’
    • scsi1:0.ctkEnabled = “FALSE”
    • scsi0:1.ctkEnabled = “TRUE”
    • scsi0:0.ctkEnabled = “TRUE”
    • ctkEnabled = “FALSE”


Select the destination ESXi server as ‘sc2esx65’ and the corresponding VM networks.

 

 

Select vMotion priority and start vMotion

vMotion of RAC VM ‘orac19c1’ from host ‘sc2esx64’ to host ‘sc2esx65’ completed successfully.

Search for vMotion event in VM ‘orac19c1’ vmware.log file –

[root@sc2esx65:/vmfs/volumes/60ea3b49-f557ea8e-7a1d-e4434b2d2ca8/orac19c1]
...
2023-01-26T03:00:44.570Z In(05) vmx - MigrateVMXdrToSpec: type: 1 srcIp=<172.16.32.164> dstIp=<172.16.32.165> mid=9e61492a022a895 uuid=00000000-0000-0000-0000-0cc47aff62fc pr
2023-01-26T03:00:44.570Z In(05) vmx - MigrateVMXdrToSpec: encryptedVMotion: 1
2023-01-26T03:00:44.570Z In(05) vmx - MigrateVMXdrToSpec: type 1 unsharedSwap 0 memMinToTransfer 0 cpuMinToTransfer 0 numDisks 0 numStreamIps 1 numFtStreamIps 0
2023-01-26T03:00:44.570Z In(05) vmx - Received migrate 'from' request for mid id 713280210969208981, src ip <172.16.32.164>.
2023-01-26T03:00:44.570Z In(05) vmx - MigrateSetInfo: state=8 srcIp=<172.16.32.164> dstIp=<172.16.32.165> mid=713280210969208981 uuid=00000000-0000-0000-0000-0cc47aff62fc pri
2023-01-26T03:00:44.570Z In(05) vmx - MigrateSetState: Transitioning from state 0 to 8.
...
[root@sc2esx65:/vmfs/volumes/60ea3b49-f557ea8e-7a1d-e4434b2d2ca8/orac19c1]
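For scripted verification of logs like the excerpt above, a small regular expression can pull the migration id and the source/destination vMotion IPs out of a MigrateSetInfo line. The sketch below is illustrative only, using one (truncated) log line from the excerpt:

```python
import re

# One MigrateSetInfo line from the vmware.log excerpt above (trailing fields omitted).
LOG_LINE = ("2023-01-26T03:00:44.570Z In(05) vmx - MigrateSetInfo: state=8 "
            "srcIp=<172.16.32.164> dstIp=<172.16.32.165> mid=713280210969208981")

# Extract source IP, destination IP, and migration id from the log format shown above.
m = re.search(r"srcIp=<([\d.]+)> dstIp=<([\d.]+)> mid=(\d+)", LOG_LINE)
if m:
    src_ip, dst_ip, mid = m.groups()
    print(f"vMotion {mid}: {src_ip} -> {dst_ip}")
```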


Similar results were obtained when we vMotioned RAC VM ‘orac19c2’ from host ‘sc2esx65’ to host ‘sc2esx64’; it completed successfully.


Test Case 2 – without setting Global ‘ctkEnabled=FALSE’ and keeping the individual disks’ CBT entries the same, as shown below

  • vMotion VM ‘orac19c1’ from host ‘sc2esx64’ to host ‘sc2esx65’
  • vMotion VM ‘orac19c2’ from host ‘sc2esx65’ to host ‘sc2esx64’
    • scsi1:0.ctkEnabled = “FALSE”
    • scsi0:1.ctkEnabled = “TRUE”
    • scsi0:0.ctkEnabled = “TRUE”

 

The Change Block Tracking (CBT) specific settings for the RAC VM’s are shown below.


The CBT settings for the vmdk’s for VM ‘orac19c1’ & ‘orac19c2’ are shown below –

  • Hard Disk 1 (SCSI 0:0) – 80G for OS (/) – CBT enabled
  • Hard Disk 2 (SCSI 0:1) – 80G for Oracle Grid Infrastructure and RDBMS binaries – CBT enabled
  • Hard Disk 3 (SCSI 1:0) – 500G for Oracle RAC (GIMR, CRS, VOTE, DATA) – CBT disabled

 

The .vmx file entries for CBT are shown below for RAC VM’s ‘orac19c1’ and ‘orac19c2’ –

 

[root@sc2esx65:/vmfs/volumes/60ea3b49-f557ea8e-7a1d-e4434b2d2ca8/orac19c1] cat orac19c1.vmx | grep -i ctk
scsi1:0.ctkEnabled = "FALSE"
scsi0:1.ctkEnabled = "TRUE"
scsi0:0.ctkEnabled = "TRUE"
[root@sc2esx65:/vmfs/volumes/60ea3b49-f557ea8e-7a1d-e4434b2d2ca8/orac19c1]

 

[root@sc2esx65:/vmfs/volumes/60ea3b49-f557ea8e-7a1d-e4434b2d2ca8/orac19c2] cat orac19c2.vmx | grep -i ctk
scsi1:0.ctkEnabled = "FALSE"
scsi0:1.ctkEnabled = "TRUE"
scsi0:0.ctkEnabled = "TRUE"
[root@sc2esx65:/vmfs/volumes/60ea3b49-f557ea8e-7a1d-e4434b2d2ca8/orac19c2]

 

Start vMotion of RAC VM ‘orac19c1’ from ‘sc2esx64’ to ‘sc2esx65’.


vMotion of RAC VM ‘orac19c1’ from ‘sc2esx64’ to ‘sc2esx65’ completed successfully without any errors.

 

 

Similar results were observed during vMotion of RAC VM ‘orac19c2’ from ‘sc2esx65’ to ‘sc2esx64’, which also completed successfully without any errors.


Summary

  

This blog has shown that vMotion of an Oracle RAC cluster on the VMware vSphere platform can be achieved seamlessly, without any issues, with the appropriate settings in place:

  • with Change Block Tracking (CBT) enabled for the non-shared disks
  • with the multi-writer attribute set for the shared vmdk’s

The validation was performed on the infrastructure setup below:

  • VMware ESXi, 8.0.0 Build 20513097
  • Oracle RAC 19.15 with Grid Infrastructure, ASM Storage and ASMLIB on Oracle Enterprise Linux (OEL) 8.5
  • Pure X50 FlashArray

We performed the validation of the vMotion steps for the 2 test cases and observed similar successful results –

  • Test Case 1 – with Global ‘ctkEnabled=FALSE’ and keeping the individual disks’ CBT entries
  • Test Case 2 – without setting Global ‘ctkEnabled=FALSE’ and keeping the individual disks’ CBT entries


Acknowledgements

 

This blog was authored by Sudhir Balasubramanian, Senior Staff Solution Architect & Global Oracle Lead – VMware.

 

 

Conclusion

 

Business Critical Oracle Workloads have stringent IO requirements, and enabling, sustaining, and ensuring the highest possible performance along with continued application availability is a major goal for all mission critical Oracle applications in order to meet demanding business SLA’s, all the way from on-premises to VMware Hybrid Clouds.

Oracle RAC provides high availability and scalability by having multiple instances access a single database which prevents the server from being a single point of failure. Oracle RAC enables you to combine smaller commodity servers into a cluster to create scalable environments that support mission critical business applications.

All Oracle on VMware vSphere collateral can be found at the URL below.

Oracle on VMware Collateral – One Stop Shop
https://blogs.vmware.com/apps/2017/01/oracle-vmware-collateral-one-stop-shop.html