VMware vSphere Data Protection Advanced (VDPA) was announced in February and the bits just became generally available last week. One of the issues corrected with the release of VDPA was quiescing with Microsoft Volume Shadow Copy Service (VSS) on Windows Server 2008 R2 – see this VMware KB article for details.
Recent versions of Oracle Database running on Windows can be quiesced by VSS for backup. This had me thinking about the possibility of backing up Oracle Database on Windows with VDPA using VSS to quiesce the database. I realize the majority of organizations run Oracle Database on operating systems other than Windows, but for those that do, I encourage you to continue reading. The results of my testing indicated VSS in VMware Tools can quiesce an Oracle database in a Windows virtual machine for backup.
For the testing environment, I set up a Windows Server 2008 R2 virtual machine with all of the latest Windows patches. I installed Oracle Database 11g R2 Enterprise Edition. One of the components installed with the database software was the Oracle VSS writer. Once the installation was complete, I started doing some homework. I found this article in Oracle’s documentation: Performing Database Backup and Recovery with VSS
One of the first things I did was check the status of the Oracle VSS writer. I opened a Windows command prompt and ran this command: vssadmin list writers. For more information on this command, see this Technet article. I found that the Oracle VSS writer was not listed. I checked the Oracle VSS writer service and it was set to Startup Type: Manual. I started the service and ran the vssadmin list writers command again. This time, it showed up in the list. Furthermore, I noted the State was “Stable” and Last Error showed “No error”.
I opened up the vSphere Web Client, connected to my VDPA appliance, and created a backup job for the virtual machine (VM) running Windows and Oracle Database. Then, I ran the backup job manually. This took a fair amount of time since it was the first time the backup job was run, i.e., a full backup. To pass the time, I wandered off to the kitchen, refilled my coffee cup, and thought about how cool it would be if VDPA had an agent for Oracle Database like it does for Microsoft Exchange and SQL Server. Hmmmm…
Once the backup job was finished, I opened a remote desktop session to the Windows VM running Oracle Database to look for evidence that the database was quiesced by the VSS requester in VMware Tools. I looked at the Application log in Windows Event Viewer and found the following error.
Naturally, I followed the recommendation in the Action section of the error message – I opened Oracle SQL Plus, issued the SHUTDOWN command for the database, and then restarted the database. I also checked the status or the Oracle VSS writer.
It was in a failed state, as well, so I restarted the Oracle VSS writer service. After researching a bit more, I suspected the errors were a result of the database being in NOARCHIVELOG mode. I issued the following SQL statement in SQL Plus: SELECT NAME,LOG_MODE FROM SYS.V$DATABASE; which showed that was indeed the case.
According to the Oracle documentation I mentioned earlier, an error will be produced if the Oracle VSS writer attempts to produce a volume-based shadow copy of a NOARCHIVELOG database that is open in read-write mode. I changed the database to ARCHIVELOG mode. Be sure to read this documentation (for version 11g) for more details on changing the archiving mode.
Finally, I tried the backup job again. Looking at the Application log in the Windows Event Viewer, I found this entry for the Oracle VSS writer:
I also located the log snippet below in the Oracle Database alert log.
Mon Mar 18 16:29:43 2013
Thread 1 advanced to log sequence 81 (LGWR switch)
Current log# 3 seq# 81 mem# 0: C:APPADMINISTRATORORADATAORACLEDBREDO03.LOG
Archived Log entry 7 added for thread 1 sequence 80 ID 0xcc4ebd97 dest 1:
ALTER TABLESPACE SYSTEM BEGIN BACKUP
Completed: ALTER TABLESPACE SYSTEM BEGIN BACKUP
ALTER TABLESPACE SYSAUX BEGIN BACKUP
Completed: ALTER TABLESPACE SYSAUX BEGIN BACKUP
ALTER TABLESPACE UNDOTBS1 BEGIN BACKUP
Completed: ALTER TABLESPACE UNDOTBS1 BEGIN BACKUP
ALTER TABLESPACE USERS BEGIN BACKUP
Completed: ALTER TABLESPACE USERS BEGIN BACKUP
ALTER TABLESPACE SYSTEM END BACKUP
Mon Mar 18 16:29:54 2013
Completed: ALTER TABLESPACE SYSTEM END BACKUP
ALTER TABLESPACE SYSAUX END BACKUP
Completed: ALTER TABLESPACE SYSAUX END BACKUP
ALTER TABLESPACE UNDOTBS1 END BACKUP
Completed: ALTER TABLESPACE UNDOTBS1 END BACKUP
ALTER TABLESPACE USERS END BACKUP
Completed: ALTER TABLESPACE USERS END BACKUP
ALTER SYSTEM ARCHIVE LOG
Mon Mar 18 16:30:02 2013
Last, but not least, some key entries below from the VM log (note that times below are UTC versus EDT above – thus the reason for the 4-hour difference, i.e., focus on the minutes and seconds):
2013-03-18T20:29:48.713Z| vcpu-0| I120: ToolsBackup: changing quiesce state: STARTED -> COMMITTED
(VM is quiesced)
2013-03-18T20:29:48.798Z| vcpu-0| I120: SnapshotVMXTakeSnapshotCB: Enter checkpoint status 1, mode 0.
2013-03-18T20:29:48.798Z| vcpu-0| I120: Closing all the disks of the VM.
(VM snapshot process has started)
We can see from the times of the log entries that the Oracle VSS writer prepared the database for backup (BEGIN BACKUP), the virtual machine snapshot was taken, the database was taken out of backup mode (END BACKUP) and the online redo logs were archived (ALTER SYSTEM ARCHIVE LOG). VDPA went about its business of backing up the VM. After the backup completed a few minutes later, the VM snapshot was consolidated into the original vmdk file. I also checked to see if the Oracle VSS writer experienced that transient error we saw earlier. This time, the vssadmin command reported no issues.
Now before we get too confident, let’s remember that absolute proof of a successful backup comes only through performing a successful restore from that backup. With that in mind, I used VDPA to perform a Restore Rehearsal. The VM restored without any issues. I opened SQL Plus and used the following SQL statement to check the backup status of the tablespaces:
SELECT t.name, d.file# as, b.status
FROM V$DATAFILE d, V$TABLESPACE t, V$BACKUP b
WHERE d.TS#=t.TS# AND b.FILE#=d.FILE#;
I found the backup status of the tablespaces was ACTIVE, i.e. they were in backup mode, which was expected since the VM snapshot for the VDPA backup was taken right after the Oracle VSS writer put the tablespaces in backup mode. I also issued the following SQL statement to see the status of the database:
SELECT instance_name,status FROM v$instance;
The database was mounted, but not open. I opened the database and issued ALTER DATABASE END BACKUP; which changed the backup status of the tablespaces to NOT ACTIVE. At this point, the Oracle database server was restored and functional. Granted, there were some manual steps, but I suspect there are ways to automate some of these. For example, a few scheduled tasks could be set up in Windows to start and stop the Oracle VSS Writer service, as needed (before and after the VDPA backup window).
In summary, this appears to be a basic, but viable option for backing up Windows-based Oracle database servers with VDPA. I am sure some individuals are wondering about VDP (not Advanced). Once the issue in VMware KB article 2035736 is corrected in VDP (very soon!), this should work.
I will be the first to admit I am not an Oracle DBA. I am very interested to hear from individuals who set this up in the “real world” with larger, busy Oracle databases to see how things work and perform. If you do try this, please post your results in the comments section.
I also realize there are other methods. Maybe the use of RMAN in combination with VDPA would produce a viable solution for Oracle database backup and recovery on both Windows and Linux VMs. Again, for those with more experience and knowledge in the Oracle database backup and recovery space, please chime in.