The largest companies in the world run their most important apps on Cloud Foundry. Their stories were on display at Cloud Foundry Summit last month.
Operators have a range of approaches for ensuring they can recover Cloud Foundry, apps, and data in case of a disaster. The approaches fall into two categories: backing up the raw data or automating recreation of the data, and both have associated issues and complexity.
We set out to change that. Our recommended solution for the community: BOSH Backup and Restore, now a beta in Pivotal Cloud Foundry 1.11.
BOSH Backup and Restore (BBR) is going to be the way to backup and restore distributed systems on any cloud #cfsummit #cloudfoundry
— Alex Ley (@AlexEvade) June 15, 2017
Burning Down the House: How to Deal with Disaster Recovery in Cloud Foundry
What is BOSH Backup & Restore (BBR)?
Our ideal solution needed to answer two questions. First, how do you take a distributed backup that’s consistent? And second, how do you avoid modifying your backup scripts every time Cloud Foundry changes?
BBR does that by defining a contract between the backup orchestrator and the component to be backed up. The orchestrator calls scripts on the components to be backed up and restored, and the components are responsible for generating the backup and restoring the backup.
The orchestrator is the BBR binary, and the component is a BOSH job.
To enable consistency, the component (a BOSH release) can implement locking with a pre-backup-lock script and a post-backup-unlock script. Backups are triggered per BOSH deployment, and every pre-backup-lock script on all the jobs in the deployment is called before any backup script is called.
The BOSH Backup and Restore script execution sequence. Note that the order of calling a particular type of script (e.g. pre-backup-check) is not guaranteed across instance groups and instances within a group (e.g. foo/job1 may run before foo/job2). Also, the terminology in this diagram follows BOSH 2.0 conventions.
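To make the ordering concrete, here is a minimal Python sketch of the lock, backup, unlock sequence. It is not BBR's actual code: the `Job` class, `run_phase`, and `backup_deployment` are hypothetical names, scripts are modeled as plain callables, and running the unlock phase even when an earlier phase fails is an assumption of this sketch rather than something stated above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Job:
    """Hypothetical stand-in for a BOSH job and the scripts it ships."""
    name: str
    scripts: Dict[str, Callable[[], None]] = field(default_factory=dict)


def run_phase(jobs: List[Job], script_name: str, log: List[str]) -> None:
    """Call one type of script on every job that implements it."""
    for job in jobs:
        script = job.scripts.get(script_name)
        if script is not None:
            log.append(f"{job.name}:{script_name}")
            script()


def backup_deployment(jobs: List[Job], log: Optional[List[str]] = None) -> List[str]:
    """Run all locks, then all backups, then all unlocks."""
    if log is None:
        log = []
    try:
        # Every lock script runs before any backup script starts.
        run_phase(jobs, "pre-backup-lock", log)
        run_phase(jobs, "backup", log)
    finally:
        # Assumption of this sketch: always attempt to unlock.
        run_phase(jobs, "post-backup-unlock", log)
    return log
```

Jobs that ship no lock scripts simply skip those phases, which is why a component can opt in to locking only when it needs it.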
The authors of the component write and maintain the backup and restore scripts for that component, and the scripts are packaged with the component. As a result, scripts can stay in sync with the component, avoiding compatibility issues. And the scripts can be smart, only backing up / restoring required data and, if necessary, performing processing like encryption or credential generation.
There’s a lot of BOSH here. Isn’t this supposed to be for Cloud Foundry? Well, yes! Consider that:
- All components in Cloud Foundry are BOSH deployments
- The BOSH director is a BOSH release
- For an operator, a BOSH deployment is the logical unit of backup
This is our rationale for BOSH Backup and Restore. The key to making this all work: the responsibility for writing and maintaining backup and restore scripts sits with the BOSH release author.
We gave a talk at CF Summit 2017 on BOSH Backup and Restore. Check it out if you’d like to hear more about the service, and how we got to this point.
We’ve also proposed BOSH Backup and Restore as an open-source Cloud Foundry extension! We’ll keep you posted on our progress.
How It Works: First, Create The Backup Artifact. Then, Put It Back.
To understand how BBR works, let’s look at the steps that happen once an operator initiates a backup. The BBR binary is run from a jumpbox that has access to PCF deployments. The operator triggers a backup for a BOSH deployment (or director) using the CLI. The BBR binary then looks at the jobs in the deployment (or director) for lock / backup / unlock scripts. The binary then triggers those scripts in the prescribed order. The backup artifacts are transferred to the jumpbox. Finally, the operator transfers the artifacts to external storage.
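The discovery step, where the binary looks at each job for scripts, can be sketched as a directory scan. This is an illustration only: the `<jobs_root>/<job>/bin/bbr/<script>` layout, the `KNOWN_SCRIPTS` tuple, and the `discover_scripts` function are assumptions made for the example, not details taken from this post.

```python
from pathlib import Path
from typing import Dict, List

# Script names mentioned in this post; the tuple itself is a hypothetical helper.
KNOWN_SCRIPTS = ("pre-backup-lock", "backup", "post-backup-unlock", "restore")


def discover_scripts(jobs_root: Path) -> Dict[str, List[str]]:
    """Map each job name to the backup-related scripts it ships.

    Assumes a layout of <jobs_root>/<job>/bin/bbr/<script>.
    Jobs that ship none of the known scripts are left out.
    """
    found: Dict[str, List[str]] = {}
    for job_dir in sorted(p for p in jobs_root.iterdir() if p.is_dir()):
        script_dir = job_dir / "bin" / "bbr"
        present = [s for s in KNOWN_SCRIPTS if (script_dir / s).is_file()]
        if present:
            found[job_dir.name] = present
    return found
```

Because discovery is driven by what each job ships, adding a new component to a deployment would not require changes on the orchestrator side.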
A restore is the inverse of this process: the backup artifact must be copied onto the jumpbox where the BBR binary is located. Then the operator triggers the restore process using the CLI, specifying the deployment or director to restore, as well as the path to the backup artifact. The BBR binary identifies which jobs implement the restore script, copies the matching backup artifact into the job, and triggers the restore script.
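The matching step in a restore can be sketched in the same spirit. The `plan_restore` function and both of its input shapes (a mapping of job name to script names, and a mapping of job name to artifact path) are hypothetical, and treating a missing artifact as an error is an assumption of this sketch.

```python
from typing import Dict, List, Tuple


def plan_restore(
    job_scripts: Dict[str, List[str]],
    artifacts: Dict[str, str],
) -> List[Tuple[str, str]]:
    """Pair every job that implements a restore script with its artifact.

    job_scripts maps a job name to the script names it ships.
    artifacts maps a job name to the path of its backup artifact.
    """
    plan: List[Tuple[str, str]] = []
    for job, scripts in sorted(job_scripts.items()):
        if "restore" not in scripts:
            continue  # this job has nothing to restore
        if job not in artifacts:
            # Assumption: a restorable job without an artifact is an error.
            raise ValueError(f"no backup artifact for job {job!r}")
        plan.append((job, artifacts[job]))
    return plan
```

Each pair in the resulting plan corresponds to one copy-then-restore step as described above.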
Beta Testers Wanted!
We’re full speed ahead on a GA release. To help us get there quickly, sign up to beta test the product! BBR requires Pivotal Cloud Foundry 1.11, and today it supports a subset of modules (CredHub, UAA, the BOSH Director, and Elastic Runtime in Pivotal Cloud Foundry). Support for open-source Cloud Foundry and data services is on the roadmap.
One other note: backup & restore needs an ecosystem. So we’re building one! BOSH Backup and Restore solves the core problem of creating a backup artifact, then putting it back. We are leaning on third parties to solve encryption, scheduling, permissions, and secondary backup sites. Watch this space!