For Comic Relief, a UK-based charity, fifty minutes of downtime during peak fundraising hours could result in half their annual online donations income being lost. This post describes how Cloud Foundry was used to provide an enterprise-grade donations platform that helped Comic Relief raise a record £75 million (about US $115 million) in funds for their humanitarian projects.
Guest Blog by Colin Humphreys, CloudCredo, Inc.
Comic Relief is a well-known charity based in the UK that strives to create a just world free from poverty. They raise millions of pounds through two big fundraising campaigns, Red Nose Day and Sport Relief. The money raised is spent in the best possible way to tackle the root causes of poverty and social injustice. All the money raised by the public is spent by Comic Relief to help poor and disadvantaged people in the UK and the world’s poorest countries.
Seven Hours to Maximise Funding For The Entire Year
A year of planning for Red Nose Day 2013 resulted in a six-week media campaign culminating in seven hours of prime-time television coverage on 15 March. The donations platform is required to process in the region of 600,000 transactions in this seven-hour period, support in excess of 10,000 concurrent call centre operators, and handle peaks of 500 donations completing per second from web, mobile and call centres.
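To put those figures in perspective, here is a quick back-of-the-envelope calculation. It is a sketch that uses only the numbers quoted above (the variable names are illustrative) and compares the average transaction rate across the seven-hour window with the stated peak:

```python
# Figures quoted in the post; the variable names are illustrative only.
TRANSACTIONS = 600_000      # approximate transactions in the window
WINDOW_HOURS = 7            # hours of prime-time coverage
PEAK_PER_SECOND = 500       # stated peak of completed donations per second

window_seconds = WINDOW_HOURS * 3600
average_per_second = TRANSACTIONS / window_seconds
peak_to_average = PEAK_PER_SECOND / average_per_second

print(f"Average rate: {average_per_second:.1f} donations/second")
print(f"Peak is about {peak_to_average:.0f}x the average")
```

The peak is roughly twenty times the average rate, so a platform sized for the average load would fall far short at the busiest moments.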
These requirements pose an operational nightmare. A fifty-minute outage at peak traffic, which an annual “four nines” (99.99%) uptime SLA would permit, could result in 50% of the charity’s annual income being lost. Comic Relief require a platform delivered with no outages; there won’t be a second chance for another year. Feedback on how the platform performs under high load from real users arrives only once a year. To illustrate the difficulty of this challenge, JustGiving, one of the largest charity-orientated service providers, had a huge outage beginning early on Red Nose Day 2013.
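The "four nines" claim can be checked with simple arithmetic. The sketch below assumes a 365-day year; the function name is illustrative. It computes the downtime an annual availability target permits and confirms that a fifty-minute outage fits inside it:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability: float) -> float:
    """Return the minutes of downtime per year permitted by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability)


budget = allowed_downtime_minutes(0.9999)
outage_minutes = 50

print(f"Four nines allows {budget:.1f} minutes of downtime per year")
print(f"A {outage_minutes}-minute outage stays within the SLA: {outage_minutes <= budget}")
```

An annual SLA says nothing about when the downtime occurs, which is why it offers little protection for an event concentrated into seven hours.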
For the previous eight years the donations applications had been delivered via a monolithic Java application and RDBMS backend, deployed and scaled with the help of twelve partners. This setup suffered from single points of failure, long recovery times after failures, and a lack of consistency between environments in the deployment pipeline. Comic Relief were also looking to remove their reliance on any single third-party supplier so as not to be limited in their technology choices.
Working Prototype Delivered in a Single Day on CloudFoundry.com
Moving from traditional development methodologies to a lean, agile approach delivered by a small team required a leap of faith from Comic Relief. Their fears were allayed by delivering a working prototype via CloudFoundry.com on the first day of application development. This was critical to receiving constant feedback from Comic Relief, and gave a starting point from which to iterate on the applications and the platform.
The software that takes donations is a series of micro-applications decoupled via queuing technologies. The service-layer application generates events that are used to mutate an eventually consistent view of the donations domain, enabling failure handling and fault tolerance across the distributed architecture. The applications were all developed to the platform contract represented by Cloud Foundry.
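The post does not show the actual implementation, so what follows is only a minimal sketch of the general pattern. A producer places events on a queue, and a consumer applies them to a view that catches up over time. Python's standard-library `queue` stands in for the real queuing technology, and `DonationEvent`, `DonationsView` and their fields are hypothetical names invented for illustration:

```python
import queue
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DonationEvent:
    """Hypothetical event emitted by a service-layer application."""
    event_id: str
    amount_pence: int


@dataclass
class DonationsView:
    """Hypothetical read model that is updated as events arrive."""
    total_pence: int = 0
    seen: set = field(default_factory=set)

    def apply(self, event: DonationEvent) -> None:
        # Applying an event is idempotent: a redelivered event is ignored,
        # so a retry after a failure cannot double-count a donation.
        if event.event_id in self.seen:
            return
        self.seen.add(event.event_id)
        self.total_pence += event.amount_pence


def drain(events: "queue.Queue[DonationEvent]", view: DonationsView) -> None:
    """Apply every event currently on the queue to the view."""
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            return
        view.apply(event)


events = queue.Queue()
for e in [DonationEvent("a1", 1000), DonationEvent("b2", 2500), DonationEvent("a1", 1000)]:
    events.put(e)  # "a1" is delivered twice to simulate a retry

view = DonationsView()
drain(events, view)
print(view.total_pence)  # 3500
```

Because the producer only needs the queue to be available, the consumer can fail and restart without losing or duplicating donations. This is one way such a design can tolerate faults.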
Vendor Neutrality and High Availability are Primary Architectural Goals
Developing the applications to the Cloud Foundry contract brought vendor neutrality to Comic Relief’s donations deployment. Wherever Cloud Foundry ran, the donations suite could run too. This commoditised the infrastructure vendors and freed Comic Relief from vendor-related constraints, allowing infrastructure choices to be based on price, performance and efficiency rather than forced by vendor lock-in.
Building on the operational capabilities of Cloud Foundry played a major role in meeting Comic Relief’s requirements. The data services were enhanced with redundancy and replication, intelligent connection load-balancing was added to each instance, and a distributed global load-balancing and fault-tolerance service was added to recover quickly from single or multiple instance outages. The service was distributed across multiple AWS EC2 regions and zones, and also across Carrenza, a London-based VMware vCloud/vSphere IaaS provider (see figure).
Service distributed across multiple regions and zones
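The details of the global load-balancing service are not given in the post. As a rough illustration of the idea only, the sketch below spreads requests round-robin across whichever sites currently report healthy, so traffic continues when one or more sites are lost. The `Site` class, the `route` function and the site names are all hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical deployment site with a health flag set by some health check."""
    name: str
    healthy: bool = True


def route(sites, n_requests):
    """Assign n_requests round-robin across the healthy sites only."""
    healthy = [s for s in sites if s.healthy]
    if not healthy:
        raise RuntimeError("no healthy sites available")
    return [healthy[i % len(healthy)].name for i in range(n_requests)]


# Site names are placeholders, not the real deployment topology.
sites = [Site("aws-region-a"), Site("aws-region-b"), Site("carrenza-london")]

print(route(sites, 4))

# Simulate losing one site: traffic is shared between the remaining two.
sites[0].healthy = False
print(route(sites, 4))
```

A production service would also need health checks, connection draining and data replication between sites, none of which this sketch attempts.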
Enhancing Cloud Foundry’s capabilities involved the same Continuous Delivery-based patterns employed to develop the donations applications. The platform pipeline was based on Jenkins orchestrating Cloud Foundry BOSH, with automated testing using Cucumber and RSpec, automated load testing using Grinder, and automated security testing with Zed Attack Proxy in tandem with a custom suite. The platform pipeline for Cloud Foundry then formed part of a larger integration pipeline for continuous delivery of the donations service as a whole to production-like environments. Frequent, regular deployments following extensive automated testing generated the confidence required to make any necessary changes to such a mission-critical system.
Record Earnings With No Outage
So how did it actually perform on the night? Shortly after 01:30 GMT, comedian Russell Brand announced the money raised stood at £75,107,851 – passing the previous high of £74.3m. The new record was delivered by a platform that suffered absolutely no outage in receiving donations – the first time this has ever been achieved by a donations platform for Red Nose Day.
I’ve been delivering platforms for nearly fifteen years; I can’t recall ever being more excited about a new technology than I was on the day Cloud Foundry was released. I’m exceptionally proud to have taken the promise of Cloud Foundry and delivered real value in a mission-critical scenario.
About the Author: Colin Humphreys is CEO of CloudCredo, a team of people with extensive knowledge in running and customizing Cloud Foundry. He led the installation of the first Cloud Foundry to deliver SLA-driven production services, delivers tooling to the Cloud Foundry community, and is a regular conference speaker on PaaS-related topics. He also organizes the London PaaS User Group.