
Cloud-Native Operations in 2 Parts. Super Suit Not Required

Operations teams are the unsung superheroes of the software economy.

They design, procure, install and configure the infrastructure. On day 1, they deploy and test the apps on this infrastructure. From day 2 onwards, they provide 24×7 on-call support for the apps in production, firefighting when the lights blink red.

Epic stuff.

Yet in the eyes of the CEO, IT operations is still regarded as a cost center to be shrunk, whose advice is only sought when the system's down or there's a security breach.

Pivotal’s Casey West captures the sentiment in his talk:

“Good Job Patching the Kernel” – No CEO Ever

For developers, cloud-native practices have driven a revolution in their day-to-day work, made possible by the availability of ubiquitous, cheap computing power: the cloud. Cloud-native development brings together agile development practices, microservice architectures, and continuous integration/deployment. All in all, it is a host of business-transforming changes that speed up software development, turning it into a key tool for driving business strategy.

CEO, attention grabbed.

Meanwhile, the operations team are expected to deploy and operate the developers’ code, keeping it secure, reliable and available to customers whenever they demand it. It’s no surprise that the only time the CEO wants to hear from operations is when there’s a security breach or a service is crashing or slow. The wrong type of CEO visibility.

There’s a natural tension between developers, who want to go faster and faster, and operations, whose only means of ensuring stability and predictability is to slam on the developer brakes. Enter self-service cloud resources: developers bypass in-house IT for the path of least resistance, growing the rogue-IT estate. IT is leaving the datacenter; cloud-native is the opportunity for the operations team to reclaim it as they evolve into more interesting and strategic roles.

High-performing operations teams have innovated in waves to keep step with the growing role of software in business. Colin Humphreys, Pivotal’s Cloud CTO, thinks it’s time to bring on the cloud-native wave. Based on his 20 years of experience in operations, he has seen a common progression to cloud-native maturity:

  • Manual server configuration
  • Repeatable shell scripting
  • Single-server configuration with Puppet/Chef/Ansible
  • Cloud-native operations: Day 1, Day 2 and beyond with BOSH

Pivotal’s philosophy for cloud-native operations is based on our agile experience at Pivotal Labs, on running Pivotal Web Services at scale, on customer and partner best practices, including Google’s Site Reliability Engineering (SRE), and on a wealth of Pivots’ experience from all corners of the industry.

Part 1: The Cloud-Native Ops Opportunity

Maybe you’ve already optimized your software-development lifecycle and your delivery pipeline? Great! But what if your day 2 reality is still full of bespoke scripting, manual intervention, and late-night outages? That’s a sure sign your business is missing out on the benefits of cloud-native operability, and that your operations team are stuck toiling on pager alerts rather than advancing their careers with more interesting and impactful roles.

In this first installment we’ll explore the ‘why’ of cloud-native operations for CEOs, operations and development teams.

Part 2: Automated Ops; Freedom to Innovate

The age of the cloud has unleashed unprecedented compute scale and, in turn, made possible many new and innovative solutions. IoT, Big Data and Machine Learning demand server resources orders of magnitude larger than those of 10-15 years ago, yet established operations practices aren’t ready for this explosion in scale and continue to treat servers as Pets instead of Cattle.

With containers, you could argue that we also need to extend the Pets and Cattle analogy to Ants. But the message is clear: we’re now operating at a scale, speed, and complexity level that only computers can deal with. Manual monitoring and intervention are too slow, and the probability of failure through human error grows in proportion to the manual work.

Such a high level of repetition or "toil" is also tedious for teams.  

Cloud-native operations can help recast operations roles to be more interesting and rewarding. By using software to manage servers, we can scale massively and direct human effort at a meta level, focusing on incremental improvements, process improvement, and alignment with strategic goals.

From a recent McKinsey research article on automation:

As roles and processes get redefined, the economic benefits of automation will extend far beyond labor savings. Particularly in the highest-paid occupations, machines can augment human capabilities to a high degree, and amplify the value of expertise by increasing an individual’s work capacity and freeing the employee to focus on work of higher value.

In this post, I’ll look at how a cloud-native platform automates the traditional undifferentiated heavy lifting of repetitive ops tasks, and how this transforms the role of the ops team from keeping the lights on to driving business success. I’ll also look at how cloud-native operations open up the possibility of evolving the procedural Infrastructure-as-Code to the declarative Infrastructure-is-Code with BOSH.
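
To make the procedural-versus-declarative distinction a little more concrete, here is a minimal sketch in Python. It is not BOSH and uses no BOSH API; the names `reconcile`, `desired` and `actual` are hypothetical, chosen only for illustration. The assumption is that an operator declares how many instances of each job should exist, and software works out the steps needed to get there, rather than a human scripting those steps by hand.

```python
from typing import Dict, List, Tuple

# An action is (verb, job name, instance count). Hypothetical shape,
# used only for this illustration.
Action = Tuple[str, str, int]


def reconcile(desired: Dict[str, int], actual: Dict[str, int]) -> List[Action]:
    """Compute the actions that would move `actual` towards `desired`.

    `desired` stands in for a declarative manifest (job -> instance count);
    `actual` stands in for what is currently running.
    """
    actions: List[Action] = []
    # Visit every job mentioned on either side, in a stable order.
    for job in sorted(set(desired) | set(actual)):
        want = desired.get(job, 0)
        have = actual.get(job, 0)
        if want > have:
            actions.append(("create", job, want - have))
        elif want < have:
            actions.append(("delete", job, have - want))
        # want == have: nothing to do for this job
    return actions


if __name__ == "__main__":
    desired = {"web": 3, "worker": 2}
    actual = {"web": 1, "worker": 2, "legacy": 1}
    for verb, job, count in reconcile(desired, actual):
        print(f"{verb} {count} x {job}")
```

In this sketch the operator only ever edits `desired`; the plan of creates and deletes is derived by the software, which is the shift from writing procedures to declaring intent.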