
Automated Ops: Freedom to Innovate (Part 2)

This is the second installment of a blog series on Cloud-Native Operations. Read part 1.

Developers are familiar with platform abstraction: it gives them power while hiding complexity, which makes them more productive.

But on the operations side of the house, the idea and practice of abstraction have been slow to gain traction, until now. Automation in a cloud-native platform provides this abstraction, making ops more productive and more valuable to the organization. As operations teams move up the stack, they experience a force-multiplier effect: automation and autonomous systems let them expand the scope of workloads they’re able to manage. At Pivotal, our hosted Cloud Foundry offering supports tens of thousands of containerized applications and thousands of developers with an ops team of six people.

Operations Teams as Product Teams

Cloud-native operations transforms ops teams into product owners who serve their developer customers. Automation across API-driven infrastructure gives developers agile, self-service access to deploy and operate their apps, without the headache of understanding the underlying resources or writing extra code to protect against failures.

We often call the operations team's product "Developer Ready Infrastructure": it meets developers at the highest level of abstraction and lets them focus on writing and deploying code that drives business goals, without regard for infrastructure concerns.

Also, by moving up the value line and eliminating many manual infrastructure tasks, operations teams see their efforts accrue more directly to business success. In the eyes of developers and the business, they're transformed from a cost center that drags on delivery into strategic enablers of software velocity and market agility.

Operations abstraction brings other benefits. Removing the dependency on a specific IaaS increases flexibility and choice of cloud, so-called "multi-cloud" operation. Workloads can be migrated from on-premises to cloud (and between clouds) without re-skilling operations teams. Security and compliance can be built into the platform, with hardened OS images, audit trails and the 3 R’s (rotate, repave, repair) automatically forming part of every deployment. Advanced virtualized infrastructure such as VMware’s NSX helps to automate and enforce many compliance and security assurance tasks.

The relationships among DevOps, developers, operations and their cloud-native platform are actually more complex than I've laid out here. I'll cover them in detail in a dedicated post in this series: "Cloud-Native Operations and the Developer Dance".

BOSH (is) All the Things!

At Pivotal, our Cloud Foundry solution relies on BOSH for infrastructure automation.

BOSH is a proven-at-scale, open source release engineering tool that recently celebrated its 5th birthday, an occasion summarized in an excellent post by BOSH expert Dr. Nic Williams.

It’s no fun to write complex open source that no one can deploy on Day 1 — nor continue to live with on Day 2. BOSH has been helping solve this for five years!

– Dr. Nic Williams, Stark & Wayne

But I already have configuration management…

Tools such as Chef, Puppet and SaltStack are valuable DevOps tools for configuring new environments. However, covering the full lifecycle of versioning, packaging and deploying software in a reproducible fashion often requires a custom integration of these and a variety of other tools, for example provisioning tools such as Terraform or AWS CloudFormation, which don't abstract away infrastructure-specific configuration. Not only must operations maintain an expanding number of integrations, but the lack of abstraction also lets infrastructure concerns bubble up to the end user, or lets configuration drift go unchecked. The upside-down economics of building your own platform in this way are explained in a recent whitepaper.

Enter BOSH – an integrated toolchain covering the end-to-end requirements of full-lifecycle software release engineering. BOSH provides a mechanism to version, package and deploy distributed software services, so they can be reproduced and scaled regardless of IaaS provider.

But the value of BOSH extends beyond release engineering: it delivers software seamlessly and then ensures that it remains highly available and resilient. Simply put, BOSH translates declarative code into infrastructure and then proactively maintains that declared state for the running service. The post-deployment, or “Day 2”, capabilities of BOSH ensure that deployments continue to operate in a consistent fashion, automatically compensating for adverse events and configuration drift.
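To make "declarative code" and "maintains that state" concrete, here is a minimal Python sketch of the idea behind desired-state reconciliation: diff what the manifest declares against what is actually running, and emit the steps that close the gap. The `InstanceSpec` type and `plan_actions` function are hypothetical illustrations, not BOSH internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceSpec:
    """Hypothetical per-instance state: which image and release version it runs."""
    stemcell: str         # stemcell name/version the instance was built from
    release_version: str  # version of the software release running on it


def plan_actions(desired: dict[str, InstanceSpec],
                 actual: dict[str, InstanceSpec]) -> list[tuple[str, str]]:
    """Diff desired vs. actual state into ordered (action, instance) steps."""
    actions: list[tuple[str, str]] = []
    for name, spec in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name))   # declared but missing
        elif actual[name] != spec:
            actions.append(("update", name))   # present but drifted
    for name in sorted(actual.keys() - desired.keys()):
        actions.append(("delete", name))       # running but no longer declared
    return actions
```

Because the plan is computed from the difference, running it again once the actual state matches the desired state yields no actions. That idempotence is what lets a declarative tool re-apply the same manifest safely, whether the gap came from a new release or from drift.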

Automating DevOps with an integrated platform

Release engineering handles the compilation, assembly, delivery and update of source code into finished software running as services. It turns out that when you automate release engineering end-to-end, you also create an ideal platform to support DevOps practices. That, in turn, delivers shorter release cycles and more reliable deployments. The bottom line: you ship more frequently, and that drives the business forward.

Over its lifecycle, software requires updating and repackaging to fix defects and add features. Once updated, the components must be redeployed across a distributed cluster of servers, or repackaged for deployment onto third-party servers.

DevOps practices are much more likely to be adopted if the tools make it easy to do the right thing. Here are my top six Day 1 and Day 2 tasks that BOSH automates and simplifies:

IaaS Provisioning

Provision and configure infrastructure (virtual machines, networks, storage) on-premises or in a public cloud. By design, BOSH abstracts away the differences between infrastructure providers into a generalized, cross-platform description of your deployment; those differences are handled by the CPI (Cloud Provider Interface) layer. Operators simply declare the desired infrastructure in a YAML manifest file and BOSH does the rest, delivering the same configuration on every infrastructure where the BOSH release is deployed.
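A rough sketch of the CPI idea in Python: the deployment description uses generic names (`instance_groups`, `instances` and `vm_type`, borrowed from the shape of a BOSH manifest), and a pluggable adapter translates them into provider-specific calls. `CloudProvider`, `FakeAWS`, `FakeVSphere` and their size tables are hypothetical; the real CPI contract is defined in the BOSH documentation and looks different.

```python
from abc import ABC, abstractmethod


class CloudProvider(ABC):
    """Hypothetical, simplified stand-in for a CPI: one adapter per IaaS."""

    @abstractmethod
    def create_vm(self, instance: str, vm_type: str) -> str:
        """Create a VM for `instance` and return a provider-specific VM ID."""


class FakeAWS(CloudProvider):
    # Generic vm_type names mapped to (illustrative) EC2 instance types.
    SIZES = {"small": "t3.small", "large": "m5.xlarge"}

    def create_vm(self, instance, vm_type):
        return f"aws:{self.SIZES[vm_type]}:{instance}"


class FakeVSphere(CloudProvider):
    # The same generic names mapped to (CPUs, RAM in MB) on vSphere.
    SIZES = {"small": (2, 4096), "large": (4, 16384)}

    def create_vm(self, instance, vm_type):
        cpus, ram_mb = self.SIZES[vm_type]
        return f"vsphere:{cpus}cpu-{ram_mb}mb:{instance}"


def deploy(manifest: dict, cpi: CloudProvider) -> list[str]:
    """Create every instance the IaaS-neutral manifest declares, via the given CPI."""
    vm_ids = []
    for group in manifest["instance_groups"]:
        for index in range(group["instances"]):
            vm_ids.append(cpi.create_vm(f"{group['name']}/{index}", group["vm_type"]))
    return vm_ids
```

Swapping `FakeAWS()` for `FakeVSphere()` changes nothing in the manifest passed to `deploy`. In actual BOSH, the mapping from a `vm_type` name to provider-specific sizing lives in the cloud config rather than in code.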

Operating System Imaging & Config

Typical configuration management solutions build OS images by installing software, possibly pulling from external references. To avoid the potential for drift, BOSH curates a library of hardened, instrumented operating system images ("Stemcells") for use at service deployment time. This approach ensures every deployment of an environment is identical and isolated from changes to external dependencies: you can deploy an old release over and over and BOSH will produce the same results every time. That consistency ensures updates run as expected when they reach production.
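One way to see why pinned, pre-built images give reproducible deployments: if every input is an exact name and version, a digest of those inputs identifies the deployment. The Python sketch below assumes hypothetical stemcell and release records; its rejection of floating versions such as `latest` is a policy chosen for this illustration (BOSH manifests themselves do accept `latest`).

```python
import hashlib
import json


def deployment_fingerprint(stemcell: dict, releases: list[dict]) -> str:
    """Digest of the exact, pinned inputs a deployment is built from."""
    for item in [stemcell, *releases]:
        # Policy assumed by this sketch: a floating version would make the
        # digest meaningless, because "latest" changes over time.
        if item.get("version") in (None, "latest"):
            raise ValueError(f"{item['name']}: pin an explicit version")
    # Canonical JSON (sorted keys, releases sorted by name) so that the same
    # inputs always serialize to the same bytes, regardless of order.
    canonical = json.dumps(
        {"stemcell": stemcell, "releases": sorted(releases, key=lambda r: r["name"])},
        sort_keys=True,
    )
    return hashlib.sha256(canonical.encode()).hexdigest()
```

Two environments with the same fingerprint were built from the same stemcell and release versions; changing any one version changes the digest.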

Platform, Services and Application Deployment

BOSH has the concept of a software release, which packages up all related source code, binary assets, configuration and so on. This allows users to easily track the contents of a particular release, including source code, tools, environment and other components. Once created, a release lets BOSH build and deploy the various processes that make up the service across a cluster of host virtual machines.
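Tracking "the contents of a particular release" amounts to recording a digest per file, so two releases can be compared precisely. A minimal Python sketch, assuming releases are given as in-memory path-to-bytes maps; the `jobs/` and `packages/` paths used with it are illustrative, and this is not the BOSH release format itself.

```python
import hashlib


def release_manifest(files: dict[str, bytes]) -> dict[str, str]:
    """Map each path in a (hypothetical) release to a SHA-256 content digest."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in sorted(files.items())}


def diff_releases(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Report which paths were added, removed, or changed between two releases."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(p for p in old.keys() & new.keys() if old[p] != new[p]),
    }
```

For example, if version 2 of a release bumps `packages/nginx/src.tgz` and adds `jobs/web/spec`, `diff_releases` reports exactly those two paths and nothing else, which is the kind of precise record that makes releases auditable.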

Monitoring, Recovery, Restart

Another key feature of BOSH is resiliency. Traditional infrastructure-as-code provisioning tools don't check whether services are still up and running on Day 2, and therefore can't take restorative action when needed. Because BOSH has strong opinions on how to create your release, you're required to provide a monitoring configuration for every process; if a process dies, the monitor (monit on Linux) restarts it. In addition, the BOSH Resurrector can recreate failed or unresponsive VMs, and on re-creation BOSH can also remount persistent data.
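The two layers of self-healing described here can be sketched as a single decision function: restart a dead process in place when the VM is reachable, and replace the whole VM (reattaching its persistent disk) when it is not. This is a simplified Python model, not BOSH code; `VM` and `heal` are hypothetical names, and a real health monitor works from agent heartbeats and timeouts rather than a boolean flag.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VM:
    name: str
    agent_responding: bool = True                             # is the VM reachable at all?
    processes: dict[str, bool] = field(default_factory=dict)  # process name -> running?
    persistent_disk: Optional[str] = None                     # attached disk ID, if any


def heal(vm: VM) -> tuple[VM, list[str]]:
    """Return a healthy VM plus a log of the corrective actions taken."""
    events: list[str] = []
    if not vm.agent_responding:
        # Tier 2 (Resurrector-like): the whole VM is unreachable, so replace it,
        # start its processes fresh, and reattach the same persistent disk.
        replacement = VM(
            name=vm.name,
            processes={proc: True for proc in vm.processes},
            persistent_disk=vm.persistent_disk,
        )
        events.append(f"recreated {vm.name}")
        if vm.persistent_disk is not None:
            events.append(f"remounted {vm.persistent_disk}")
        return replacement, events
    # Tier 1 (monit-like): the VM is fine, so restart any dead process in place.
    for proc, running in list(vm.processes.items()):
        if not running:
            vm.processes[proc] = True
            events.append(f"restarted {proc}")
    return vm, events
```

Keeping the persistent disk ID on the replacement is what lets a recreated VM come back with its data rather than empty.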

OS, Platform, Services Update

Many systems leave it up to the operator to figure out the right order to apply updates, how to identify any regressions and what actions are required to roll back an update. The BOSH deployment manifest is completely declarative, which means the system figures out how to apply rolling updates to the environment. Canary deployments contain regressions automatically: a few canary instances are updated first, and if they fail, the rollout stops before it reaches the rest of the cluster. This applies to operating system updates, application updates and even platform and BOSH updates themselves!
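A canary rollout can be sketched in a few lines of Python. The parameter names `canaries` and `max_in_flight` mirror settings in a BOSH manifest's `update` block, but the functions themselves are a hypothetical illustration: the first batch is the canaries, later batches are capped at `max_in_flight`, and the first unhealthy instance halts the rollout.

```python
from typing import Callable, Optional


def plan_batches(instances: list[str], canaries: int, max_in_flight: int) -> list[list[str]]:
    """Canaries go first in their own batch; the rest follow max_in_flight at a time."""
    canary_batch, rest = instances[:canaries], instances[canaries:]
    batches = [canary_batch] if canary_batch else []
    batches += [rest[i:i + max_in_flight] for i in range(0, len(rest), max_in_flight)]
    return batches


def rolling_update(
    instances: list[str],
    update: Callable[[str], bool],  # applies the change; True if the instance is healthy after
    canaries: int = 1,
    max_in_flight: int = 2,
) -> tuple[list[str], Optional[str]]:
    """Return (instances updated, first instance that failed or None)."""
    updated: list[str] = []
    for batch in plan_batches(instances, canaries, max_in_flight):
        # Sequential here for clarity; a real system may update a batch in parallel.
        for name in batch:
            if not update(name):
                # Halt: every instance not yet touched keeps the previous version.
                return updated, name
            updated.append(name)
    return updated, None
```

With five instances `web/0` to `web/4`, one canary and `max_in_flight=2`, the batches are `[web/0]`, `[web/1, web/2]`, `[web/3, web/4]`. If the canary fails, nothing else is touched; in this sketch, restoring the failed instance (for example by redeploying the previous version) is left to the caller.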

Dynamic, Proactive Security

Following the 3 R’s philosophy of cloud-native security, BOSH supports rotating credentials, repaving host VMs on a schedule and repairing OS vulnerabilities through rolling OS updates. Audit and compliance are easily built into the platform in a standardized way, rather than relying on ops staff to implement them.
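Two of the 3 R's lend themselves to small sketches. Rotation is commonly done in two phases so that clients never see an outage: issue a new credential, accept both old and new while clients switch over, then retire the old one. Repaving on a schedule reduces to finding VMs older than a maximum age. Both pieces below are hypothetical Python illustrations, not BOSH APIs.

```python
import secrets
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class RotatingSecret:
    """Hypothetical two-phase credential rotation (rotate)."""
    current: str = field(default_factory=lambda: secrets.token_hex(16))
    previous: Optional[str] = None

    def begin_rotation(self) -> str:
        # Phase 1: issue a new value but keep accepting the old one
        # while clients roll over to it.
        self.previous, self.current = self.current, secrets.token_hex(16)
        return self.current

    def finish_rotation(self) -> None:
        # Phase 2: once every client uses the new value, retire the old one.
        self.previous = None

    def accepts(self, candidate: str) -> bool:
        valid = [self.current] + ([self.previous] if self.previous is not None else [])
        # Constant-time comparison avoids leaking how much of a guess matched.
        return any(secrets.compare_digest(candidate, v) for v in valid)


def due_for_repave(created_at: dict[str, datetime], now: datetime,
                   max_age: timedelta) -> list[str]:
    """VMs at least max_age old, to be rebuilt from a fresh image (repave)."""
    return sorted(name for name, t in created_at.items() if now - t >= max_age)
```

The overlap window in `RotatingSecret` is the design choice that matters: rotating frequently is only practical if a rotation never breaks clients that have not yet picked up the new value.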
 

The Future: Automate All the Things?

Where next for operations teams? If you follow Google’s example, every pager alert, once remedied, should be analyzed to decide whether it could be handled automatically through code. The combination of coder and ops person that makes up a Google site reliability engineer (SRE) puts them in a unique position to:

  • Understand the issue & its resolution
  • Determine the root cause of the issue
  • Write code to automate detection and resolution without an alert 

Each time they follow this cycle, they extend their automation platform. This level of investment in custom automation is beyond the means of many enterprise operations teams today, but building on a mature and rapidly evolving cloud-native ops platform such as Pivotal Cloud Foundry and BOSH means you can take advantage of the innovation and lessons learned running cloud-native platforms at scale in Fortune 1000 companies.

Learn more about BOSH and Cloud Foundry: