This is the second installment of a blog post on Cloud-Native Operations – Read part 1
Developers are familiar with platform abstraction, which offers them power while hiding complexity and so makes them more productive.
But on the operations side of the house, the idea and practice of abstraction have been slow to gain traction – until now. Automation in a cloud-native platform provides this abstraction, making ops more productive and more valuable to the organization. As operations teams move up the stack, they experience a force-multiplier effect as automation and autonomous systems enable them to expand the scope of workloads they’re able to manage. At Pivotal, our hosted Cloud Foundry offering supports tens of thousands of containerized applications and thousands of developers with an ops team of six.
Operations Teams as Product Teams
Cloud-native operations transforms ops teams into product owners, serving their developer customers. Automation across API-driven infrastructure means developers now have agile, self-service access to deploy and operate their apps, without the headache of understanding the underlying resources and writing extra code to protect against failures.
We often call the operations team's product "Developer Ready Infrastructure": it meets developers at the highest level of abstraction and enables them to focus on writing and deploying code that drives business goals, without regard for infrastructure concerns.
Also, by moving up the value line and eliminating many manual infrastructure tasks, operations teams' efforts now accrue more directly to business success. In the eyes of developers and the business, they're transformed from a dragging cost center into strategic enablers of software velocity and market agility.
Just happened, @b6n drops the "Value Line" platform concept on @BoschGlobal –business functionality is the value pic.twitter.com/UJfCUnZ5ac
— James Watters (@wattersjames) March 9, 2016
Operations abstraction brings other benefits. Dropping dependency on a specific IaaS also enables increased flexibility and choice of cloud, so-called "multi-cloud" operation. Workloads can be migrated from on-premises to cloud (and between clouds) without re-skilling operations teams. Security and compliance can be built into the platform, with hardened OS images, audit trails and the 3 R’s automatically forming part of the deployment. Advanced virtualized infrastructure such as VMware’s NSX helps to automate and enforce many compliance and security assurance tasks.
The relationships with DevOps and between Developers, Operations and their cloud-native platform are actually more complex than I've laid out here. I'll cover this in detail in a dedicated post in this series: "Cloud-Native Operations and the Developer Dance".
BOSH (is) All the Things!
At Pivotal, our Cloud Foundry solution relies on BOSH for infrastructure automation.
BOSH is a proven-at-scale, open source release engineering tool that recently celebrated its 5th birthday, summarized in an excellent post by BOSH expert Dr. Nic Williams.
It’s no fun to write complex open source that no one can deploy on Day 1 — nor continue to live with on Day 2. BOSH has been helping solve this for five years!
– Dr. Nic Williams, Stark & Wayne
But I already have configuration management…
Tools such as Chef, Puppet and SaltStack are valuable DevOps tools for configuring new environments. However, the full lifecycle of versioning, packaging, and deploying software in a reproducible fashion often results in a custom integration of these and a variety of other tools, for example provisioning tools such as Terraform or AWS’s CloudFormation, which don't abstract away infrastructure-specific configuration. Not only must operations maintain an expanding number of integrations, but the lack of abstraction also lets infrastructure concerns bubble up to the end user or allows configuration drift to go unchecked. The upside-down economics of building your own platform in this way are explained in a recent whitepaper.
Enter BOSH – an integrated toolchain covering the end-to-end set of requirements for full lifecycle software release engineering. BOSH provides a mechanism to version, package and deploy distributed software services, enabling them to be reproduced and scaled regardless of IaaS provider.
But the value of BOSH extends beyond release engineering, providing a seamless experience delivering software and then ensuring it remains highly available and resilient. Simply put, BOSH translates declarative code into infrastructure and then proactively maintains the initial state for the running service. The post-deployment or “Day 2” capabilities of BOSH ensure that deployments continue to operate in a consistent fashion, automatically compensating for any adverse events or configuration drift.
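The idea of declaring a desired state and continuously converging on it can be illustrated with a small Python sketch. This is not BOSH's implementation or API: the `Instance` type, the `reconcile` function and the action strings are hypothetical names chosen for illustration, and a real system would act on the IaaS rather than return strings.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """Hypothetical record of one running instance (not a BOSH type)."""
    name: str
    version: str
    healthy: bool = True


def reconcile(desired: dict[str, str], actual: dict[str, Instance]) -> list[str]:
    """Return the actions needed to bring `actual` back in line with `desired`.

    `desired` maps an instance name to the release version it should run.
    """
    actions = []
    for name, version in sorted(desired.items()):
        inst = actual.get(name)
        if inst is None:
            # Declared but not running: create it.
            actions.append(f"create {name}@{version}")
        elif not inst.healthy:
            # Running but failed: replace it with a fresh copy.
            actions.append(f"recreate {name}@{version}")
        elif inst.version != version:
            # Running the wrong version: roll it forward (or back).
            actions.append(f"update {name} {inst.version}->{version}")
    # Anything running that was never declared is drift: remove it.
    for name in sorted(set(actual) - set(desired)):
        actions.append(f"delete {name}")
    return actions
```

Running such a function on a schedule, and applying the actions it returns, is one simple way to picture "Day 2" behavior: the declaration stays fixed and the platform keeps closing the gap between it and reality.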
Automating DevOps with an integrated platform
Release engineering handles the compilation, assembly, delivery and update of source code into finished software running as services. It turns out that when you automate release engineering end-to-end, you also create an ideal platform to support DevOps practices. This, in turn, delivers shorter release cycles and more reliable deployments. The bottom line: it helps you ship more frequently, which drives the business forward.
Over its lifecycle, the software will require updating and repackaging in order to fix defects and provide additional features. Once updated, the components require redeployment over a distributed cluster of servers or re-packaging for deployment onto third party servers.
DevOps practices are much more likely to be adopted if the tools make it easy to do the right thing. Here are my top six Day 1 and Day 2 tasks that BOSH automates and simplifies:
IaaS provisioning
Operating System Imaging & Config
Platform, Services and Application Deployment
Monitoring, Recovery, Restart
OS, Platform, Services Update
Dynamic, Proactive Security
The Future: Automate All the Things?
Where next for operations teams? If you follow Google’s example, every pager alert should be analyzed once it has been remedied, to decide whether it could be handled automatically through code. The combination of coder and ops person that makes a Google site reliability engineer (SRE) puts them in a unique position to:
- Understand the issue & its resolution
- Determine the root cause of the issue
- Write code to automate detection and resolution without an alert
Each time they follow this cycle they actually extend their automation platform. This level of investment in custom automation is beyond the means of many enterprise operations teams today, but building on a mature and rapidly evolving cloud-native ops platform such as Pivotal Cloud Foundry and BOSH means you can take advantage of the innovation and lessons learned running cloud-native at scale in Fortune 1000 companies.
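The alert-to-automation cycle described above can be pictured with a minimal Python sketch. It is an illustration under assumptions, not Google's or Pivotal's tooling: the `AlertRouter` class, its methods and the alert fields (`type`, `host`) are all hypothetical.

```python
from typing import Callable, Dict, List

# A remediation takes the alert payload and returns a short description
# of what it did. Both the type alias and payload shape are assumptions.
Remediation = Callable[[dict], str]


class AlertRouter:
    """Route alerts to automated fixes, paging a human only as a fallback."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Remediation] = {}
        self.pages: List[dict] = []  # alerts that still needed a human

    def automate(self, alert_type: str, handler: Remediation) -> None:
        """Register code written after a human resolved this alert type."""
        self._handlers[alert_type] = handler

    def handle(self, alert: dict) -> str:
        handler = self._handlers.get(alert["type"])
        if handler is None:
            # No automation yet: page someone and record it for review.
            self.pages.append(alert)
            return "paged"
        # Known issue: resolve it in code without waking anyone up.
        return handler(alert)
```

Each alert type that moves from `pages` into a registered handler is one more issue the platform resolves on its own, which is the sense in which the cycle extends the automation platform.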