State of DevOps Report 2020 Focuses on the “How” of DevOps

For years, reports analyzing the evolution of the DevOps movement have been eagerly awaited by members of this growing global community. This research has helped establish benchmarks for DevOps adoption, identify best practices, and analyze DevOps outcomes—in other words, the results achieved by DevOps leaders vs. DevOps laggards.

It’s interesting to see how the conversation around DevOps research has changed over the years. Early on, reports focused on defining DevOps—use of automation, frequent updates, etc. More recently, they have focused on the “why” of DevOps, with research indicating that DevOps adoption results in superior financial outcomes in terms of revenue growth and profitability. 

The recent State of DevOps 2020 report, published by Puppet, shifts gears and moves on to practical issues affecting DevOps implementation. One might say the theme of this latest report is the “how” of DevOps.

The theory of constraints

One of the major concepts of the DevOps field is the theory of constraints. This theory, pioneered by Eliyahu Goldratt, posits that performance in any system is impeded by bottlenecks, and improvement in a system requires addressing the primary bottleneck of the moment. Solving this constraint removes the impediment to better performance so that system output will then improve. A corollary is that addressing a secondary constraint without working on the primary bottleneck won’t improve a substandard process.

The 2020 report discusses two common constraints in today’s cloud native application world, one technology-focused and one process-focused. It then prescribes ways to improve the constraints—the “how”—with an overall goal of enabling the “what” of DevOps to achieve the “why” of superior financial performance.

Constraint buster #1: Create a platform team

The technology constraint the 2020 report discusses reflects the industry’s shift toward Kubernetes as the platform of choice for cloud native applications. As application developers have moved to container packaging, Kubernetes has emerged as the de facto standard for container deployment and orchestration. It’s hard to overstate just how important Kubernetes is in the cloud native ecosystem; it is the platform of choice for modern applications.

There’s just one problem: Application groups have found that operating Kubernetes is not for the faint of heart. 

One challenge is that when people think of Kubernetes, they imagine it as a complete system that includes everything one needs to deploy and operate container-based applications. But Kubernetes by itself only delivers part of a complete solution. In addition to the container deployment and management that Kubernetes provides, a complete application environment requires additional capabilities in a number of areas, including application code packaging and container creation; event logging, and aggregation of those logs in the Kubernetes cluster and associated applications; monitoring of the cluster resources; workload balancing across redundant clusters; and more. 

Kubernetes provides none of these things; users must instead turn to other open source or commercial components and incorporate them into the Kubernetes cluster. In effect, users must become system integrators. This is a thankless task, involving an ongoing commitment to tracking updates, installing patches, testing functionality, and rolling out new versions. And it has led to the growing popularity of commercial Kubernetes distributions like VMware’s Tanzu. Indeed, by relying on a third party to perform Kubernetes component integration, application groups can offload a lot of grunt work to a specialist organization.

Even if an application group eschews the system integrator role, it confronts a secondary issue: operating the Kubernetes system. Application groups that roll their own cluster share a common experience: getting started seems straightforward, but the work quickly grows into an ever-increasing set of tasks. In effect, application groups find Kubernetes management is a tax on productive work and chafe at paying for it from a budget targeted toward delivering business functionality. I saw this phenomenon up close at a previous job; development groups struggled to keep their applications operating at necessary levels and came to view Kubernetes with a jaundiced eye.

The 2020 report has a prescription for this challenge: Set up a platform management team operated by the infrastructure organization. This allows the company to centralize expertise and amortize Kubernetes management costs across a larger number of applications. This results in a “Kubernetes tax cut” for each development group while increasing application uptime through better platform management practices.

This recommendation is absolutely spot on. The most successful Kubernetes shops I work with have stood up a platform group and allowed development teams to focus on what they do best, which is to deliver value through business applications.

A centralized team is not a panacea, though. The group needs to think of its platform as a SaaS offering and implement the practices of the best SaaS developer-oriented offerings: easy onboarding, low friction to first use, ongoing product management to ensure product/market fit, and significant outreach via evangelism to drive up adoption. Marrying centralized operations and user focus is a recipe for removing a primary DevOps technology constraint.

Constraint buster #2: Change your change management process

As I’ve noted, constraint theory posits that improving a bottleneck that is not the primary hindrance to a process doesn’t have much effect on system performance. We all understand this. If getting a new household appliance installed requires waiting three months for it to arrive from the manufacturer and then waiting a day to schedule an installer and have them come and hook the thing up—well, then, reducing installation from one day to one hour isn’t going to shorten overall availability time by much. If you want to improve this process, the right place to examine for improvement is the order-to-delivery process, because 99 percent of overall availability time resides there.
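To make the arithmetic concrete, here is a minimal Python sketch of the appliance example. It assumes three months is roughly 90 days; the stage names and the helper functions `lead_time_shares` and `total_after_speedup` are made up for illustration and do not come from the report.

```python
# Durations in days for the appliance example.
# Assumption: "three months" is treated as roughly 90 days.
stages = {"order-to-delivery": 90.0, "installation": 1.0}


def lead_time_shares(stages):
    """Return each stage's fraction of the total lead time."""
    total = sum(stages.values())
    return {name: days / total for name, days in stages.items()}


def total_after_speedup(stages, stage, new_days):
    """Return the total lead time if one stage is shortened to new_days."""
    updated = {**stages, stage: new_days}
    return sum(updated.values())


shares = lead_time_shares(stages)
baseline = sum(stages.values())

# Cut installation from one day to one hour (1/24 of a day).
faster_install = total_after_speedup(stages, "installation", 1 / 24)

# Cut order-to-delivery in half instead.
faster_delivery = total_after_speedup(stages, "order-to-delivery", 45.0)

print(f"Delivery share of lead time: {shares['order-to-delivery']:.1%}")
print(f"Saved by faster install:  {baseline - faster_install:.2f} days")
print(f"Saved by faster delivery: {baseline - faster_delivery:.2f} days")
```

Under these assumptions, speeding up installation by a factor of 24 saves less than one day out of 91, while merely halving the delivery wait saves 45. That asymmetry is the whole point of working on the primary constraint first.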

The 2020 report identifies a significant process bottleneck in most IT organizations: change management. This phrase refers to how updates to applications are approved for deployment into production, and typically encompasses the work of several groups and some final person or group that provides approval for deployment.

The change management process is useful in theory. It is designed to address a common problem organizations encountered before formal change management existed: code changes being released into production without ensuring that important requirements are met in the areas of security, enterprise architecture alignment, functionality and scaling testing, as well as any organization-specific requirements. The result of an uncoordinated deployment process is just as one would expect: unknown code running production applications, buggy code failing to operate properly, and an inability to ensure all important constituencies are satisfied that an update is ready to go.

The solution for this problem is change control. The organization defines important stages in the deployment lifecycle, ensures every important constituency is represented and able to approve/disapprove the proposed change, and validates the process through a set of milestones and official checks. The apotheosis of this approach is ITIL, a UK government-created, structured IT management standard that includes prescriptive change management recommendations.

The only problem with ITIL and other change management approaches is that, in practice, they tend to result in ponderous processes that restrict the flow of code changes. In a cloud native world in which velocity of code rollouts is the difference between business success and failure, this model of application lifecycle management is fatal.

So damaging is a sluggish change control process that it can overwhelm all other measures taken to improve lifecycle performance. This can lead to IT organizations wondering why their expensive DevOps initiatives haven’t resulted in improved deployment time frames. In a phrase, implementing technical improvements in the face of overall organizational sluggishness is like rearranging the deck chairs on the Titanic—interesting and diverting, but doomed to failure.

According to the report, there are four styles of change control, ranging from ad hoc up to operationally mature, with the primary distinguishing factor being the degree to which the change control approval process is well-structured (which it refers to as highly orthodox and highly adaptive) and automated. The chart below comes from the report and describes the four styles of change control.

It goes without saying that cloud native companies like Netflix and Lyft reside on the left side of this chart. I’ve heard many enterprises bemoan their failure to match the deft change control practices of cloud native companies and ascribe it to their larger security and regulatory requirements. 

This overlooks the fact that a number of cloud native companies operate in highly regulated environments (e.g., Stripe) and still have operationally mature change control practices. They’ve just put in the hard work to streamline the process, recognizing that any manual milestone in the application path to production is an ongoing tax on their core competence. The report quotes a statement made by Amazon CEO Jeff Bezos that emphasizes just how important reengineering change control is: “Even well-meaning gatekeepers slow innovation.”

Frankly, the change control process and how it embodies the way an organization views the importance of software productivity is so important that it could have formed the entirety of the 2020 report. Getting this right is so critical that our Tanzu Value Advisor group, of which I am a member, routinely prescribes value-stream mapping an organization’s current application path to production. This mapping identifies all the steps between the completion of developer code and production deployment and makes recommendations for improvement. Unless an organization gets the path to production streamlined, other efforts to become cloud native are unlikely to bear much fruit.
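As a rough illustration of what a value-stream map of the path to production captures, the Python sketch below models each step with its hands-on time and its waiting time, then computes total lead time, flow efficiency (hands-on time divided by lead time), and the step that consumes the most elapsed time. Every step name and number here is hypothetical, invented for the example; none of it comes from the report or from an actual assessment.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    work_hours: float  # hands-on time spent on the step
    wait_hours: float  # time the change sits in a queue before the step
    manual: bool       # True if a person must act for the step to proceed


# Hypothetical path to production; all values are illustrative only.
PATH = [
    Step("code review", 1.0, 4.0, True),
    Step("automated tests", 0.5, 0.0, False),
    Step("security review", 2.0, 40.0, True),
    Step("change advisory board approval", 0.5, 120.0, True),
    Step("deploy", 0.5, 0.0, False),
]


def lead_time(steps):
    """Total elapsed hours from finished code to production."""
    return sum(s.work_hours + s.wait_hours for s in steps)


def flow_efficiency(steps):
    """Fraction of the lead time in which work is actually being done."""
    return sum(s.work_hours for s in steps) / lead_time(steps)


def primary_constraint(steps):
    """The step that accounts for the most elapsed time."""
    return max(steps, key=lambda s: s.work_hours + s.wait_hours)


bottleneck = primary_constraint(PATH)
manual_steps = [s.name for s in PATH if s.manual]

print(f"Lead time: {lead_time(PATH)} hours")
print(f"Flow efficiency: {flow_efficiency(PATH):.1%}")
print(f"Primary constraint: {bottleneck.name}")
print(f"Manual milestones: {', '.join(manual_steps)}")
```

With these made-up numbers, almost all of the elapsed time is waiting rather than working, and the largest share of it sits in front of a manual approval. A real assessment would replace the hard-coded list with measured data from ticketing and pipeline tools.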

Your action items

The latest report reflects how DevOps has moved from a novel technical approach pioneered by a few innovators to a well-defined field ready for widespread adoption. Nevertheless, practical impediments to industry-wide implementation remain, and the 2020 report offers two suggestions to improve DevOps outcomes. 

The first, setting up a platform team to centralize expertise and let developers focus on business functionality, is a sensible recommendation. Expecting application programmers to operate a complex platform like Kubernetes while delivering industry-focused functionality is unfair and likely to result in poor outcomes on both dimensions. Far better is to set up a platform team that specializes in keeping a Kubernetes environment available and operational. This will free application teams to focus on business functionality and achieve better results. To get there, organizations should take the following actions:

Survey your application portfolio to identify where Kubernetes is being used – This will give you an idea of the breadth and range of Kubernetes use within the organization and help you formulate a platform strategy.

Stand up a Kubernetes platform team – Stock this team with experienced Kubernetes operators, ideally with a number of people from across the organization to ensure key end user requirements are part of the platform strategy.

Identify and select a core Kubernetes control plane system to serve as a central control mechanism – The reality of modern IT organizations is that Kubernetes clusters reside across many different environments, including local data centers, one or more cloud providers—even in distributed edge locations. Critical to managing the sprawl of clusters is a centralized control plane allowing configuration, security, and policies to be managed from one location and deployed across the sprawl. VMware Tanzu Mission Control is a great tool for platform groups to leverage centralized expertise and apply it anywhere a Kubernetes cluster happens to operate.

The second, improving change control processes, is deceptive. It sounds straightforward, but as with any process modification likely to involve organizational change, it will encounter inertia and resistance. That’s human nature. Nevertheless, without getting the path to production right, organizations will find their DevOps efforts perform poorly, with little discernible business benefit. Investing the time and organizational bandwidth needed to raise change control productivity is a prerequisite to performing at cloud native levels. With that in mind, organizations should do the following:

Evaluate your current change control mechanism to see if it supports the kind of application release cadence you want for your cloud native applications – The answer will probably be no, but you should verify this to ensure you’re on the right track.

Create a path-to-production value-stream mapping assessment to identify all groups and milestones in the application development process – This will enable you to identify technical and organizational constraints to rapid application deployment. VMware Tanzu Labs can help you define a prioritized list of change control process modifications.

Work down your prioritized list of path-to-production changes, automating approvals and integrating technical operations – Again, Tanzu Labs can help with this, with most organizations realizing reductions in deployment time frames of more than 90 percent.

The message of the State of DevOps 2020 report is clear: It’s time to move from evaluating DevOps to taking action to implement an integrated, automated path to production. Removing the most common constraints enterprises encounter can help them meet the standard of cloud native companies and achieve DevOps excellence.