
How We Designed PCF Healthwatch: A Lean & Collaborative Approach that Embedded Co-Creation and Continuous Feedback.

According to Lean UX champions Josh Seiden and Jeff Gothelf, Lean UX processes are meant to bring designers and non-designers together in co-creation. In this post, I aim to show you how Kevin Gates, Product Designer, embedded collaborative design into his workflow. He joined with Amber Alston, Product Manager, to design what we now know as PCF Healthwatch. Continuous feedback loops and co-creation exercises were instrumental in identifying the most valuable solution for operators experiencing pain around platform monitoring.

The Beginning

After enterprises started running production workloads on Pivotal Cloud Foundry (PCF), it became apparent that operators needed a view into the platform and its current state. These engineers needed insight to help them manage the health and performance of their foundation. PCF emits a lot of metrics. But as an operator, you’re not after volume – you want to decipher meaningful signals from noise. Operators told us about their “day in the life,” and frankly, how painful it was.

It was clear customers expected Pivotal to solve this problem, so in December 2016 Kevin teamed up with Amber to improve the monitoring experience Operators have when using PCF. This duo had collaborated on PCF Metrics in the past, so they were eager to pair on a new challenge.

And what was that challenge? To create a solution amid many unknown variables. Which metrics are most interesting and pertinent to operators? How do they triage them? How do you design with the user in mind? How do you keep things lightweight as you explore possible solutions? Kevin sought to address these concerns through a lean UX design approach. How? By effectively leveraging co-creation with stakeholders, collaboration and tight feedback loops.

First Things First: Understand The Current State

At the start of this endeavor, Kevin and Amber sought to learn how Operators currently monitored their platform.

“We didn’t communicate to anyone about a potential out-of-the-box-solution. We just wanted to understand the current state of things,” Kevin said.

Embracing ambiguity, Kevin and Amber sought to gain clarity by interviewing customers, platform architects, and other stakeholders.

They spoke to about 25 people, asking open-ended questions to understand their experiences with platform monitoring. Clear themes began to emerge. Customers were experiencing pain when it came to:

  1. Learning how to monitor PCF
  2. Triaging data. There was a bad signal-to-noise ratio in the data
  3. Upgrading PCF. Handmade dashboards would crash each time

Expert Panels and Deep Collaboration

Shortly into this exploratory research, Kevin and Amber teamed up with a talented group of engineers to start building a potential solution. However, Kevin was concerned about all the unknowns. So he proposed the formation of an expert panel to serve as customer proxies throughout the process. This panel would consist of platform architects with two attributes: a deep expertise in platform monitoring, and a close connection with customers. “Working with customers and prospects, you hear a lot of comments and pain points,”  says Jamie O’Meara, platform architect and one of the subject matter experts on the panel.

The expert panelists played an integral role in product design and development. They participated in design reviews and helped Amber and Kevin validate – and invalidate – assumptions about the domain and user behavior. The end result: better guidance for the eventual product.

Tight Feedback Loops & Co-Creation

Kevin is a product designer who embraces lean processes, so he presented lightweight ideas at their lowest fidelity to users and stakeholders for feedback. This practice creates a culture of rapid learning and tight feedback loops, which help guard against the planning fallacy. Planning the outcome of a project at the beginning of the cycle, without validating assumptions or exploring the unknown, increases the risk of designing the wrong thing. In contrast, the lean approach narrows the information gap and is more likely to lead to suitable solutions.

Kevin is also keen on co-creation. For example, he ran a weekly design studio with the team's engineers so they could participate in the design process and scrutinize the technical feasibility of proposed solutions. The goal of co-designing with engineers is to suss out complexity as early as possible, discover the best technical solution, and empower engineers to drive the creation of that solution. Kevin presented the problem to the engineers and asked, “How do you suggest we solve this?” Lively whiteboard sessions built a shared understanding across the cross-functional team.

Low fidelity, handwritten sketches created during design studio with engineers.

In another exercise, Kevin co-created a dashboard concept with Natalie Bennett, anchor of Pivotal’s CloudOps group. During this session, Kevin learned that users often want to:

  • See MySQL problems in the context of a system overview

  • Know exactly what’s unhealthy on the system, i.e., a specific node

  • Know when nodes were last synced, and when they were down last

Natalie Bennett walks Kevin through important indicators that platform operators use when troubleshooting.

On Friday mornings, Kevin brought designs co-created with the engineers to the expert panel for feedback.  In one session, Kevin presented early wireframes of router metrics and received very, uh, blunt feedback. One of the panelists, David Laing, Engineering Director for CloudOps in EMEA, responded emphatically, “I don’t like anything about it.”

Early wireframes presented to the expert panel. Feedback on these wireframes brought greater clarity about the true needs of the platform operator.

Of course, any designer will tell you that intense negative feedback can be very productive. Open and candid feedback helped Kevin and Amber learn many useful things about their target persona, the platform operator:

  1. Operators care about their customers (developers) and about their customers’ customers (the end users of the applications those developers build).

  2. Operators do not care about the super technical stuff until they absolutely need to. This information, say a router’s CPU usage, is very important, but it’s not a primary indicator.

  3. Operators DO care about how much traffic is coming through their systems. Are there errors or spikes in traffic? Can a dev push an app, given the levels of traffic we’re seeing?

  4. Operators want a reliable way to know something is wrong. Then, they want the details. They don’t want the details first.

All of these findings helped the team develop deeper empathy, and discover how operators are actually thinking and feeling.

The Proposal

Customers told Kevin and Amber that platform upgrades were difficult. To work through this process, Operators looked at multiple dashboards to ascertain and observe the health of the platform. This current workflow, to once again put it bluntly, was arduous.

Complicating matters further, each time Operators updated their platform, their endpoints for dashboard metrics would break, so observability was impossible. This was a major problem. Operators need the dashboards the most when they perform an upgrade!

Kevin and Amber proposed an out-of-the-box solution, the module that would become PCF Healthwatch. This solution would feature dev control tests (push, start, stop, etc.) to measure whether endpoints are functioning properly. This smoke test concept was modeled after open source code created by Google and Cloud Ops EU. When you run this test, a small app is automatically pushed to the platform every 2 – 5 minutes. The app is constantly pinged to determine whether it is:

  • In existence

  • Emitting logs

  • Startable/stoppable
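
To make the smoke test idea concrete, here is a minimal Python sketch of how the three checks above could roll up into a single healthy/unhealthy signal. It is an illustration only, not the Healthwatch implementation: the names `SmokeTestResult` and `run_smoke_test`, and the three stand-in check functions, are hypothetical, and the checks return canned values instead of talking to a real platform.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SmokeTestResult:
    """Outcome of one smoke test run (hypothetical structure)."""
    checks: Dict[str, bool]

    @property
    def healthy(self) -> bool:
        # The run is healthy only if every individual check passed.
        return all(self.checks.values())

    def failed(self) -> List[str]:
        # Names of the checks that did not pass, in the order they ran.
        return [name for name, ok in self.checks.items() if not ok]


def run_smoke_test(checks: Dict[str, Callable[[], bool]]) -> SmokeTestResult:
    """Run each check once and record whether it passed."""
    results: Dict[str, bool] = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A check that raises counts as a failure rather than
            # crashing the whole test run.
            results[name] = False
    return SmokeTestResult(results)


# Stand-in checks with canned answers. A real version would push the
# small test app and query the platform; these are assumptions made
# purely for illustration.
def app_exists() -> bool:
    return True


def app_emits_logs() -> bool:
    return True


def app_can_start_and_stop() -> bool:
    return False  # simulate a failing start/stop check


result = run_smoke_test({
    "exists": app_exists,
    "emits_logs": app_emits_logs,
    "startable_stoppable": app_can_start_and_stop,
})

print("healthy:", result.healthy)
print("failed checks:", result.failed())
```

In this sketch a scheduler would call `run_smoke_test` on the interval described above. Reporting the overall `healthy` flag first and the failing check names second mirrors what the panel asked for: a reliable signal that something is wrong, then the details.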

Kevin co-created low-fidelity sketches of the dashboard with product management and engineering. From there, he presented them to the expert panel for feedback. The panelists liked the idea. Kevin notes, “our expert panel began to see the value in providing this metric to operators out of the box.”

In June 2017, the Healthwatch team released a functioning dashboard that helps Operators monitor and understand the health, performance, and capacity of their platform. Kevin leveraged the principles of Lean UX, embedding co-creation with PM and engineering and continuous feedback loops throughout the workflow to identify user needs and guide the product's direction. The team in Pivotal’s Denver office continues to iterate on Healthwatch as it learns more.

You can find out more about Healthwatch features here.  

Healthwatch’s original dashboard, released in June 2017. We keep iterating on this dashboard as we continue to learn from our users.