The story of Agile has only just begun. This post is the first of three in a series documenting the history of Agile, an approach to building software found across the world—from Airbnb to Apple, startups to multinational companies—focused on iterative development. By understanding this methodology's past, we can gain insight into software's future.
Agile is not really about computers or software; it's about the organization of people. And to properly tell this story, we'll need to stop at a few places with a theoretical time machine. So let's start in 1799, with the founding of the Royal Institution of Great Britain (the Ri). The Ri remains a premier academy for the best scientists and natural philosophers to this day, but the real reason we start our journey here is to show just how young our industry really is.
Let’s jump to 1818, with the founding of the Institution of Civil Engineers. This professional body came together just over 200 years ago, even though bridge building has been around for much longer as a trade. Bridge building is often used as a metaphor for software delivery—the team comes together to build something resilient that will be used by the masses.
Stars, Missiles, and Moon Landers
Now we travel to 1875, when the first "human computers" appeared—and, with them, the terms "computing" and "computers." These computers were actually a group of women at a Harvard University lab doing star chart analysis. At this period in time, the "high sciences" were done predominantly by men, while the data processing and computational work in these kinds of labs was done by women. Despite the circumstances, these women went on to make some amazing discoveries—including using spectral analysis to determine the material makeup of stars and other celestial bodies.
Photo courtesy of Wikimedia Commons
From the stars of 1875 we'll move briskly into the 1940s, when the first Turing-complete computer was created. This happened predominantly around the Second World War, for calculating ballistics trajectories and for code-breaking. This is where we see the first real computers—as we know them—start to take shape. Even though these machines used earlier technologies (vacuum tubes rather than silicon chips), they might still be something that we'd recognize today.
In 1965, Margaret Hamilton, a developer working on NASA's Apollo program, coined the term "software engineering." But similar to the days of the "human computers" at Harvard, software was seen as a kind of second-class discipline, less interesting work. Hamilton was a big proponent of bringing the discipline of hardware engineering into the realm of software engineering. She famously said, "The space mission software had to be man-rated. Not only did it have to work, it had to work the first time. And not only did the software itself have to be ultra-reliable, it needed to be able to perform error detection and recovery in real time." A significant, intense ask that makes me grateful we have some abstractions to help us with our software delivery today.
Hamilton at NASA, Photo courtesy of Creative Commons
We can see that many of the early software projects were defined by the critical importance of safety. Sending people into space has to be right the first time; space missions are incredibly expensive and carry a huge amount of government funding and accountability. The same goes for the cryptanalysis during World War II: lives were hanging in the balance. These projects were so safety-critical that they demanded a cautious method, which is how we get to early Waterfall development.
A Risky Implementation That Invites Failure
In 1970, Winston Royce published a paper called "Managing the Development of Large Software Systems" in which he drew the following diagram, which looks very much like a waterfall, cascading from system requirements through analysis, coding, and testing down to operations.
But before we burden Royce with too much blame, note that immediately after this graphic he says, "I believe in this concept, but the implementation described above is risky and invites failure. Required design changes are likely to be so disruptive that the software requirements upon which the design is based and which provides the rationale for everything are violated."
In other words, Royce is saying that the cost of change increases as time goes on, and this graph represents exactly that. If the cost of change grows exponentially in these systems, then each step demands that you know exactly what happened in the previous step. It has to be accurate, and it has to be validated before moving to the next step, because any change required later—whether fixing an error or responding to changed requirements—gets exponentially more expensive the further along you are. A requirements mistake that takes an hour to fix on paper might take days to fix in design and weeks to fix once the code is in production.
Tragic Consequences
With Waterfall, there's a strict structure, which means every decision has to be right and every advancement requires a formal sign-off. What follows are some case studies (some more extreme than others) where this perfectionist waterfall process broke down:
The mid-1980s gave us wonderful music and regrettable hair choices, but also some tragic engineering missteps. Take the Therac-25, a computer-controlled radiation therapy machine that accidentally gave at least six patients a massive overdose of radiation, approximately 100 times greater than the intended amount.
This ultimately led to three deaths from radiation poisoning, and the reports that followed placed the blame on overconfident engineers, a lack of process for resolving reported bugs, and poorly designed malfunction alerts shown to the operator. The machine detected that it was delivering too high a dose, but it displayed only an ambiguous message, and the operator was able to override the system and deliver a significant—and lethal—radiation dose.
Another example can be seen in London's computer-aided dispatch system of 1992. This computer system aimed to replace a paper-based system and increase the efficiency of dispatching London's ambulances. A great idea, but the result was a disaster. Multiple units were sent to the same address while no units were sent to others. Calls got lost, which resulted in repeat calls, each logged in the system as a new incident. The congested system couldn't handle the volume of calls—a volume that was unexpectedly high precisely because the system was failing to dispatch ambulances accurately.
There was no rollback and no plan B. The process included only one plan, with no adaptability. In the media coverage that followed, there were reports of between 30 and 45 deaths, all due to a software release process gone horribly wrong.
Takeaways From The Failures of Waterfall
Assumptions About The Accuracy Of the Previous Step
If we go back to the waterfall process, you can see the major failings: inaccurate assumptions and a clunky sign-off process. These are related. Often in Waterfall, the people signing off for a team to move to the next step are not the ones most familiar with the project. They're just looking through the requirements and confirming that everything looks good. This top-down approach defaults to a process where those who sign off are making assumptions, without the day-to-day experience of working on the project to inform their judgment.
Too Much Pressure Around Fixed Scope, Time, and Cost
When managers push developers and their teams to work longer and harder to meet contractual demands, the result is ultimately poorer-quality software. Teams focus on getting through the sign-off process without necessarily making sure that what they are building is the right implementation.
Integrated Testing Happens Too Late, If At All
In the case of the Therac-25, there was no end-to-end testing. The first time everything worked together was in the hospital itself, which is quite a scary thought.
No Validation Of User Needs and Interactions
Again, think about those machine operators and the messages they saw, or how the ambulance dispatchers were overwhelmed by call volumes. There was no A/B test, no small trial—just one really big launch, which proved that many people didn't understand what was really needed to solve the problem.
Process, Not People-Centric
Ultimately, Waterfall's biggest failing is that it puts its trust in a system, not in the people working on a product. If your system does not empower the team to test ideas and change requirements based on those tests, then it is simply structure for the sake of structure. This reduces the accountability of individuals and puts more emphasis on successfully passing a gate than on building the right product.
The Path To Agile
We've covered a lot so far, and many brilliant people. Margaret Hamilton brought engineering discipline to software, Winston Royce described the process that became Waterfall, and Thomas E. Bell and T.A. Thayer then helped bring the term to the forefront (the first use of "waterfall" may have been in their 1976 paper). But despite a name that suggests free-flowing movement, not much happened to improve the Waterfall process after 1976. There was a small amount of progress around the discipline of project management, but the structure remained largely intact.
And while the cautious Waterfall approach resulted in some breakthroughs, there are clear ramifications to following it strictly. To get us to 2001, when the Agile Manifesto was created and published, we have to look at another industry—manufacturing—and how it evolved and influenced thousands of companies across the world. We'll explore this path in part two of this blog series.