This post is the result of a collaboration among nearly a dozen people across our VMware Tanzu Labs teams (formerly known as Pivotal Labs), including Rohit Kelapure, Shaun Anderson, Marc Zottner, Tom Collings, Michael Barinek, Sebastian Wittenkamp, and others.
Software architecture represents the high-level shape and flow of data, user experience, messaging, and computation of a software system. However, the “shape” of software is based on decisions that, in some cases, are hard to reverse. One size does not fit all and there is often a need for multiple forms of architecture, like the four forms of event-driven architecture. The cost of making a mistake when migrating a legacy system—say, a mainframe application to distributed microservices on the cloud—is high.
Hence, there are patterns and learnings encoded from previous experience that we must bring to bear in the design of new systems. It’s important that upfront architecture design is balanced with the evolving needs of the product. You probably want “just enough” lean architecture up front rather than iterating on a purely evolutionary basis. While there is a high degree of uncertainty, exploration of the domain bound by feedback loops is critical to designing the right system.
Still, it is important to iterate toward the right product market fit—especially for greenfield projects. The problem with upfront architecture is that developers fall into the trap of over-planning up front, not realizing they can actually get going much sooner than expected. On the flip side, it’s important to have a sense of where you’re going before you dive in.
An evolutionary architecture treats incremental change as a first principle. If evolutionary change and agility are built into the architecture, change becomes easier and cheaper, thus allowing changes to development practices, release practices, and overall agility. The trick is finding the right balance between upfront and evolutionary, between predetermined and iterative.
Finding the architecture Goldilocks zone
After taking into consideration non-functional needs like scale, concurrency, and extensibility, the fine act of balancing feature discovery and validation between evolutionary and intentional architecture leads us to the Goldilocks zone of corporate architecture:
- For greenfield new business capabilities, the focus is first on user-driven design. This helps developers avoid analysis paralysis and instead iterate with velocity toward the right product.
- In enterprise modernization, the focus first needs to be on architecture to reduce churn and rework, because the problem being solved is very architecture-heavy and the scale is huge.
- When greenfield development occurs on an existing monolith, a balance needs to be struck between architecture, design, and user experience.
All three need to happen in parallel.
The crux of the issue is timing. The question isn’t whether to do this or that, but when to do it. Another key question to answer is around what problem you’re trying to solve. Why might it be deemed a waste of time, for example, to spend a week of project time debating Kotlin vs. Java, or event sourcing vs. docket-based choreography? Because those aren’t the problems you’re solving for.
So the key is determining the answer to these two questions: Based on the problem set, when do we plan, and when do we do? We often think that greenfield applications require an evolutionary architecture approach while brownfield modernization efforts require a lot of upfront planning. However, this is a false dichotomy. There is always a level of upfront architecture and planning involved. Deciding whether to use Kotlin vs. Java, for example, involves a similar mental process as deciding whether to use sync or async. If you have a ton of experience, you may already have a substantiated opinion at the beginning of a project and don’t need to spend time thinking, deciding, and testing assumptions. You’d still be planning up front, but you’d also be able to get to the doing part more quickly.
And although evolutionary architecture does tend to pertain to a single application, when you know—based on product knowledge, experience, multiple teams, etc.—that you’re dealing with a system of systems or multiple applications (i.e., there are clear app domains and boundaries), you should naturally consider spending more time up front. This holds especially true for high-volume, high-transaction systems. It’s even more true when you’re rewriting an existing, profitable system due to unrecoverable technical debt.
Choosing an async architecture early on could delay early product iterations, but the long-term benefits might significantly outweigh those early delays—making it a highly valuable upfront investment over time.
Kent Beck’s 3X model: Explore, Expand, Extract
If you can spend extra planning to reduce risk, by all means do it. Make sure you are buying something valuable with your delay.
— Kent Beck (@KentBeck) July 19, 2019
Kent Beck’s 3X (explore, expand, extract) model provides a framing for how to progress on an idea and, depending on where you are on the journey, pick the right architectural approach. The three phases frame the context in which certain values, principles, and practices apply:
- In a greenfield development project where the product has no shape or definition, you need to run a bunch of experiments to figure out what the most valuable ideas are. This is the explore phase, where iteration and feedback loops are key. Lean startup refers to the explore phase.
- Once the product has some definition and reaches a local maximum (that is, product-market fit), it’s in the expand phase of development. At this point, you should begin to engineer the product for escape velocity. It’s important to escape the capability trap and invest in the right things. And it is key in the expand phase to allocate resources toward proactive maintenance, renewal efforts, and capability-building in such a way as to tackle problems before they torpedo the product.
- The extract phase is where systematic training and economies of scale are attained by reducing scalability costs, latency, and throughput variance. When you’re modernizing brownfield systems, you need to be cognizant of the existing scale and entropy of the current enterprise architecture. You also have to get to a place where the confidence in tests and automation is high enough that you feel safe changing things. And you have to plan releases, because things like data migrations can have operational impacts that translate into real time and money.
Some (unfinished) thoughts on #TechDebt in relation to @KentBeck's 3x model, inspired while watching his @Andela workshop today… pic.twitter.com/ijG01AhGVG
— AntonyMarcano (@AntonyMarcano) December 10, 2016
The app continuum
The Application Continuum, like other cloud-native frameworks and patterns (e.g., 12-factor apps and the 3X model), attempts to frame idiomatic best practices to provide guardrails for cloud-native app development and, really, for all application development:
“Starting with a single application is just as viable as starting with microservices (a distributed system), it simply depends on how much information you’re given.”
On microservices vs. monoliths, for example, it is not that one always beats the other. You can build tightly coupled, poorly factored microservices and well-factored, maintainable monoliths—and vice versa. You should understand the implications of both and choose what makes the most sense for your product or use case. Even when following the app continuum, there is time spent on upfront design—typically in the form of writing Gradle or Maven scripts to properly manage dependencies across libraries or components.
Some applications can remain as well-factored monoliths, whereas others can evolve on the app continuum into a distributed system.
For brownfield applications, however, the challenge is that applications may not have evolved along the continuum and, as a result, are poorly factored. For poorly factored monoliths, the discovery of domains and bounded contexts proves challenging and is typically determined by cross-functional, socio-technical practices like event storming.
The right decision at the right time
As Martin Fowler notes, all software requires some upfront thinking and design. Software with a high-quality internal architecture gets a short initial slowdown, but it allows developers to deliver more rapidly and cheaply later on. It’s a phenomenon Fowler has dubbed the Design Stamina hypothesis. If the functionality of the initial release is below the design payoff line, then it may be worth trading off design quality for speed. But if it is above the line, the trade-off is illusory. When your delivery is above the design payoff line, neglecting design always makes you ship later.
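The crossover that the design payoff line describes can be illustrated with a toy model. This is only a sketch under invented assumptions: the velocities, the decay rate, and the three-week design investment are made-up numbers, not figures from Fowler, and the function names are hypothetical.

```python
def cumulative_without_design(weeks, start_velocity=10.0, decay=0.9):
    """Features delivered after `weeks` when design is neglected.

    Assumption: the team starts fast, but velocity shrinks by a fixed
    factor every week as the code base becomes harder to change.
    """
    total = 0.0
    velocity = start_velocity
    for _ in range(weeks):
        total += velocity
        velocity *= decay
    return total


def cumulative_with_design(weeks, setup_weeks=3, velocity=8.0):
    """Features delivered after `weeks` when design is invested in up front.

    Assumption: nothing ships during `setup_weeks`, then velocity stays constant.
    """
    return max(0, weeks - setup_weeks) * velocity


def payoff_week(max_weeks=104):
    """First week in which the designed system has delivered at least as much."""
    for week in range(1, max_weeks + 1):
        if cumulative_with_design(week) >= cumulative_without_design(week):
            return week
    return None
```

With these made-up numbers, payoff_week() returns 12: a release planned for week 6 sits below the payoff line, while anything expected to run past week 12 sits above it.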
Overall, we advocate for picking the right architectural and implementation approach at the right phase of the project—the one that provides the principles needed to guide the agile architecture and initial direction of the project so as to avoid the irreversible decisions on technology and implementation that developers may regret later.
Decomposing the monolith
True modernization is not simply rewriting an existing system using new technologies. Rather, it’s about understanding the system and how it relates to the core enterprise as a whole. For example, engineers, developers, and technical architects tend to think from a technical nuts-and-bolts perspective (the “how”). They subsequently tend to look at modern technologies such as Cloud Foundry, Spring Boot, Steeltoe, Kafka, and containerization as comprising the definition of a modern application. But this perspective omits the question of “Why?” and can introduce artificial constraints in their thinking. They may give little thought to the overall vision, or the art of the possible.
Nor does focusing solely on architecture tend to provide a suitable solution. Traditionally, organizations designed applications with monolithic or layered architecture. While they have been successful with those designs, many problems have emerged: Deployment times are long, full regression tests are necessary for minor changes, release cycles are slow, and teams have a hard time reasoning about their own software. And while microservices are an effective way to solve these problems, it is not enough to simply build small services.
Care must be taken to ensure that services are loosely coupled; otherwise, they retain the monolithic nature of their predecessors. If poorly executed, the problems can compound, requiring deployments of a large number of services for simple changes. If applied correctly, using principles of proper resource sharing, single responsibility, and domain-driven design, a microservices architecture can facilitate rapid deployments, shorter testing cycles, and the adoption of a “product team” organizational structure.
Therefore, some of the core questions in enterprise application modernization are:
- How do you evolve your monolith to an event-driven, modular microservices architecture?
- What are the right domain boundaries and how do you achieve them?
- How do you bridge the gap in understanding between the non-technical, top-down way of thinking and the technical, bottom-up thought process?
- What does it mean to build an architecture that maps to the way the system “wants to behave?”
Critical facilitation techniques like event storming and the Swift method can help answer these questions and lead to the hard implementation and design choices necessary to maintain a modernized system architecture.
High-level architectural design
The questions we must ask of our architectures are broadly the same, whether we’re talking about microservices, communicating threads or processes on a single machine, or classes in a monolithic Java app. Here are some useful heuristics to use when looking at a high-level architectural design:
- Avoid deep dependency trees. When dependencies are shallow, you can get more confidence out of your unit tests because the units are coupled to fewer things. A “unit” might be a whole service if you’re talking about a microservices architecture. This, in turn, means you need fewer integration tests and the cost of testing the system is lower.
- Ensure components depend on things that change more slowly than themselves. Every dependency is a chance for a failure to cascade. When the provider layer evolves at a faster rate than the caller, it allows failures to jump the gap and cascade faster in a complex system.
- Maintain a single source of truth for each piece of data. If, for example, a microservices architecture splits a crucial piece of data between two microservices, synchronization issues can be a nightmare.
- As a corollary to the above two points: Avoid dependency cycles among stateful components. Cycles mean (a) it’s perilously unclear who owns the state, and (b) at least one component must depend on something that changes faster than it does.
- Ensure that, where possible, components are stateless functions. For example, an image-resizing microservice can be modeled as a function.
- Avoid high fan-in to stateful or side-effecting components. Fan-in is a sign that different components are sharing mutable state. High fan-in to stateless components is ideal!
- Make it easy to roll back a failed transaction. It’s much harder to roll back transactions when they span multiple microservices.
- Consider what happens as the deployment scales to handle more traffic. If two or more microservices seem like they will need to scale out at the same rate, maybe it’s not worth keeping them separate.
- Consider what happens when a microservice goes down. Can the app’s functionality degrade gracefully (e.g., if the recommender service is down, can the homepage show a default list of product recommendations instead of a personalized one)?
- Design code around data, not data around code. Data is the most important part of any software system. Domain-driven design helps derive the right domain and logical data models that the whole team can align on before physicalizing it into SQL or NoSQL. Your system architecture should be congruent with the core data pathways.
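Several of these heuristics (shallow dependency trees, no cycles, watching fan-in) can be checked mechanically once the dependencies are written down as a graph. The following is a minimal sketch, assuming a hand-maintained map of component names; the components and the helper functions are hypothetical and not part of any particular tool.

```python
# Hypothetical component graph: each key depends on the components in its list.
DEPENDENCIES = {
    "web": ["orders", "catalog"],
    "orders": ["payments"],
    "catalog": [],
    "payments": [],
}


def find_cycle(graph):
    """Return one dependency cycle as a list of names, or None if there is none."""
    visiting, done = set(), set()
    path = []

    def visit(node):
        visiting.add(node)
        path.append(node)
        for dep in graph.get(node, []):
            if dep in visiting:
                # Found a back edge: slice the current path from the repeated node.
                return path[path.index(dep):] + [dep]
            if dep not in done:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        visiting.discard(node)
        done.add(node)
        return None

    for node in graph:
        if node not in done:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


def max_depth(graph):
    """Length in edges of the longest dependency chain (assumes no cycles)."""
    memo = {}

    def depth(node):
        if node not in memo:
            deps = graph.get(node, [])
            memo[node] = 0 if not deps else 1 + max(depth(d) for d in deps)
        return memo[node]

    return max((depth(n) for n in graph), default=0)


def fan_in(graph):
    """Count how many components depend directly on each component."""
    counts = {name: 0 for name in graph}
    for deps in graph.values():
        for dep in deps:
            counts[dep] = counts.get(dep, 0) + 1
    return counts
```

Run in a build, checks like these could fail when find_cycle returns a cycle or when max_depth exceeds a threshold the team has agreed on. Which fan-in values are acceptable still needs human judgment, because it depends on whether the target component is stateful.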
Continuous validation of enterprise architecture
Architecture must constantly evolve to address the needs of the domain. However, enterprise architects are “graded” on delivering an architecture before moving on to the next thing. Instead, architects need to be embedded with developers to find out where patterns are failing and modify them accordingly. The answer to the question, “Can you validate our architecture?” should be something along the lines of, “We can teach you how to evolve your architectures and patterns to solve the problems you haven’t encountered.”
By embedding architects with developers and actively implementing user stories, we protect against ivory tower architectures.
We believe that architecture is abstract until operationalized. There’s simply no amount of upfront planning or process that can guarantee with 100% certainty everything will work out perfectly when you go live. You can only know that your system is doing the right thing by deploying and battle-testing it. Likewise, you cannot judge the long-term viability of any architecture until it is implemented and shown to evolve to address the changing needs of the domain.
You cannot define “validation” using thought experiments; it must be demonstrable through the implementation of concrete product use cases. Based on that, “validation” may be the wrong term. At best, we can “find the smells” and make addressing them an immediate priority.
You can do this with an eye toward problems that may exist in the future, but solving for problems that might arise later on tends to result in over-engineered solutions that actually box you in when unexpected problems occur.
If the architecture is treated as an immutable artifact, the project is destined to fail. That’s why a big part of this process includes monitoring and metrics, so that you can observe when hot spots occur and be able to handle them in a timely manner. Over time, this is a form of feedback about the architecture implementation with respect to which requirements are working and which are not. Performing tactical fitness tests using chaos-engineering techniques and tools like Spring Cloud Contract also helps highlight weaknesses.
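A tactical fitness test in the chaos-engineering spirit can be as small as injecting one failure and asserting that the system still answers. The sketch below reuses the recommender example from the design heuristics. Every class and function name in it is hypothetical, and the outage is simulated in-process rather than with a real chaos tool.

```python
DEFAULT_RECOMMENDATIONS = ["bestseller-1", "bestseller-2"]


class RecommenderUnavailable(Exception):
    """Raised when the (hypothetical) recommender service cannot be reached."""


class FlakyRecommender:
    """Stand-in for a remote recommender; `healthy` lets a test inject an outage."""

    def __init__(self, healthy=True):
        self.healthy = healthy

    def recommend(self, user_id):
        if not self.healthy:
            raise RecommenderUnavailable("recommender is down")
        return [f"personalized-for-{user_id}"]


def homepage_recommendations(recommender, user_id):
    """Return personalized items, or a default list when the recommender fails."""
    try:
        return recommender.recommend(user_id)
    except RecommenderUnavailable:
        return DEFAULT_RECOMMENDATIONS


def fitness_check_survives_recommender_outage():
    """Inject an outage and report whether the homepage degraded gracefully."""
    recommender = FlakyRecommender(healthy=False)
    result = homepage_recommendations(recommender, user_id="u42")
    return result == DEFAULT_RECOMMENDATIONS
```

In a real system the outage would be injected at the network or platform level, but the assertion stays the same: the homepage still renders something useful.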
Here is a quick set of tips to help you think about validation effectively:
Where
The best place for validating an architecture is production. Running code in production acts as a forcing function against over-engineering, gold plating, and excessive abstraction. This is super scary for most enterprises.
What
Simplify rather than complicate. Strive to achieve just the right level of abstraction at the right time. Abstraction is useful when separating concerns and is useful in testing. Separating concerns helps to keep the cost of changing a decision low and may not require much effort. There isn’t one school of architecture. Architecture is code and code is architecture. Fitness functions need to bring out the essential complexity rather than any accidental complexity.
How
Implement a thin end-to-end slice of business capability as a set of use cases and deploy the components into production to realize the architecture. Include CI/CD, automation, and monitoring/metrics/observability. Do this with a slice (not all) of production traffic and gradually increase the load to flex the architecture. Our development practice of domain-driven design with code driven by test-driven design and paired with continuous feedback allows the architecture to reach just the right level of abstraction.
Start small. Figure out where it doesn’t work and where it does. Learn along the way and make adjustments.
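One way to send only a slice of production traffic to the new components, and to widen that slice gradually, is deterministic bucketing on a request key. The following is a sketch under assumptions: the route names and the choice of a user ID as the key are illustrative, and in practice this decision often lives in a gateway or load balancer rather than in application code.

```python
import hashlib


def in_slice(request_key: str, percent: int) -> bool:
    """Deterministically place a key in one of 100 buckets and test it against `percent`."""
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent


def route(request_key: str, percent: int) -> str:
    """Pick a (hypothetical) backend for this request."""
    return "new-architecture" if in_slice(request_key, percent) else "existing-system"
```

Because the bucket depends only on the key, a user who lands in the 10% slice stays in it when the slice grows to 50%, which keeps behavior stable for that user while the load on the new architecture increases.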
Managing a sustainable architecture
So, what does life look like after the Big Bang that is the first release of the product, when the architecture is baked? And what practices do we need to follow to allow the architecture to evolve correctly? Here are some things to keep in mind:
- Make sure the architecture is simple, holds true to the guiding principles, and is operational.
- Look out for complex scope creep as the system matures. This can show up as integration complexity and additional or unnecessary microservices. Try to keep changes minimal while adding new features.
- Remember that the system does not need to be 100% efficient as long as the architecture is staying true to the original mission.
- Create a culture of continuous verification and feedback. Late-breaking changes are to be expected, and you can never capture requirements to 100%. But you can do better at validating and verifying them in a tight feedback loop.
- Measure for the behavior you want, because people optimize what they measure. You should be measuring end-to-end transaction times and aggregates to get immediate visibility into the system.
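Measuring end-to-end transaction times and their aggregates does not require heavy tooling to get started. The sketch below uses only the Python standard library. The helper names are hypothetical, and a production system would more likely export these numbers to its metrics platform than compute them in-process.

```python
import statistics
import time
from contextlib import contextmanager


@contextmanager
def timed(sink):
    """Append the elapsed wall-clock time of the wrapped block, in milliseconds, to `sink`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        sink.append((time.perf_counter() - start) * 1000.0)


def summarize(durations_ms):
    """Aggregate transaction times; needs at least two samples."""
    ordered = sorted(durations_ms)
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    cuts = statistics.quantiles(ordered, n=100, method="inclusive")
    return {
        "count": len(ordered),
        "median_ms": statistics.median(ordered),
        "p95_ms": cuts[94],
        "max_ms": ordered[-1],
    }
```

Wrapping a whole user-facing transaction in timed, rather than each internal call, keeps the measurement aligned with the behavior you actually want to optimize.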
Finally, remember that the choice between upfront and evolutionary software architecture is a false one, as is the choice between monoliths and microservices. Both are applicable at different phases of a project. Architecture is best thought of as a continuum that allows a product and system to evolve at the right pace with the right set of principles at the right phase of the project.