In the first part of this series, we laid out a set of principles to help you understand when microservices can be a useful architectural choice. In this post, we explore one of those factors, independent scalability, in more detail.

Let’s take a quick spin in our hot tub time machine and head back to an era before cloud, microservices and serverless computing. Servers were homegrown and bespoke, beloved pets, if you will. When developers requested a server for a new project, it could take weeks or months to become available.

Years ago, one of my peers requested a development database for our architecture team. It was never going to hold much data; it would mostly act as a repository for some of our artifacts. One would think a small, non-production database could be provisioned fairly quickly. Amazingly, it took an entire year for us to get that simple request fulfilled! Who knows why. Strange things happen in traditional enterprise IT. But I do know that these kinds of delays incentivize a raft of undesirable behavior.

These extended lead times forced us to make capacity decisions far too early. In fact, the very start of a new initiative is when we know the least about how our systems will actually behave under stress. So we’d make an educated guess based on the worst-case scenario. Then we’d double it and add some buffer. (Plus a little more, just in case.)

And while this meant we often heavily overprovisioned infrastructure (and overspent), that was still the better outcome for organizations. After all, adding capacity after the fact was just as painful.

Monoliths and traditional infrastructure: Two sides of the same coin

It wasn’t just our initial server setup that suffered in this era, though. Static infrastructure, combined with our old friend the monolithic application, made it nearly impossible to deal with unexpected demand. Even if we understood what our systems needed under normal circumstances, we couldn’t predict when a new marketing campaign or a shout-out from an influencer would send thousands (or millions) of hungry customers to our (now) overwhelmed servers.

Things were no better for the Ops teams running our data centers. It could take months to add servers or racks. Traditional budgeting processes made it very difficult to add capacity in any kind of “smooth” manner. Instead, operators were forced to add capacity in large steps, relying on (at best) educated guesses about the shape of future business demand.

While this approach was logical at the time, it meant many organizations had massive amounts of compute sitting idle nearly all of the time. Single-digit server utilization was common. Business units paid for unused capacity every month just in case a surge or spike in demand occurred. That may have been an acceptable tradeoff in the past, but today it would charitably be considered an antipattern.

Elastic infrastructure combined with more modular architectures mitigates many of these antipatterns. Today, you can add, and just as importantly reduce, capacity on demand. You can start with a reasonable number of application instances and adapt as you learn more about your load characteristics. Working with your business team, you can spin up extra capacity for that big event and ramp it down afterward. You can wait until the last responsible moment, which frees you to make better decisions based on real data, not hunches and guesses.
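One way to picture adding and reducing capacity on demand is the proportional rule the Kubernetes Horizontal Pod Autoscaler documents: desired instances = ceil(current instances × current metric ÷ target metric). The sketch below applies that rule to a CPU target. The function name, the 1 to 20 instance bounds and the utilization numbers are illustrative assumptions, and real autoscalers add tolerances and stabilization windows that are omitted here.

```python
import math


def desired_instances(current: int, current_metric: float, target_metric: float,
                      min_instances: int = 1, max_instances: int = 20) -> int:
    """Proportional scaling: move the instance count so the per-instance
    metric (e.g. average CPU %) lands near the target.

    min_instances/max_instances are hypothetical guard rails, not platform defaults.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    # Ratio > 1 means we are running hotter than the target: scale out.
    # Ratio < 1 means we have spare capacity: scale in.
    raw = math.ceil(current * current_metric / target_metric)
    # Clamp so we never scale to zero or run away during a spike.
    return max(min_instances, min(max_instances, raw))


# Big event: 4 instances averaging 90% CPU against a 60% target.
print(desired_instances(4, 90, 60))  # 6
# Afterward: 6 instances idling at 20% CPU.
print(desired_instances(6, 20, 60))  # 2
```

Because the same rule scales in as readily as it scales out, you stop paying for the event-sized fleet once the event is over.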

Most people recognize this as a benefit of the public cloud, but it’s true in the enterprise data center as well, and the savings are just as real: resources your apps don’t use can be used by another group in your organization. (That’s why Paul Maritz, former Pivotal chairman, says, “Cloud is about how computing is done, not where.”)

Microservices <3 elastic, API-driven infrastructure

The monolith also suffered from a structural constraint: we had to scale the entire thing, not just the parts that actually needed additional capacity. In reality, load and throughput are rarely uniform across a system; different components have different scaling requirements.

Microservices provide us a solution: separate these components out into independent microservices! This way, the services can scale at different rates.

Recall our Widget.io Monolith example, helpfully shown below:

In this scenario, it is quite likely that our Order Processing system requires more capacity than our Account Administration functionality. In the past, we had to scale the entire monolith to support its most volatile component. That approach results in higher infrastructure costs, because we are forced to overprovision the whole application for the worst-case scenario of just one portion of it.

If we refactor the Order Processing functionality to a microservice, we can scale up and down as needed. The result is something like this diagram:
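A back-of-the-envelope comparison shows why this matters. The sketch below uses hypothetical memory footprints and peak instance counts for three Widget.io components, and assumes a monolith instance needs the combined footprint of every component it contains. It ignores the extra per-service runtime overhead that microservices introduce, which narrows the gap in practice.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    peak_instances: int  # copies needed to handle this component's own peak load
    memory_gb: float     # memory footprint of one copy of the component


def monolith_memory(components: list[Component]) -> float:
    # Every monolith instance carries every component, and we must run
    # enough instances to satisfy the most demanding component.
    per_instance = sum(c.memory_gb for c in components)
    instances = max(c.peak_instances for c in components)
    return per_instance * instances


def microservice_memory(components: list[Component]) -> float:
    # Each service is scaled to its own peak, independently of the others.
    return sum(c.memory_gb * c.peak_instances for c in components)


# Hypothetical numbers for the Widget.io example.
widgets = [
    Component("order-processing", peak_instances=8, memory_gb=2.0),
    Component("account-admin", peak_instances=1, memory_gb=1.0),
    Component("recommendations", peak_instances=2, memory_gb=1.0),
]

print(monolith_memory(widgets))      # 32.0
print(microservice_memory(widgets))  # 19.0
```

Order Processing alone drives the monolith to eight full copies of the application; scaled independently, the other components stay at their own, much smaller, peaks.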

But how do we know which parts of our application require more capacity? Make sure your monitoring is on point!

Monitoring: The secret to a thriving microservices architecture

Good monitoring is vital for a healthy microservice biome. But knowing what to monitor isn’t always obvious. Luckily for us, many organizations are sharing their experiences! You owe it to your services to spend some time perusing Google’s Site Reliability Engineering book, which is filled with wisdom. For example, Rob Ewaschuk identifies the Four Golden Signals: Latency, Traffic, Errors and Saturation.

The Golden Signals can provide insight into parts of your system that could benefit from independent scalability. Over time, you will gather priceless intelligence about the actual usage patterns of your application, discovering what is normal (green), which values warn of future issues (yellow) and which thresholds are critical and require immediate intervention (red). Look for areas with significant traffic or where latency exceeds your requirements. Keeping a close eye on saturation and error rates will also help you find the bottlenecks in your system.
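Turning those observations into green, yellow and red states can be as simple as two thresholds per signal. This is a minimal sketch for “higher is worse” signals such as latency, error rate or saturation. The signal names and threshold values are hypothetical and should come from the usage history you collect, not from this example.

```python
from enum import Enum


class Status(Enum):
    GREEN = "normal"
    YELLOW = "warning"
    RED = "critical"


def classify(value: float, warn_at: float, critical_at: float) -> Status:
    """Classify a 'higher is worse' signal against warning/critical thresholds."""
    if warn_at >= critical_at:
        raise ValueError("warn_at must be below critical_at")
    if value >= critical_at:
        return Status.RED
    if value >= warn_at:
        return Status.YELLOW
    return Status.GREEN


# Hypothetical (warning, critical) thresholds for the Order Processing service.
thresholds = {"p99_latency_ms": (250, 500), "error_rate_pct": (1.0, 5.0)}
observed = {"p99_latency_ms": 320, "error_rate_pct": 0.4}

for signal, value in observed.items():
    warn, crit = thresholds[signal]
    print(signal, classify(value, warn, crit).name)
# p99_latency_ms YELLOW
# error_rate_pct GREEN
```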

Don’t hand-roll your metrics; take advantage of tools like VMware Aria Operations for Applications, Dynatrace, New Relic and others. Spring Boot Actuator supports a number of monitoring systems out of the box. It also includes a number of built-in endpoints that can be individually enabled or disabled, and of course you can always add your own.

Don’t expect to get your monitoring “right” the first time; actively review the metrics you collect to make sure you are getting the best intelligence about your application. Many companies today have teams of site reliability engineers (SREs), who help product teams determine what to monitor and, just as importantly, how often to sample. Oversampling doesn’t always produce better information; in fact, it may generate so much noise that it obscures the real trends. Of course, undersampling is also a risk. Don’t be afraid to tweak your settings until you find the Goldilocks level! For practical tips, see our ebook on how to establish an SRE-based incident lifecycle program.
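To see how undersampling can hide exactly the behavior you care about, the sketch below takes a hypothetical five-minute series of per-second CPU readings containing a ten-second spike and reads it at three different intervals. It uses simple point sampling for illustration; real metrics systems usually aggregate within each interval (average, max, percentiles), which changes what a coarse interval reveals.

```python
def sample(series: list[float], every: int) -> list[float]:
    """Point-sample: keep one reading out of every `every`, starting at index 0."""
    if every < 1:
        raise ValueError("every must be >= 1")
    return series[::every]


# Hypothetical per-second CPU % for five minutes: steady at 40%, with a
# ten-second spike to 95% from second 125 through second 134.
cpu = [40.0] * 300
for t in range(125, 135):
    cpu[t] = 95.0

for interval in (1, 10, 60):
    readings = sample(cpu, interval)
    print(f"every {interval}s: {len(readings)} samples, peak {max(readings)}%")
```

At 60-second resolution the spike disappears entirely, while the 1-second view produces ten times as many points as the 10-second view without revealing anything more about this spike. Averaging each 60-second bucket instead of point sampling would show the spike only as a modest bump to about 49%.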

Choosing what to monitor can also be tricky. Your system’s key service-level objectives (SLOs) are a good place to start. Resist the temptation to monitor something simply because it is easy to measure. We are all familiar with, shall we say, less-than-useful metrics like lines of code. Good metrics give you actionable information about your system.
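An availability SLO also tells you how much failure you can afford, which helps decide which signals deserve alerts. The sketch below converts an SLO into an error budget, the minutes of unavailability allowed over a window. The 30-day window and the targets shown are illustrative.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of unavailability an availability SLO allows over the window."""
    if not 0 < slo < 1:
        raise ValueError("slo must be a fraction between 0 and 1, e.g. 0.999")
    window_minutes = window_days * 24 * 60
    # Round to hide floating-point noise such as 43.20000000000004.
    return round((1 - slo) * window_minutes, 2)


for slo in (0.99, 0.999, 0.9999):
    print(f"{slo:.2%} over 30 days -> {error_budget_minutes(slo)} minutes")
```

A 99.9% target leaves roughly 43 minutes a month, a concrete number to weigh when deciding how much reliability a given service really needs.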

Last but not least, monitoring is not just for production. Monitoring staging and lower environments validates the monitors themselves. Just as you test your code, you should make sure your monitors do what you expect.

All services are equal (but some services are more equal than others)

While you should never overlook the importance of monitoring, don’t neglect a less technical part of the equation: a conversation with your business partners. It is vital that you understand the growth rate of your services, and how that’s linked to the underlying business. What are the primary business drivers of your services? Do your services need to scale by the number of users or number of orders? What will drive spikes in demand? Take the time to translate this information to the specific services that will be impacted.
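Once you know which business driver loads which service, sizing becomes arithmetic. The sketch below translates a business forecast into instance counts per service. The service names, the drivers (orders per minute, active users), the per-instance capacities (which in practice you would measure with load tests) and the 30% headroom are all hypothetical.

```python
import math

# Hypothetical per-instance capacity for each service, expressed in units
# of the business driver that actually loads it.
CAPACITY = {
    "order-processing": ("orders_per_min", 500),
    "account-admin": ("active_users", 20_000),
}


def plan(forecast: dict[str, float], headroom: float = 0.3) -> dict[str, int]:
    """Translate a business forecast into an instance count per service."""
    result = {}
    for service, (driver, per_instance) in CAPACITY.items():
        demand = forecast[driver] * (1 + headroom)  # add safety headroom
        result[service] = max(1, math.ceil(demand / per_instance))
    return result


# A normal day vs. a promotion the business expects to triple orders
# while growing active users by only 20%.
normal = {"orders_per_min": 1_200, "active_users": 50_000}
promo = {"orders_per_min": 3_600, "active_users": 60_000}
print(plan(normal))  # {'order-processing': 4, 'account-admin': 4}
print(plan(promo))   # {'order-processing': 10, 'account-admin': 4}
```

Tripling orders more than doubles Order Processing while Account Administration doesn’t move, which is precisely the case for scaling them independently.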

Don’t neglect your data in these conversations. At what rate does your data grow? What type of database solution makes the most sense for your problem domain? Are you read heavy? Write heavy? Inquiring minds want to know.

Many businesses plan promotions around holidays or marketing events. Others have predictable spikes in business around certain parts of the year. If you are a retailer, odds are you are familiar with Black Friday and the associated sales, as well as the deluge of customers. Some industries have a year-end or quarterly cadence. Take the time to understand the important dates in your company’s calendar and plan accordingly. If you don’t know, ask. Again, you will have to determine which services are most affected by these business events. Internalizing the key dates for your organization can mean the difference between delighted customers and, well, former patrons, lost revenue and poor word of mouth.
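Those key dates can feed directly into scheduled scaling, so capacity is in place before the crowd arrives rather than after. The sketch below is a hypothetical business calendar with demand multipliers for a single service. The dates, multipliers and baseline are illustrative assumptions, to be replaced with what your business team tells you.

```python
import math
from datetime import date

# Hypothetical calendar for the Order Processing service:
# (first day, last day, expected multiple of normal demand, label)
EVENTS = [
    (date(2024, 11, 29), date(2024, 12, 2), 4.0, "Black Friday through Cyber Monday"),
    (date(2024, 12, 28), date(2024, 12, 31), 1.5, "Year-end promotion"),
]


def demand_multiplier(day: date) -> float:
    """Largest multiplier of any event covering `day`; 1.0 on ordinary days."""
    return max((mult for start, end, mult, _ in EVENTS if start <= day <= end),
               default=1.0)


def scheduled_instances(day: date, baseline: int) -> int:
    """Pre-scale the baseline instance count for known business events."""
    return math.ceil(baseline * demand_multiplier(day))


for day in (date(2024, 11, 15), date(2024, 11, 29), date(2024, 12, 30)):
    print(day, scheduled_instances(day, baseline=4))
```

In practice you would pair a schedule like this with metric-driven autoscaling: the calendar sets the floor, and live signals handle whatever the forecast missed.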

It should be obvious that some services are in fact more business critical than others. Using the information you’ve mined from your business team, you can establish a criticality for a given service. Just as you categorize an outage by severity and scope, you can identify which services are business critical. Returning to our mythical Widget.io Monolith, the ability to process orders is far more important than the product recommendation service. This fact further reinforces our decision to refactor the Order Processing Service. Apply some due diligence to your services. Which could be down for a few hours (or days) with minimal impact? Which ones would the CEO know about instantly in the event of an outage?

Modernize your architecture to use modern infrastructure

Independent scalability is one of the most powerful benefits of a microservice approach. The monolith, paired with static infrastructure, forced us to make capacity decisions at the point when we knew the least about what our projects would actually need. This led to overspending on capacity, and our apps were still vulnerable to spikes in demand. Monolithic architectures also forced us to scale our applications at a coarse-grained level, further exacerbating the issue. Thankfully, things are much better today!

Modern infrastructure allows you to react in real time, adjusting capacity to match demand at any given moment. Instead of paying every month for extra capacity you only need once a quarter, you can pay as you go, freeing up valuable resources.

Odds are, your applications would benefit from these capabilities. Proper monitoring, along with a solid understanding of your business drivers, can help you identify modules you should refactor into microservices. Cloud infrastructure allows you to be proactive while also maximizing the utilization of your hardware resources.

Read the rest of this series:

Part 1: Should that be a Microservice? Keep These 6 Factors in Mind
Part 2: Multiple Rates of Change 
Part 3: Independent Life Cycles
Part 5: Failure Isolation
Part 6: Simplify External Dependencies
Part 7: The Freedom to Choose the Right Tech for the Job

Want to learn more about microservices? Join us at the next SpringOne!

Want more architectural guidance for your modern apps? Be sure to download Nathaniel's eBook Thinking Architecturally.