
Supercharging Your Scale Cube: Caching for Microservices on Pivotal Cloud Foundry

For modern, microservices-based, web-scale applications, the ability to respond to a flood of concurrent requests is table stakes. These microservices serve a large user base with touch points across multiple devices, and they also face a big increase in the type and frequency of requests from ‘non-human’ sources such as devices, sensors, and other microservices. This intensity of requests places extreme demands on the data layer’s ability to keep up. Set this need for speed against the backdrop of a complex distributed data architecture, and it can be difficult to picture how these applications can stand up to such challenging data access needs. Thankfully, it is possible to boost performance and reduce complexity at the same time. Read on …

Pivotal Cloud Cache is Here

We are pleased to announce the general availability of Pivotal Cloud Cache (Cloud Cache), an integrated, in-memory, caching service for Pivotal Cloud Foundry.  Cloud Cache will be available this week through the Pivotal Cloud Foundry Marketplace.

Cloud Cache offers an in-memory key-value store that lets applications and users access data at high speed. Cloud Cache was designed for delivering low-latency responses to a large number of concurrent data access requests.

Cloud Cache has been designed for easy provisioning of dedicated, on-demand clusters. Developers or administrators can get started quickly, and adjust ongoing cache capacity based on the changing requirements of their applications and users.

Cloud Cache is delivered as service plans by caching pattern, i.e. as separate, purpose-built service plans for the look-aside and inline caching patterns. Each service plan (caching pattern) can be used independently as needed. With these service plans, Cloud Cache speeds up microservices architectures and supports modern DevOps practices. The service plans for the look-aside caching pattern are available this week, while service plans for the inline pattern, as well as options for session state caching and multi-site replication through a WAN gateway, will be available later in 2017.
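To make the look-aside pattern concrete, here is a minimal sketch of its read path. It assumes an in-process dictionary as a stand-in for the cache and a second dictionary as the backing store; the class and variable names are hypothetical and are not part of any Cloud Cache client API.

```python
from typing import Callable, Dict, Optional


class LookAsideReader:
    """Look-aside read path: check the cache first, fall back to the backing store."""

    def __init__(self, load_from_store: Callable[[str], Optional[str]]) -> None:
        self._cache: Dict[str, str] = {}  # assumption: a dict stands in for the cache
        self._load_from_store = load_from_store
        self.store_reads = 0  # counts trips to the backing store

    def get(self, key: str) -> Optional[str]:
        if key in self._cache:
            return self._cache[key]  # cache hit: no trip to the backing store
        self.store_reads += 1  # cache miss: the application reads the store itself
        value = self._load_from_store(key)
        if value is not None:
            self._cache[key] = value  # populate the cache for later reads
        return value


backing_store = {"customer:42": "Ada"}  # hypothetical backing database
reader = LookAsideReader(backing_store.get)
first = reader.get("customer:42")   # miss, loaded from the backing store
second = reader.get("customer:42")  # hit, served from the cache
```

In this sketch the application owns the cache-miss logic, which is what distinguishes look-aside from the inline pattern, where the cache itself sits in the path to the backing store.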

So, what is Cloud Cache’s role in delivering performance to microservice architectures? In the next section, we look at this question in the context of other known approaches to performance and scaling, and describe how caching can be used in combination.

Scaling Microservices with Pivotal Cloud Cache

The scale cube is an informative approach to understanding the dimensions of how microservices architectures can be scaled. This framework was first introduced by AKF Partners, and it describes approaches to scaling microservices along three dimensions.

Image via The New Stack: a scale cube for microservices. For Pivotal Cloud Foundry, the X-axis would cover "Scale by Adding Instances."

The scale cube has received some attention, as other subject matter experts have used this framework for discussing the scalability of microservices architectures.

Caching is a powerful technique for scaling microservices, and it can be used in combination with the three techniques described in the scale cube.  In fact, the rest of this post discusses how caching can be applied in combination with each of these techniques.

Y-axis Scaling

Y-axis scaling is about functional decomposition. In fact, this is a key characteristic of microservices. A microservices architecture allows us to scale the architecture at a more granular level – by scaling each microservice independently, and applying more infrastructure resources only for the microservices that are performance bottlenecks in the architecture.

Y-axis scaling will result in a set of microservices that have different types of workloads. Of these, only a subset will be I/O intensive microservices, and these are the ones that will benefit most from caching. Cloud Cache can be applied efficiently, only where it is needed.

X-axis Scaling

X-axis scaling consists of running multiple instances of a microservice to scale the business logic. A common approach is to have a load balancer route the traffic to the next available microservice instance. If there are N instances and the load is spread evenly, each instance handles roughly 1/N of the load.
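As a rough illustration of the 1/N split, the sketch below routes requests round-robin across three instances. Round-robin is a simplification of "next available" routing, and the instance names are hypothetical.

```python
from collections import Counter
from itertools import cycle

instances = ["instance-0", "instance-1", "instance-2"]  # hypothetical instance names
next_instance = cycle(instances)


def route(request_id: int) -> str:
    """Return the instance that handles this request (simple round-robin)."""
    return next(next_instance)


# Route nine requests and tally how many each instance received.
load = Counter(route(request_id) for request_id in range(9))
```

With nine requests and three instances, each instance ends up with three requests, i.e. one third of the load.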

Note that x-axis scaling is about scaling the business logic by adding instances. It does not prescribe any special treatment of the data layer (that will happen in z-axis scaling). X-axis scaling calls for a shared data layer across all microservice instances. Since each instance accesses all the data, the data layer is subject to demanding performance and scalability requirements.

Cloud Cache is a good choice for scaling the data layer, along with adding instances for scaling the application logic. Trying to scale the backing database, particularly if it is a legacy database, can be more costly, time-consuming, and risky. Instead, Cloud Cache’s in-memory architecture and horizontal scalability provide plenty of capacity for handling the performance needs of multiple instances of microservices.

A shared data layer across microservice instances avoids the complexity associated with a separate caching layer for each instance. Requests from all the instances can be sent to a single Cloud Cache cluster, which internally routes the request to the appropriate servers in the cluster.
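One way to see the difference is to count trips to the backing store. The sketch below is a toy model under stated assumptions: dictionaries stand in for both the cache and the backing database, and every name in it is hypothetical. It sends the same key to three instances, first with one shared cache and then with a private cache per instance.

```python
from typing import Dict, List

STORE = {"sku-1": "widget"}  # assumption: a dict stands in for the backing database


def serve(cache: Dict[str, str], key: str, store_reads: List[str]) -> str:
    """One instance handling one request with a look-aside read."""
    if key not in cache:
        store_reads.append(key)  # record the trip to the backing store
        cache[key] = STORE[key]
    return cache[key]


def store_reads_for(caches: List[Dict[str, str]]) -> int:
    """Send the same key to each instance once and count backing-store reads."""
    reads: List[str] = []
    for cache in caches:  # caches[i] is the cache that instance i talks to
        serve(cache, "sku-1", reads)
    return len(reads)


shared: Dict[str, str] = {}
shared_reads = store_reads_for([shared, shared, shared])  # one shared cache
private_reads = store_reads_for([{}, {}, {}])             # one cache per instance
```

In this model the shared cache loads the key once, while the per-instance caches each load it separately, so the backing store sees three reads instead of one.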

So, how can we reconcile this approach of a shared cache with a key principle of microservices architectures – maintaining isolation between microservices? This isolation is needed for development teams to work autonomously, without any data store or schema dependencies across teams, and therefore deliver at a much faster cadence. In this case, however, the cache is being shared by multiple instances of the same microservice, rather than sharing a cache across different microservices. Each instance has identical requirements with respect to data access, so there is no need for making separate autonomous decisions with respect to the data layer. Also, if the cache doesn’t straddle different microservices, it doesn’t straddle teams, because the best practice is to assign ownership of each microservice wholly within the team that is responsible for it. So, sharing a cache across microservice instances does not violate team autonomy in any way.

Z-axis Scaling

When using Z-axis scaling, each server runs an identical copy of the code. In that respect it is similar to X-axis scaling. The big difference is that each server is responsible for only a subset of the data. Some component of the system is responsible for routing each request to the appropriate server. One commonly used routing criterion is an attribute of the request, such as the primary key of the entity being accessed (for example, a customer ID).

Z-axis splits are commonly used to scale databases. Data is partitioned (a.k.a. sharded) across a set of servers based on an attribute within the record. A router sends each record to the appropriate partition, where it is indexed and stored. Each server only deals with a subset of data.
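The routing step can be sketched as a stable hash of the partitioning attribute. This is a minimal illustration, assuming four partitions, a CRC32 hash, and in-memory dictionaries as the partitions; none of these choices comes from a specific product.

```python
import zlib
from typing import Dict, List, Optional

NUM_PARTITIONS = 4  # assumed partition count for illustration
partitions: List[Dict[str, str]] = [{} for _ in range(NUM_PARTITIONS)]


def partition_for(customer_id: str) -> int:
    """Map a customer ID to a partition index with a stable hash."""
    return zlib.crc32(customer_id.encode("utf-8")) % NUM_PARTITIONS


def put(customer_id: str, record: str) -> None:
    """Store the record in the one partition that owns this customer ID."""
    partitions[partition_for(customer_id)][customer_id] = record


def get(customer_id: str) -> Optional[str]:
    """Read from the owning partition only; other partitions are never consulted."""
    return partitions[partition_for(customer_id)].get(customer_id)


# Hypothetical records keyed by customer ID.
for customer_id, name in [("c-1", "Ada"), ("c-2", "Grace"), ("c-3", "Edsger")]:
    put(customer_id, name)
```

Because the hash is stable, the same customer ID always routes to the same partition, and each record lives in exactly one partition.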

Cloud Cache is particularly useful here because it internally incorporates the principles of z-axis scaling. There is no need for a data-aware external router because Cloud Cache inherently partitions data and is internally aware of which data resides where. From an application or user perspective, Cloud Cache appears to be a single data store, i.e. the partitioning of data is not externally visible. Cloud Cache also internally handles high availability by replicating the partitions of data on secondary servers. This built-in ability to partition data makes Cloud Cache a natural fit for z-axis scaling.
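The idea of keeping each partition on a primary and a secondary server can be sketched as follows. The placement rule, server names, and failover logic here are hypothetical and chosen only for illustration; they do not describe how Cloud Cache actually assigns or recovers partitions.

```python
import zlib
from typing import AbstractSet, Dict, Optional

SERVERS = ["server-a", "server-b", "server-c"]  # hypothetical cluster members
NUM_PARTITIONS = 6                              # assumed partition count
data: Dict[str, Dict[str, str]] = {server: {} for server in SERVERS}


def partition_of(key: str) -> int:
    """Stable hash of the key to a partition index."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def primary_for(partition: int) -> str:
    return SERVERS[partition % len(SERVERS)]


def secondary_for(partition: int) -> str:
    # Assumed placement rule: the copy lives on the next server in the list,
    # so primary and secondary are always different servers.
    return SERVERS[(partition + 1) % len(SERVERS)]


def put(key: str, value: str) -> None:
    """Write the entry to both the primary and the secondary copy."""
    partition = partition_of(key)
    data[primary_for(partition)][key] = value
    data[secondary_for(partition)][key] = value


def get(key: str, down: AbstractSet[str] = frozenset()) -> Optional[str]:
    """Read from the primary, or from the secondary if the primary is down."""
    partition = partition_of(key)
    for server in (primary_for(partition), secondary_for(partition)):
        if server not in down:
            return data[server].get(key)
    return None  # both copies are unavailable


put("customer:7", "Grace")
```

In this sketch a read still succeeds when any single server is down, because the two copies of a partition never share a server.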

The Combination Screams Performance

These dimensions of scalability can be used in combination. For example, creating additional instances of an order entry system to scale the business logic, and also splitting up the data by customer ID such that each instance only deals with a subset of data, is an example of combining x-axis and z-axis scaling. Other combinations also offer interesting possibilities.

Now, adding a caching layer to each of these dimensions (or combinations) can boost each dimension independently or provide a compound effect when used with combinations of these dimensions.

The versatility of caching makes it a good fit for boosting performance in several ways. Caching can be used separately, or in combination with the other techniques from the scale cube. The expense and rigidity of backing stores make caching an attractive approach to meeting performance requirements. Cloud Cache is a fast path to adopting a cache with your microservices architectures.
