
Announcing Pivotal Cloud Cache V1.3

We’re elated to announce Pivotal Cloud Cache (PCC) V1.3 for Pivotal Cloud Foundry. The new tile is available for download now. The highlights of this release include:

  • WAN replication across geographically distributed data centers (i.e. PCF Foundations)

  • Server-side functions that crunch data on servers, improving performance by as much as 76X

  • Data persistence, which ensures that data in-memory is also stored in the persistent disk of the VM

  • In-line caching, where read-through and write-behind are handled by custom CacheLoader and AsyncEventListener code resident in the cache servers rather than in the application code space

WAN Replication Across PCF Foundations

Enterprises have a mix of environments and systems across various infrastructure and data center types. As hybrid IT becomes the new normal, production environments blend on-premises data centers, off-site co-located facilities, public and private clouds, siloed legacy systems, and mainframes. 

With this backdrop, various deployment methodologies are used, depending upon the availability and scalability required. Adopting a cloud-native platform and microservices raises the expectations that business units have for an IT organization. Horizontal scaling patterns and distributed systems gave rise to microservices; they are the architecture of choice for high-velocity development teams. As development velocity increases, cloud-native software development principles, horizontal scaling, and elastic computing put a new twist on the old theme of “availability”. It’s inevitable, then, that multi-site availability becomes a standard requirement, no longer a luxury.

This white paper extensively covers how PCF customers achieve different levels of multi-site availability.

The high availability requirements of microservices are met by distributing their instances across PCF Foundations (sites). So, if an entire site goes down, the microservices impacted by the failure will have instances in other sites that continue processing. The operator can spin up new instances of these microservices in the surviving sites to bring capacity back up to normal.

What about the data? How does the data layer offer resilience in this scenario? This is where WAN replication is useful. WAN replication maintains copies of data at different sites, so that site failures do not impact the availability of data. The technical architecture requirements for WAN replication are driven by:

  • Resilience in the face of site outages.

  • Cloud bursting for horizontal scaling, where the primary site is on-prem and a secondary site, used for addressing spikes in capacity, is in the cloud.

  • Active-active patterns to reduce latency caused by long distances between services and the users of these services, whether those users are end-users or other services.

The active-active deployment pattern can be used to cover all three of these technical architecture requirements. In this deployment pattern, each foundation remains active and does useful work, but also shares some responsibility as a backup in case there is a failure in another foundation. This pattern offers the greatest availability and superior protection against unexpected failures. It also offers the flexibility and control to optimize traffic in any number of ways, especially by reducing network latency through locality of the service. These benefits make active-active the best deployment pattern for advanced microservices architectures.

In the active-active pattern, each foundation is supported by a PCC cluster, and PCC clusters across foundations are kept eventually consistent via WAN replication.

As illustrated in the following diagram, requests coming into the application are routed to the appropriate site (PCF Foundation) by a global load balancer. The global load balancer partitions the workload in an active-active configuration. In the case of a site outage, the global load balancer redirects the workload to the surviving site.

By necessity, replication of data over a WAN occurs asynchronously because of network latencies. Our implementation assures eventual consistency, even when the network fails: data is buffered in persistent queues until the connection can be re-established. A built-in mechanism based on version vectors and timestamps detects data that arrives out of order and resolves those situations automatically.
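
In PCC, WAN replication is configured at the service level rather than in application code, but it can help to see the moving parts. Below is a rough sketch of what the equivalent looks like in standalone GemFire/Apache Geode API terms; the region name, sender ID, and remote site ID are made up for illustration, and this is not the PCC provisioning flow itself.

import org.apache.geode.cache.Cache;
import org.apache.geode.cache.CacheFactory;
import org.apache.geode.cache.Region;
import org.apache.geode.cache.RegionShortcut;
import org.apache.geode.cache.wan.GatewayReceiver;
import org.apache.geode.cache.wan.GatewaySender;
import org.apache.geode.cache.wan.GatewaySenderFactory;

public class WanReplicationSketch {
    public static void main(String[] args) {
        // Server-side cache member for "site A" (its distributed-system-id is set in the member's properties).
        Cache cache = new CacheFactory().create();

        // A persistent gateway sender queues events on local disk if the WAN link to site B drops,
        // then drains the queue once the connection is re-established.
        GatewaySenderFactory senderFactory = cache.createGatewaySenderFactory();
        senderFactory.setPersistenceEnabled(true);
        GatewaySender toSiteB = senderFactory.create("toSiteB", 2); // 2 = remote distributed-system id

        // Every update to this region is also placed on the sender's queue for replication to site B.
        Region<String, String> orders = cache.<String, String>createRegionFactory(RegionShortcut.PARTITION)
                .addGatewaySenderId("toSiteB")
                .create("orders");

        // A gateway receiver accepts replicated events arriving from the other site.
        GatewayReceiver fromSiteB = cache.createGatewayReceiverFactory().create();
    }
}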

In-line Caching

The initial release of PCC was built for the look-aside caching pattern. In PCC V1.3, we’re adding the in-line caching pattern. A key difference between the in-line and look-aside caching patterns is what the application code does versus what the cache server does. In the look-aside pattern, the application requests data from the cache, and if the data is not cached, the application gets the data from the backing store and puts it into the cache for subsequent reads. All of this is managed by the application code. Look-aside caching is further explained in this blog post.
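
For a concrete picture of the look-aside flow, here is a minimal sketch from the application’s point of view using the GemFire/Geode Region API; the class name, the region contents, and the loadFromDatabase helper are hypothetical placeholders for your own data and backing store.

import org.apache.geode.cache.Region;

// Look-aside caching: the application code owns the coordination between cache and database.
public class OrderLookup {
    private final Region<String, String> ordersRegion; // PCC region holding serialized orders

    public OrderLookup(Region<String, String> ordersRegion) {
        this.ordersRegion = ordersRegion;
    }

    public String findOrder(String id) {
        String cached = ordersRegion.get(id);     // 1. try the cache first
        if (cached != null) {
            return cached;
        }
        String fromDb = loadFromDatabase(id);     // 2. on a miss, read the backing store
        if (fromDb != null) {
            ordersRegion.put(id, fromDb);         // 3. populate the cache for subsequent reads
        }
        return fromDb;
    }

    // Hypothetical stand-in for a query against the backing database.
    private String loadFromDatabase(String id) {
        return null;
    }
}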

In the in-line caching pattern, custom code is deployed onto all of the cache servers, and the cache then takes control of reading through to the backing database on a cache miss and writing, either synchronously or asynchronously, to the backing database. This custom code consists of cache loaders, cache writers, and listeners. This simplifies the task for developers because they do not need to write extensive code or use external transaction managers to keep the cache and the backing store in sync.
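
As a rough illustration, the sketch below shows the shape of that server-side code using the GemFire/Geode CacheLoader (read-through) and AsyncEventListener (write-behind) interfaces. The class names and the database helper methods are hypothetical; in PCC this code is packaged and deployed onto the cache servers rather than living in the application.

import java.util.List;

import org.apache.geode.cache.CacheLoader;
import org.apache.geode.cache.CacheLoaderException;
import org.apache.geode.cache.LoaderHelper;
import org.apache.geode.cache.asyncqueue.AsyncEvent;
import org.apache.geode.cache.asyncqueue.AsyncEventListener;

// Read-through: the cache servers invoke this on a cache miss and store the returned value in the region.
class OrderCacheLoader implements CacheLoader<String, String> {

    @Override
    public String load(LoaderHelper<String, String> helper) throws CacheLoaderException {
        return loadFromDatabase(helper.getKey()); // fetch the missing entry from the backing database
    }

    @Override
    public void close() { }

    // Hypothetical stand-in for a query against the backing database.
    private String loadFromDatabase(String key) {
        return null;
    }
}

// Write-behind: the cache queues updates and hands them to this listener in batches, asynchronously.
class OrderWriteBehindListener implements AsyncEventListener {

    @Override
    public boolean processEvents(List<AsyncEvent> events) {
        for (AsyncEvent event : events) {
            writeToDatabase((String) event.getKey(), (String) event.getDeserializedValue());
        }
        return true; // true tells the queue the batch was persisted and can be discarded
    }

    @Override
    public void close() { }

    // Hypothetical stand-in for an update against the backing database.
    private void writeToDatabase(String key, String value) { }
}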

Server-Side Functions

Server-side functions are a technique for accelerating data processing by moving the compute to the data instead of the data to the compute. This pattern crunches data on the servers, in situ with the data housed on those servers. This approach eliminates the “network hop” between the executing code and the data stored in the PCC cluster. (If this sounds familiar, it’s akin to the Map-Reduce implementation.) In fact, external benchmarks have demonstrated that for relatively simple computations this capability can achieve as much as 76X faster execution than traditional patterns where data is moved to the compute.
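
To make this concrete, here is a minimal sketch of a server-side function written against the GemFire/Geode function execution API; the function name and the region contents are hypothetical. The function runs on each cache server that hosts the data and sends back only its small partial result.

import org.apache.geode.cache.Region;
import org.apache.geode.cache.execute.Function;
import org.apache.geode.cache.execute.FunctionContext;
import org.apache.geode.cache.execute.RegionFunctionContext;
import org.apache.geode.cache.partition.PartitionRegionHelper;

// Executes on each cache server that hosts the region's data; only the small result
// travels back over the network, not the data being processed.
public class OrderCountFunction implements Function {

    @Override
    public void execute(FunctionContext context) {
        RegionFunctionContext regionContext = (RegionFunctionContext) context;

        // Restrict the computation to the data hosted locally on this server.
        Region<String, String> localOrders =
                PartitionRegionHelper.getLocalDataForContext(regionContext);

        int count = localOrders.size();           // the "crunching" happens next to the data
        context.getResultSender().lastResult(count);
    }

    @Override
    public String getId() {
        return "OrderCountFunction";
    }
}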

In addition to Map-Reduce style functions, PCC also supports data-aware function execution. In this case, the function is invoked with a filter (a list of keys) that informs the Function Execution Service which data elements will be involved in the computation, and thereby which servers need to execute the function so that all of that data is processed locally without any extra network hops. This is the most efficient method of data processing in many real-world situations.
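
A data-aware invocation, sketched below against the same GemFire/Geode API with hypothetical names, passes the key filter so that only the servers hosting those keys run the function.

import java.util.List;
import java.util.Set;

import org.apache.geode.cache.Region;
import org.apache.geode.cache.execute.Execution;
import org.apache.geode.cache.execute.FunctionService;
import org.apache.geode.cache.execute.ResultCollector;

public class OrderCountInvoker {

    // Data-aware execution: the filter (a set of keys) tells the function service which servers
    // host the relevant data, so the function runs only where that data already lives.
    @SuppressWarnings("unchecked")
    public static int countOrders(Region<String, String> ordersRegion, Set<String> orderKeys) {
        Execution execution = FunctionService.onRegion(ordersRegion).withFilter(orderKeys);
        ResultCollector<?, ?> collector = execution.execute("OrderCountFunction");

        // Each participating server returns one partial count; sum them for the final answer.
        List<Integer> partialCounts = (List<Integer>) collector.getResult();
        return partialCounts.stream().mapToInt(Integer::intValue).sum();
    }
}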

Data Persistence

Persistence ensures that data in-memory is also stored on the persistent disk of the VM. The “write to the disk” (PCC’s local file system) is done synchronously.
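
In PCC, persistent regions are typically created through gfsh, but as a rough sketch of what that corresponds to in GemFire/Geode API terms, the server-side snippet below creates a partitioned, redundant region whose entries are also written to the member’s disk store; the region name is hypothetical.

import org.apache.geode.cache.Cache;
import org.apache.geode.cache.CacheFactory;
import org.apache.geode.cache.Region;
import org.apache.geode.cache.RegionShortcut;

public class PersistentRegionSketch {
    public static void main(String[] args) {
        // Server-side cache member; in PCC these run inside the cache server VMs managed by BOSH.
        Cache cache = new CacheFactory().create();

        // A partitioned region with redundant copies whose entries are also written to the
        // member's disk store, so the data survives a restart of the whole cluster.
        Region<String, String> orders = cache
                .<String, String>createRegionFactory(RegionShortcut.PARTITION_REDUNDANT_PERSISTENT)
                .create("orders");
    }
}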

Persistence provides the ability to restart the entire cluster from the data in PCC’s local file system, without having to go back to the backing database to reload all of the data. If an entire PCC cluster fails, BOSH will recreate the VMs with their persistent disks, and PCC will then load the data from disk back into the cluster. The end result: the likelihood of data loss is minimized. The chances of data loss are limited to the short window it takes to flush each batch to the Linux file system, and this window is even smaller with solid-state disks. This, in conjunction with PCC’s use of Availability Zones and redundant copies of data replicated throughout the cluster, brings the likelihood of data loss down to as near zero as possible.

Moving Beyond the Simple Use Cases

This release takes us way beyond simple caching use cases. We’re pumped about this release because it squarely addresses customer and market requirements that we have come to understand in the course of our fifteen years of experience with our standalone caching products. We’re happy to bring these same capabilities to Pivotal Cloud Foundry.