Operationalizing Apache Kafka on Kubernetes: Pivotal and Confluent Team Up

Today, Confluent announced that Confluent Platform will natively support Kubernetes. We are excited to announce our collaboration with Confluent to make Pivotal Container Service (PKS) the best place to run the Confluent Operator. This joint effort includes not only container images of Confluent Platform, but also Helm charts and Kubernetes Operators to automate Day 1 and Day 2 tasks.

Why is this important?

Streaming solutions are essential parts of event-driven architectures, used by many cloud-native applications. Confluent Platform, based on Apache Kafka, is the leading enterprise distribution that companies depend on to capitalize on real-time data at scale. Pivotal and Confluent are working together on bringing enterprise-grade Apache Kafka to the Pivotal Cloud Foundry ecosystem.  

Event-driven architectures have emerged as a hot topic. Why? There are many factors, including connected devices and modern software architectures. Internet of things (IoT) use cases demand timely responses to machine data. Microservices architectures depend on messaging to coordinate across several services that own their own data. Serverless architectures use events to trigger function execution.

In other words, these new use cases and architectures are challenging our existing data and messaging models. We need to consider how we process data coming from more sources. We need to allow new, unexpected data consumers (new functions and microservices) to be added without major rewrites. This is where stream processing and Apache Kafka come into the cloud-native architecture.

Pivotal has been working with Apache Kafka for some time. Working with Confluent, we are expanding the options to run Kafka on Pivotal Cloud Foundry. Deploying and running Kafka on PCF inherits the operational benefits of BOSH. This post highlights where Kafka fits into the Pivotal ecosystem today and where it will fit in the future.

Kafka in Spring Cloud Stream and Spring Cloud Data Flow

Spring Cloud Stream (SCSt) is a framework for building event-driven microservices. With SCSt, developers can build and test standalone cloud-native applications that can independently scale, fail gracefully, and change at different rates. The developer focuses solely on the business logic, while the framework supplies the messaging infrastructure and connects to Kafka automatically.

SCSt includes a message binder abstraction. The Kafka and Kafka Streams binder implementations connect the Spring Boot applications into a coherent data pipeline. Each Boot app either ingests or processes the data published to a Kafka topic. The next Boot app in the stream consumes what it needs out of the topic. These pipelines are composed of Spring Boot apps, each built to do one of three things:

  1. ingest data from a source,
  2. process the data (e.g. filter, score, enrich, transform, etc.), and
  3. "sink" the data in a datastore of some kind.
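
The three roles above can be sketched without any framework. The following is a minimal, hypothetical illustration in plain Python, not Spring Cloud Stream code: the in-memory `Topic` class stands in for a Kafka topic, and the names `ingest`, `enrich`, and `sink` are assumptions made up for this example.

```python
from collections import defaultdict


class Topic:
    """In-memory stand-in for a Kafka topic: an append-only log of records."""

    def __init__(self):
        self.log = []

    def publish(self, record):
        self.log.append(record)

    def read_from(self, offset):
        # Reading does not remove records, so several consumers can share a topic.
        return self.log[offset:]


def ingest(readings, out_topic):
    """Source: push raw readings onto a topic."""
    for reading in readings:
        out_topic.publish(reading)


def enrich(in_topic, out_topic, threshold):
    """Processor: drop readings below the threshold and tag the rest."""
    for reading in in_topic.read_from(0):
        if reading["temp"] >= threshold:
            out_topic.publish({**reading, "alert": True})


def sink(in_topic, datastore):
    """Sink: write processed records to a datastore (here, a dict of lists)."""
    for record in in_topic.read_from(0):
        datastore[record["device"]].append(record)


raw, alerts = Topic(), Topic()
store = defaultdict(list)

ingest([{"device": "a", "temp": 71}, {"device": "b", "temp": 95}], raw)
enrich(raw, alerts, threshold=90)
sink(alerts, store)
```

Because each stage knows only its input and output topics, another consumer of `raw` could be added in this sketch without changing `ingest`.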

Spring Cloud Data Flow (SCDF) simplifies the orchestration of streaming data pipelines. The pipelines are composed of Spring Cloud Stream or Spring Cloud Task microservices. Building upon the Kafka Streams API, a data pipeline can also be composed of KStreams microservices that perform stateful operations.
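
A stateful operation is one whose output depends on records seen earlier, such as a running count per key. As a rough sketch of the idea in plain Python (this is not the Kafka Streams API; the `running_count` function and the event values are assumptions for illustration):

```python
from collections import Counter


def running_count(events):
    """Yield (key, count so far) for each event, keeping state across events."""
    state = Counter()  # plays the role of a local state store in this sketch
    for key in events:
        state[key] += 1
        yield key, state[key]


updates = list(running_count(["login", "click", "click", "login", "click"]))
```

Each incoming event produces an updated count for its key, which is the kind of continuously updated result a stateful stream processor emits downstream.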

SCSt and SCDF are great for building streaming data pipelines. And PCF is the best place to run Spring Boot microservices, including those of SCSt and SCDF. In fact, we’ve introduced SCDF for PCF to make it even easier to automate the deployment experience in PCF. With these pipelines running on PCF, it makes sense to run the messaging infrastructure there, too.

Kafka in Project riff and Pivotal Function Service

In December 2017, Pivotal announced riff, an open source function-as-a-service (FaaS) project built on Kubernetes. Pivotal also announced plans to release Pivotal Function Service (PFS). PFS commercializes riff, making it easy to install, run, and maintain on top of PKS.

Unlike other FaaS projects, riff functions connect to an event broker like Kafka via a sidecar. This ties every function to Kafka on input and output, saving developers from having to learn the Kafka API and set up that broker connection. In effect, riff builds upon Kafka topics as the assumed means for functions to publish and subscribe. Rather than creating a new, proprietary way to coordinate functions, riff uses Apache Kafka as the de facto standard for such loose coupling of functions.  
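
The sidecar idea can be illustrated with a small sketch. This is a hypothetical model in plain Python, not riff's actual sidecar or invoker protocol: two `deque` objects stand in for Kafka topics, and `make_sidecar` and `shout` are names invented for this example.

```python
from collections import deque


def make_sidecar(function, input_topic, output_topic):
    """Return a callable that drains the input topic through `function`."""

    def run_once():
        while input_topic:
            event = input_topic.popleft()
            # The sidecar, not the function, handles publishing the result.
            output_topic.append(function(event))

    return run_once


def shout(text):
    """The developer's function: no broker code, just business logic."""
    return text.upper()


requests, replies = deque(["hello", "riff"]), deque()
make_sidecar(shout, requests, replies)()
```

The point of the design is visible in `shout`: it contains no topic or broker handling, so the same function could be wired to different topics without modification.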

The Confluent Operator on Pivotal Container Service (PKS)

Earlier this year, Pivotal released PKS, allowing PCF users to run Kubernetes clusters. This is especially helpful when running containers from software publishers. After all, you're not going to refactor commercial software you bought. However, you still have to *run* that software. With PKS, Kubernetes inherits the operational benefits of BOSH. This includes refreshing the operating system under Kubernetes and simplifying the upgrade cycle of new Kubernetes releases.

A deployment option to power event-driven architectures

Between microservices, serverless, and IoT, more companies are adopting event-driven architectures. Gartner predicts that by 2020, 80% of new business ecosystems will require support for event processing. Pivotal's customers are adopting these patterns and use cases. Providing solutions through our ecosystem for deploying stream processing workloads simplifies adoption for our customers.