
Amazon Kinesis enables real-time processing of streaming data at massive scale. Kinesis Streams is useful for rapidly moving data off data producers and then continuously processing it, whether to transform the data before emitting it to a data store, to run real-time metrics and analytics, or to derive more complex data streams for further processing.

How Kinesis Works

[Diagram: how Kinesis works]

Benefits of Using Kinesis Data Streams

  • Enables ingesting, buffering and processing streams of data in real time.
  • Facilitates architectural changes in transforming a system from API-driven to event-driven.
  • Place data into Kinesis Data Streams, which ensures durability and elasticity.
  • Implement multiple Kinesis Data Streams applications that can consume data from a stream so that multiple actions, like archiving and processing, can take place concurrently and independently.

Publishing and Consuming

  • To send data to the stream, configure producers using the Kinesis Data Streams PutRecord/PutRecords API operations or the Amazon Kinesis Producer Library (KPL).
  • For custom processing and analyzing of real-time, streaming data, use the Kinesis Client Library (KCL).
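Under the hood, the stream routes each record by its partition key: Kinesis takes the MD5 hash of the key as a 128-bit integer and delivers the record to the shard whose hash-key range contains it. A minimal sketch of that routing, assuming the shards split the hash space evenly (real shards carry explicit StartingHashKey/EndingHashKey ranges):

```go
package main

import (
	"crypto/md5"
	"fmt"
	"math/big"
)

// shardForKey shows how Kinesis routes a record: the partition key is
// MD5-hashed into a 128-bit integer, which falls into exactly one
// shard's hash-key range. Here the ranges are assumed to divide the
// 128-bit space evenly across numShards shards.
func shardForKey(partitionKey string, numShards int64) int64 {
	sum := md5.Sum([]byte(partitionKey))
	hash := new(big.Int).SetBytes(sum[:]) // 0 .. 2^128-1

	span := new(big.Int).Lsh(big.NewInt(1), 128) // 2^128
	span.Div(span, big.NewInt(numShards))        // hash-range width per shard

	shard := new(big.Int).Div(hash, span)
	if shard.Int64() >= numShards { // guard the top edge of the range
		return numShards - 1
	}
	return shard.Int64()
}

func main() {
	for _, key := range []string{"user-42", "user-43", "order-7"} {
		fmt.Printf("partition key %q -> shard %d\n", key, shardForKey(key, 4))
	}
}
```

Records with the same partition key always land on the same shard, which is what preserves per-key ordering.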

Kinesis Client Library

The Kinesis Client Library (KCL) enables fault-tolerant consumption of data from streams and provides scaling support for Kinesis Data Streams applications. The KCL is built on top of the AWS Kinesis SDK rather than replacing it. For a simple application, the SDK provides a straightforward API for continuously polling data from a stream, and it works reasonably well when the stream has only one shard. In a multi-shard scenario, however, the developer must build a consumer that:

  1. Connects to the stream
  2. Enumerates the shards
  3. Coordinates shard associations with other workers (if any)
  4. Instantiates a record processor for every shard it manages
  5. Pulls data records from the stream
  6. Pushes the records to the corresponding record processor
  7. Checkpoints processed records
  8. Balances shard-worker associations when the worker instance count changes
  9. Balances shard-worker associations when shards are split or merged
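Steps 3, 8 and 9 in particular amount to a lease-balancing problem: shards must stay spread evenly across workers as either set changes (the real KCL tracks these leases in a DynamoDB table). A minimal sketch of the even assignment a coordinator has to maintain, with hypothetical shard and worker IDs:

```go
package main

import "fmt"

// assignShards sketches the lease balancing KCL performs: spread M
// shards across N workers so that no worker holds more than one shard
// beyond any other. Round-robin assignment keeps the counts within one
// of each other; the real library also minimizes lease movement when
// workers join or leave.
func assignShards(shards []string, workers []string) map[string][]string {
	out := make(map[string][]string, len(workers))
	for i, shard := range shards {
		w := workers[i%len(workers)]
		out[w] = append(out[w], shard)
	}
	return out
}

func main() {
	shards := []string{"shardId-0", "shardId-1", "shardId-2", "shardId-3", "shardId-4"}
	fmt.Println(assignShards(shards, []string{"worker-a", "worker-b"}))
}
```

When a worker disappears, its leases are simply handed back into the pool and reassigned the same way.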

Therefore, almost no one uses the AWS Kinesis SDK directly in production. For a better understanding of how Kinesis works and of the complexity of consuming from it, especially how the KCL solves these problems, see Marvin Theimer's AWS re:Invent presentation (BDT311). The KCL is the de facto standard for Kinesis consumption. Unfortunately, it exists only as a native Java implementation; there is no native Go implementation of the KCL.

To support other languages, AWS implemented a Java-based daemon called the MultiLangDaemon that does all the heavy lifting. The MultiLangDaemon is itself a Java KCL application that implements an internal record processor. It spawns a sub-process, which in turn runs another record processor that can be written in any language. The MultiLangDaemon process and the record-processor sub-process communicate over STDIN and STDOUT using a defined protocol, with a one-to-one correspondence among record processors, child processes and shards. The major drawbacks of this approach are:

  • Java dependency
  • Inefficiency: two record processors per shard, an internal one in Java and an external one in the chosen language

VMware-Go-KCL

VMware-Go-KCL brings the Go/Kubernetes community a native Go implementation of the KCL that matches the API and functional specification of the original Java KCL v2.0 exactly, without the resource overhead of installing the Java-based MultiLangDaemon.

Interface:
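VMware-Go-KCL follows the KCL programming model: the application implements a record processor, plus a factory that creates one processor per shard, while the library owns shard discovery, lease balancing and checkpointing. A sketch of that contract with simplified input types (illustrative, not the library's exact signatures):

```go
package main

import "fmt"

// Simplified stand-ins for the per-shard inputs the library delivers;
// the real types also carry checkpointers, sequence numbers, etc.
type InitializationInput struct{ ShardID string }
type ProcessRecordsInput struct{ Records [][]byte }
type ShutdownInput struct{ Reason string }

// RecordProcessor is the contract the application implements. The
// library drives one instance per shard through this lifecycle.
type RecordProcessor interface {
	Initialize(input *InitializationInput)
	ProcessRecords(input *ProcessRecordsInput)
	Shutdown(input *ShutdownInput)
}

// RecordProcessorFactory creates a fresh processor for each shard the
// worker acquires a lease on.
type RecordProcessorFactory interface {
	CreateProcessor() RecordProcessor
}

// printProcessor is a minimal implementation that logs each call.
type printProcessor struct{ shardID string }

func (p *printProcessor) Initialize(in *InitializationInput) { p.shardID = in.ShardID }
func (p *printProcessor) ProcessRecords(in *ProcessRecordsInput) {
	fmt.Printf("shard %s: got %d records\n", p.shardID, len(in.Records))
}
func (p *printProcessor) Shutdown(in *ShutdownInput) {
	fmt.Printf("shard %s: shutdown (%s)\n", p.shardID, in.Reason)
}

func main() {
	var proc RecordProcessor = &printProcessor{}
	proc.Initialize(&InitializationInput{ShardID: "shardId-0"})
	proc.ProcessRecords(&ProcessRecordsInput{Records: [][]byte{[]byte("a"), []byte("b")}})
	proc.Shutdown(&ShutdownInput{Reason: "TERMINATE"})
}
```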

Usage Example:
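At runtime, a worker leases shards, creates one processor per shard through the factory, and feeds it that shard's record batches. A self-contained simulation of that flow, with made-up shard names and records (the real worker is configured with a stream name, region, application name and credentials):

```go
package main

import "fmt"

// Simplified record batch handed to a processor.
type ProcessRecordsInput struct{ Records []string }

type RecordProcessor interface {
	ProcessRecords(input *ProcessRecordsInput)
}

// counter is a trivial processor that counts records for its shard.
type counter struct{ n int }

func (c *counter) ProcessRecords(in *ProcessRecordsInput) { c.n += len(in.Records) }

// runWorker imitates what a KCL worker does after acquiring leases:
// one processor per shard, fed only that shard's batches, so per-shard
// processing stays independent of the other shards.
func runWorker(batches map[string][][]string, factory func() RecordProcessor) map[string]RecordProcessor {
	procs := make(map[string]RecordProcessor)
	for shard, shardBatches := range batches {
		p := factory()
		procs[shard] = p
		for _, batch := range shardBatches {
			p.ProcessRecords(&ProcessRecordsInput{Records: batch})
		}
	}
	return procs
}

func main() {
	procs := runWorker(map[string][][]string{
		"shardId-0": {{"r1", "r2"}, {"r3"}},
		"shardId-1": {{"r4"}},
	}, func() RecordProcessor { return &counter{} })
	for shard, p := range procs {
		fmt.Printf("%s processed %d records\n", shard, p.(*counter).n)
	}
}
```

The factory indirection is what lets the library scale: when a shard is split or a lease moves, it simply creates or retires processors without the application changing.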

Reference and Tutorial

Because VMware-Go-KCL matches the interface and programming model of the original Amazon KCL exactly, the best place for reference material and tutorials is Amazon itself:

Support and Contact

Open source embodies a model for people to work together, building something greater than they could create on their own. For any project-related issues and questions, please create an issue in the VMware-Go-KCL repository.

Stay tuned to the Open Source Blog for more project deep-dives and follow us on Twitter (@vmwopensource).