GemFire, whatcha got?
My pair and I decided to spend a day learning about GemFire, one of Pivotal’s big data solutions. We wanted to brush up on some of the products and services Pivotal offers to large enterprise customers since they seemed like a giant amorphous blob to us.
So what is it exactly? The simple answer: GemFire is a distributed key-value data store. It lets you scale out servers to increase capacity, replicate your data for redundancy and, if needed, persist data to disk. It supports storing JSON documents and provides APIs for Java, C++ and C#, as well as a REST API via vFabric.
Building Blocks of a GemFire System
Three pieces are needed to get GemFire up and running: locators, servers and regions.
- Locators act as the coordinators of GemFire. They keep track of the servers in the distributed system and balance load across them.
- Servers (also called peers) hold the data you plan to store and query. Generally, you'll want to run several servers to add redundancy and increase the capacity of your distributed system.
- Regions are namespaces for storing data, and a region's data can be distributed across servers. You can define rules for persistence and data replication on each region.
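To make the division of labour concrete, here is a toy, in-process Python model of the three building blocks. It is not GemFire's API: every class and method name is hypothetical. The locator keeps a registry of servers and hands each new client the least-loaded one, while each server holds regions as plain dictionaries.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Toy data-holding member (hypothetical): region name -> {key: value}."""
    name: str
    regions: dict = field(default_factory=dict)
    client_count: int = 0


@dataclass
class Locator:
    """Toy coordinator (hypothetical) that tracks servers and balances clients."""
    servers: list = field(default_factory=list)

    def register(self, server: Server) -> None:
        # Servers announce themselves to the locator when they start.
        self.servers.append(server)

    def create_region(self, region_name: str) -> None:
        # A new region is added to every server already registered.
        for server in self.servers:
            server.regions.setdefault(region_name, {})

    def pick_server(self) -> Server:
        # Simple load balancing: route a new client to the server
        # currently serving the fewest clients (first one wins ties).
        server = min(self.servers, key=lambda s: s.client_count)
        server.client_count += 1
        return server


locator = Locator()
for name in ("server1", "server2"):
    locator.register(Server(name))
locator.create_region("exampleRegion")

first = locator.pick_server()
second = locator.pick_server()
print(first.name, second.name)  # server1 server2
```

Least-connections routing is just one plausible balancing policy chosen for the sketch; the point is that clients ask the locator where to go, and data lives only on servers.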
A Deeper Dive Into Regions and Servers
Regions are one of the most important pieces of GemFire. When storing and querying data, the servers are abstracted away and you only need to specify a region. You can query a region with OQL (Object Query Language), GemFire's SQL-like query language:
SELECT entry.value FROM /exampleRegion.entries entry WHERE entry.key='mykey'
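As a rough mental model only (plain Python, not how GemFire actually evaluates OQL), the query above iterates over the region's key/value entries, keeps those whose key matches and projects out the value. The region contents and the helper name below are hypothetical.

```python
# A region modelled as a plain dict (hypothetical data).
example_region = {"mykey": "hello", "otherkey": "world"}


def select_values_by_key(region: dict, wanted_key: str) -> list:
    """Rough equivalent of:
    SELECT entry.value FROM /exampleRegion.entries entry WHERE entry.key = wanted_key
    """
    # FROM ... entries entry  -> iterate over (key, value) pairs
    # WHERE entry.key = ...   -> filter on the key
    # SELECT entry.value      -> keep only the value
    return [value for key, value in region.items() if key == wanted_key]


print(select_values_by_key(example_region, "mykey"))  # ['hello']
```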
Depending on the region's type, data can be partitioned between servers or replicated across many of them. You can also set up a region to persist its data to disk. These details are hidden while querying and storing, but they matter when deciding how to set up your data store.
There are several region types available in GemFire. When a region is created, it is automatically added to every existing server in your cluster.
REPLICATE – If you create a region with the replicate type, a full copy of the region is stored on every server. Because data is held in memory, access is fast, and because every server holds a copy, the data survives the loss of any single node in the cluster.
This example pulled from the GemFire documentation shows how data in a social network would be stored across 3 servers in a replicated region.
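A toy sketch of replicate semantics, assuming a simple in-memory model with hypothetical class and method names (this is not GemFire code): every put is written to every server, so any single server can answer a read, and the data survives losing one server.

```python
class ReplicatedRegion:
    """Toy REPLICATE region (hypothetical): every server keeps a full copy."""

    def __init__(self, server_names):
        # One in-memory dict per server.
        self.copies = {name: {} for name in server_names}

    def put(self, key, value):
        # Write the entry to every server's copy.
        for store in self.copies.values():
            store[key] = value

    def get(self, key, from_server):
        # Any server can answer a read on its own.
        return self.copies[from_server].get(key)

    def fail(self, server_name):
        # Simulate losing a server together with its in-memory data.
        del self.copies[server_name]


region = ReplicatedRegion(["server1", "server2", "server3"])
region.put("alice", ["bob", "carol"])  # alice's friend list
region.fail("server1")
print(region.get("alice", from_server="server3"))  # ['bob', 'carol']
```

The trade-off is visible in `put`: every write costs one copy per server, and every server needs enough memory for the whole region.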
PARTITION – A partitioned region splits its data between the servers in the system, so that each server stores only part of the region's contents. You can configure how many redundant copies of each entry are kept.
Example of a partitioned region across 3 servers in a simple social network
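A toy sketch of partitioning with redundant copies. GemFire hashes keys into buckets and spreads the buckets across servers; the crc32-based hashing and the "next server over" placement of redundant copies below are simplifying assumptions, and all names are hypothetical. The default of 113 buckets mirrors GemFire's default `total-num-buckets`.

```python
import zlib


class PartitionedRegion:
    """Toy PARTITION region (hypothetical): each key lives on a subset of servers."""

    def __init__(self, server_names, num_buckets=113, redundant_copies=0):
        self.names = list(server_names)
        self.servers = {name: {} for name in self.names}
        self.num_buckets = num_buckets
        self.redundant_copies = redundant_copies

    def _bucket(self, key):
        # Deterministic hash (Python's built-in hash() of str is randomized
        # per process) mapped onto a fixed number of buckets.
        return zlib.crc32(key.encode("utf-8")) % self.num_buckets

    def owners(self, key):
        # Bucket b's primary is server b mod N; redundant copies go to the
        # following servers, wrapping around, never more than N in total.
        bucket = self._bucket(key)
        n = len(self.names)
        copies = min(1 + self.redundant_copies, n)
        return [self.names[(bucket + i) % n] for i in range(copies)]

    def put(self, key, value):
        for name in self.owners(key):
            self.servers[name][key] = value

    def get(self, key):
        # Read from the first owner that holds the key.
        for name in self.owners(key):
            if key in self.servers[name]:
                return self.servers[name][key]
        return None


region = PartitionedRegion(["server1", "server2", "server3"], redundant_copies=1)
for user in ("alice", "bob", "carol", "dave"):
    region.put(user, f"profile of {user}")

# Each user's profile appears on exactly two of the three servers.
for name, store in region.servers.items():
    print(name, sorted(store))
```

With `redundant_copies=0` each entry lands on exactly one server, which is the behaviour the hands-on section below observes in Pulse.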
PARTITION_PERSISTENT – A region that is similar to PARTITION, except that each server also writes its partition of the data set to local disk, so the data can be recovered after a server fails or restarts. Persistence alone does not let you hold more data than fits in memory; GemFire's overflow region types (for example PARTITION_PERSISTENT_OVERFLOW) cover that case.
Example of a partitioned and persisted region across 3 servers in a simple social network
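A toy sketch of what persistence buys you, modelling a single server's slice of a persistent partitioned region with hypothetical names and a JSON file standing in for the disk store. Entries stay in memory for reads and are also written to disk, so a restarted server can recover them.

```python
import json
import tempfile
from pathlib import Path


class PersistentPartition:
    """Toy model (hypothetical) of one server's slice of a persistent region.

    Entries are kept in memory for fast reads and also written to a local
    file so they can be recovered after the server restarts.
    """

    def __init__(self, disk_dir: Path):
        self.disk_file = disk_dir / "partition.json"
        # On startup, recover whatever was previously written to disk.
        if self.disk_file.exists():
            self.memory = json.loads(self.disk_file.read_text())
        else:
            self.memory = {}

    def put(self, key, value):
        self.memory[key] = value
        # Write-through to disk. Rewriting the whole file keeps the sketch
        # short; a production store would typically append to a log instead.
        self.disk_file.write_text(json.dumps(self.memory))

    def get(self, key):
        return self.memory.get(key)


with tempfile.TemporaryDirectory() as tmp:
    disk = Path(tmp)
    server = PersistentPartition(disk)
    server.put("alice", "profile of alice")

    del server  # simulate the server process going away
    restarted = PersistentPartition(disk)
    print(restarted.get("alice"))  # profile of alice
```

Note that the full data set still lives in `self.memory`; the disk copy is for recovery, not for extra capacity.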
Install it Yourself and Play With it!
We found the best way to learn was to get GemFire running locally and play around with it. It's pretty easy to set up if you are running OS X with Homebrew installed; for other setups, refer to the official documentation for installation steps.
Installation on OS X with Homebrew
The easiest way to install it is with Homebrew:
$ brew tap pivotal/tap && brew install gemfire
GemFire needs a place to store logs and other data, so make a directory and change into it (`-p` also creates ~/workspace if it doesn't exist yet):
$ mkdir -p ~/workspace/gemfire
$ cd ~/workspace/gemfire/
Enter the GemFire shell:
$ gfsh
Start your first locator:
gfsh> start locator --name=locator1
Start Pulse, GemFire's monitoring tool:
gfsh> start pulse
Log in to Pulse (username: admin, password: admin). You can keep checking back here to view your distributed system as you build it out.
Pulse dashboard showing locators and servers
Create a couple of servers to hold your data:
gfsh> start server --name=server1 --server-port=40405
gfsh> start server --name=server2 --server-port=40406
Now we'll create a few regions of different types:
gfsh> create region --name=replicated --type=REPLICATE
gfsh> create region --name=partitioned --type=PARTITION
gfsh> create region --name=persisted --type=PARTITION_PERSISTENT
And now we have GemFire up and running! Let's start by putting data into the different regions to see how they behave:
gfsh> put --key=best_rapper --value=Dylan --region=replicated
If you look in Pulse, you can see that both servers now have an entry in the replicated region. You can also add data to the partitioned region and see that each entry is stored on only one server in the system.
Pulse displaying regions on a server
To retrieve the data, issue an OQL SELECT query:
gfsh> query --query="SELECT entry.value FROM /replicated.entries entry WHERE entry.key='best_rapper'"
You should see the value stored in the previous step.
That's a brief overview of how to use the GemFire shell; see the documentation for more information, including how to use the available APIs.
Thanks also to Eric Hu, Stephen Levine and Mariana Lenetis; we all spent the day discovering GemFire and how it works.