Part 2: The Value, Architecture, & Code for Building Geography-Based Apps

In our last post, we 1) covered how geographic data can release value in mobile and machine-based applications, 2) explained how technology is used to overcome barriers to these types of big data scenarios, and 3) detailed the architecture for a data fabric or grid (like vFabric GemFire) that works with geographic data and specialized or alternative indexes. There were also code examples to explain the object model, the spatial index, and data changes.

Now, we will continue the examples, show you how to make the index highly available, and use a function to access the data via the index.

The Scenario for a Highly Available Index

In some cases, data is added to a node, or becomes primary on a node, without any “put” call that the index could intercept. This happens during both failover and rebalancing. During failover, a node that holds a redundant copy of a bucket may suddenly find that copy promoted to primary because the node that held the primary failed.

During rebalancing, an entire bucket can move to a node that was newly added to the system, again without a “put” call on each piece of data for the index to capture.

Updating the Index Based on Bucket Promotion and Removal

When buckets are rebalanced or moved between servers, we need to update the local index as well. Otherwise, the index would keep entries for data whose primary copy no longer lives on the node. To do this, we implement a PartitionListener, which is called when partition events happen, such as after a bucket becomes primary or after a bucket is removed from a node. This handles both 1) bucket promotion, which occurs on a rebalance or a failure, and 2) bucket removal, which happens during a rebalance. To install the PartitionListener on a node, refer to the Region Configuration section in the prior post.

Not all of the APIs used are part of the javadocs for GemFire, but you can find the classes inside of your IDE once you include gemfire.jar on the classpath.

When the Region is created, we keep a reference to it for use when bucket events occur. When a bucket becomes primary, we are given the bucket id, which we use to look up the keys held in that bucket. Once we have the key set, we iterate over it to update the spatial index. When a bucket is removed from a server, the key set is handed to us as part of the method call, since the bucket no longer exists on the node. These keys let us remove from the index the data items that no longer exist on that server.

Table 6: PartitionListener

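import java.util.Set;

import com.gemstone.gemfire.cache.Declarable;
import com.gemstone.gemfire.cache.Region;
import com.gemstone.gemfire.cache.partition.PartitionListener;
// The classes below are internal GemFire classes, not part of the public javadocs.
import com.gemstone.gemfire.internal.cache.BucketRegion;
import com.gemstone.gemfire.internal.cache.PartitionedRegion;
import com.gemstone.gemfire.internal.cache.partitioned.Bucket;

public class QuadTreePartitionListener implements PartitionListener, Declarable {

    private static PartitionedRegion region;
    private static QuadTreeIndex tree;

    // Called when the region is created; keep a reference for later bucket events.
    public void afterRegionCreate(Region<?, ?> reg) {
        region = (PartitionedRegion) reg;
        tree = QuadTreeIndex.getSingleton();
    }

    // The bucket has already left this node, so its keys are handed to us directly.
    public void afterBucketRemoved(int bucketId, Iterable<?> keys) {
        tree = QuadTreeIndex.getSingleton();
        for (Object o : keys) {
            SpatialKey key = (SpatialKey) o;
            tree.remove(key.getLat(), key.getLon(), key);
        }
    }

    // The local copy of the bucket became primary: index every key it holds.
    public void afterPrimary(int bucketId) {
        tree = QuadTreeIndex.getSingleton();
        Bucket b = region.getRegionAdvisor().getBucket(bucketId);
        BucketRegion bucketRegion = b.getBucketAdvisor()
                                     .getProxyBucketRegion()
                                     .getHostedBucketRegion();

        @SuppressWarnings("unchecked")
        Set<SpatialKey> keys = (Set<SpatialKey>) bucketRegion.keySet();
        for (SpatialKey key : keys) {
            tree.put(key.getLat(), key.getLon(), key);
        }
    }

    // Declarable's init(Properties) method is not shown here.
}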
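
The bookkeeping this listener performs can be tried out without a GemFire cluster. The following is a minimal sketch rather than the example project's code: ToyKey, ToyPointIndex, and ToyBucketListener are hypothetical stand-ins for SpatialKey, QuadTreeIndex, and the PartitionListener, and a plain map plays the role of the buckets hosted on the node.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for SpatialKey.
final class ToyKey {
    final String id;
    final float lat;
    final float lon;

    ToyKey(String id, float lat, float lon) {
        this.id = id;
        this.lat = lat;
        this.lon = lon;
    }
}

// Hypothetical stand-in for the spatial index: maps a point to the ids stored there.
class ToyPointIndex {
    private final Map<String, Set<String>> idsByPoint = new HashMap<>();

    void put(ToyKey key) {
        idsByPoint.computeIfAbsent(key.lat + "," + key.lon, p -> new HashSet<>()).add(key.id);
    }

    void remove(ToyKey key) {
        String point = key.lat + "," + key.lon;
        Set<String> ids = idsByPoint.get(point);
        if (ids != null && ids.remove(key.id) && ids.isEmpty()) {
            idsByPoint.remove(point);
        }
    }

    int size() {
        int n = 0;
        for (Set<String> ids : idsByPoint.values()) {
            n += ids.size();
        }
        return n;
    }
}

// Mirrors the two listener callbacks; the map stands in for locally hosted buckets.
class ToyBucketListener {
    private final ToyPointIndex index;
    private final Map<Integer, List<ToyKey>> localBuckets;

    ToyBucketListener(ToyPointIndex index, Map<Integer, List<ToyKey>> localBuckets) {
        this.index = index;
        this.localBuckets = localBuckets;
    }

    // Promotion: the bucket is already here, so read its keys and index them.
    void afterPrimary(int bucketId) {
        List<ToyKey> keys = localBuckets.getOrDefault(bucketId, Collections.<ToyKey>emptyList());
        for (ToyKey key : keys) {
            index.put(key);
        }
    }

    // Removal: the bucket is gone, so the caller must supply the keys it held.
    void afterBucketRemoved(int bucketId, Iterable<ToyKey> keys) {
        for (ToyKey key : keys) {
            index.remove(key);
        }
    }
}

class BucketEventDemo {
    // Returns the index size after a promotion and after the later removal.
    static int[] run() {
        Map<Integer, List<ToyKey>> buckets = new HashMap<>();
        buckets.put(7, Arrays.asList(new ToyKey("a", 1f, 1f), new ToyKey("b", 2f, 3f)));
        ToyPointIndex index = new ToyPointIndex();
        ToyBucketListener listener = new ToyBucketListener(index, buckets);

        listener.afterPrimary(7);                 // failover: redundant copy promoted
        int afterPromotion = index.size();

        List<ToyKey> moved = buckets.remove(7);   // rebalance: bucket leaves this node
        listener.afterBucketRemoved(7, moved);
        int afterRemoval = index.size();

        return new int[] {afterPromotion, afterRemoval};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run()));
    }
}
```

In this sketch the index holds two entries after the promotion and none after the removal, which is the invariant the real listener maintains: the local index only ever describes primary data hosted on the node.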

Accessing the Data

You can access the data in two ways:

  1. Via the Region API and OQL, allowing you to query the objects directly
  2. Via a function against the specialized Index

To access the data via the spatial index, you need to create a function, because the index exists only on the server side and is spread in portions across all servers according to the partitioning scheme. To get all data points associated with a given lat/long or a bounding box, we call a function on all nodes that:

  1. Gets a reference to the local QuadTree
  2. Passes the lat/long parameters to the QuadTree to get a result (a set of SpatialKeys)
  3. Does a “get” on the region for each key returned by the QuadTree call

The results from each node are then aggregated and returned; the client never sees this aggregation step.

These function calls run in parallel on each server, and each one operates only on its local data set, so the lookup itself moves no data between servers.

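When the function executes, it obtains the region by calling getDataSet on the RegionFunctionContext, which is passed in to the Function execute method. Because the keys come from the local index, the subsequent gets are served from the local data set. To restrict access strictly to locally hosted data, GemFire also provides PartitionRegionHelper.getLocalDataForContext.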

Table 7: Single Point Get Function Implementation

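import java.util.ArrayList;

import com.gemstone.gemfire.cache.Declarable;
import com.gemstone.gemfire.cache.Region;
import com.gemstone.gemfire.cache.execute.Function;
import com.gemstone.gemfire.cache.execute.FunctionContext;
import com.gemstone.gemfire.cache.execute.FunctionException;
import com.gemstone.gemfire.cache.execute.RegionFunctionContext;

public class SinglePointGetFunction implements Function, Declarable {

    transient QuadTreeIndex tree = null;

    public void execute(FunctionContext fc) {
        // This function needs a region, so it must be invoked via onRegion.
        if (!(fc instanceof RegionFunctionContext)) {
            throw new FunctionException("SinglePointGetFunction must be executed on a region");
        }
        RegionFunctionContext rfc = (RegionFunctionContext) fc;
        Region<SpatialKey, MyData> theData = rfc.getDataSet();

        // Arguments arrive as [lat, lon].
        @SuppressWarnings("unchecked")
        ArrayList<Float> args = (ArrayList<Float>) fc.getArguments();
        float lat = args.get(0);
        float lon = args.get(1);

        // Look up the key in this node's index, then fetch its value from the region.
        tree = QuadTreeIndex.getSingleton();
        SpatialKey key = tree.get(lat, lon);
        MyData dataPoint = (key == null) ? null : theData.get(key);

        fc.getResultSender().lastResult(dataPoint);
    }

    // … rest of the function implementation
}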

If this function is called with an “onRegion” call, it executes on all nodes that hold some portion of the region, and the results from all nodes are collected and returned to the client. To install this function on the servers, add the following to the cache.xml file:

Table 8: Function in cache.xml

<cache>
  <region name="MyData">
    ….
  </region>

  <function-service>
    <function>
      <class-name>com.vmware.gemfire.example.spatial.function.SinglePointGetFunction</class-name>
    </function>
  </function-service>
</cache>

See the com.vmware.gemfire.example.spatial.function classes for more examples.
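
One such variation is the bounding-box lookup mentioned earlier. As a rough illustration of what a node's local index has to do for that query, here is a minimal point quadtree sketch. ToyQuadTree and its method names are hypothetical, and the capacity and depth limits are arbitrary; it is not the QuadTreeIndex from the example project.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical minimal point quadtree; keys are plain strings for brevity.
class ToyQuadTree {
    private static final int CAPACITY = 4;   // points per cell before it splits
    private static final int MAX_DEPTH = 16; // stops endless splitting on duplicate points

    private final float minLat, minLon, maxLat, maxLon;
    private final int depth;
    private final List<float[]> points = new ArrayList<>(); // each entry is {lat, lon}
    private final List<String> keys = new ArrayList<>();
    private ToyQuadTree[] children; // null until this cell splits

    ToyQuadTree(float minLat, float minLon, float maxLat, float maxLon) {
        this(minLat, minLon, maxLat, maxLon, 0);
    }

    private ToyQuadTree(float minLat, float minLon, float maxLat, float maxLon, int depth) {
        this.minLat = minLat;
        this.minLon = minLon;
        this.maxLat = maxLat;
        this.maxLon = maxLon;
        this.depth = depth;
    }

    boolean put(float lat, float lon, String key) {
        if (lat < minLat || lat > maxLat || lon < minLon || lon > maxLon) {
            return false; // outside this cell
        }
        if (children == null && (points.size() < CAPACITY || depth >= MAX_DEPTH)) {
            points.add(new float[] {lat, lon});
            keys.add(key);
            return true;
        }
        if (children == null) {
            split();
        }
        return putIntoChild(lat, lon, key);
    }

    private boolean putIntoChild(float lat, float lon, String key) {
        for (ToyQuadTree child : children) {
            if (child.put(lat, lon, key)) {
                return true; // first child that contains the point takes it
            }
        }
        return false;
    }

    // Divide the cell into four quadrants and push the stored points down.
    private void split() {
        float midLat = (minLat + maxLat) / 2;
        float midLon = (minLon + maxLon) / 2;
        children = new ToyQuadTree[] {
            new ToyQuadTree(minLat, minLon, midLat, midLon, depth + 1),
            new ToyQuadTree(minLat, midLon, midLat, maxLon, depth + 1),
            new ToyQuadTree(midLat, minLon, maxLat, midLon, depth + 1),
            new ToyQuadTree(midLat, midLon, maxLat, maxLon, depth + 1)
        };
        for (int i = 0; i < points.size(); i++) {
            putIntoChild(points.get(i)[0], points.get(i)[1], keys.get(i));
        }
        points.clear();
        keys.clear();
    }

    // Collect every key whose point lies inside the (inclusive) bounding box.
    void query(float loLat, float loLon, float hiLat, float hiLon, List<String> out) {
        if (hiLat < minLat || loLat > maxLat || hiLon < minLon || loLon > maxLon) {
            return; // the box does not overlap this cell, so skip the whole subtree
        }
        for (int i = 0; i < points.size(); i++) {
            float[] p = points.get(i);
            if (p[0] >= loLat && p[0] <= hiLat && p[1] >= loLon && p[1] <= hiLon) {
                out.add(keys.get(i));
            }
        }
        if (children != null) {
            for (ToyQuadTree child : children) {
                child.query(loLat, loLon, hiLat, hiLon, out);
            }
        }
    }

    // Small demo: six points, then a box over the western United States.
    static List<String> demo() {
        ToyQuadTree tree = new ToyQuadTree(-90f, -180f, 90f, 180f);
        tree.put(37.4f, -122.1f, "paloAlto");
        tree.put(47.6f, -122.3f, "seattle");
        tree.put(30.3f, -97.7f, "austin");
        tree.put(42.4f, -71.1f, "boston");
        tree.put(39.7f, -105.0f, "denver");
        tree.put(25.8f, -80.2f, "miami");
        List<String> found = new ArrayList<>();
        tree.query(35f, -125f, 50f, -100f, found);
        Collections.sort(found);
        return found;
    }
}
```

A bounding-box function would follow the same shape as the single-point one: each node runs the query against its own index, then does a “get” on the region for every key found. The early return in query is what makes the tree worthwhile, since whole quadrants that do not overlap the box are never visited.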

To call the function from the client, the following example code can be used:

Table 9: Calling the function from the client

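// Register the function with the local FunctionService.
SinglePointGetFunction function = new SinglePointGetFunction();
FunctionService.registerFunction(function);

// Arguments are passed as [lat, lon].
ArrayList<Float> args = new ArrayList<Float>();
args.add(Float.valueOf(1f));
args.add(Float.valueOf(1f));

// myData is the client's reference to the region.
Execution execution = FunctionService.onRegion(myData).withArgs(args);

ResultCollector rc = execution.execute(function);

// getResult() blocks until the results from the servers have arrived.
Object results = rc.getResult();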
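
What comes back to the client can be pictured with a small simulation that needs no GemFire at all. This is a sketch under stated assumptions: ToyNode and ScatterGatherDemo are hypothetical names, each ToyNode holds its own slice of the index and data, and a thread pool stands in for the servers running the function in parallel. Nodes that do not host the requested point contribute nothing to the aggregated result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for one server: a local index (point -> key) and local data (key -> value).
class ToyNode {
    private final Map<String, String> localIndex = new HashMap<>();
    private final Map<String, String> localData = new HashMap<>();

    void store(float lat, float lon, String key, String value) {
        localIndex.put(lat + "," + lon, key);
        localData.put(key, value);
    }

    // Same shape as the function body: index lookup first, then a get on the data.
    String singlePointGet(float lat, float lon) {
        String key = localIndex.get(lat + "," + lon);
        return key == null ? null : localData.get(key);
    }
}

class ScatterGatherDemo {
    // Run the lookup on every node in parallel and gather the non-null answers.
    static List<String> queryAll(List<ToyNode> nodes, final float lat, final float lon) {
        ExecutorService pool = Executors.newFixedThreadPool(nodes.size());
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (final ToyNode node : nodes) {
                Callable<String> task = () -> node.singlePointGet(lat, lon);
                pending.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : pending) {
                String value = future.get(); // wait for this node's answer
                if (value != null) {
                    results.add(value);
                }
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("lookup failed", e);
        } finally {
            pool.shutdown();
        }
    }

    static List<String> demo() {
        ToyNode server1 = new ToyNode();
        ToyNode server2 = new ToyNode();
        server1.store(1f, 1f, "key-1", "data at (1, 1)");
        server2.store(2f, 3f, "key-2", "data at (2, 3)");
        return queryAll(Arrays.asList(server1, server2), 1f, 1f);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Here only the first node hosts the point (1, 1), so the gathered list contains that single value. In the real system the ResultCollector plays the gathering role, and the client simply reads the combined result.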

Appendix

Example Contents Available on GitHub

To view the code examples above, please visit my vFabric project on GitHub. After unzipping the example source, you will see the following files on your drive:

SpatialIndex
├── data                 - directory that holds system data for GemFire
├── gemfire.properties   - properties for the entire system
├── scripts              - scripts to start GemFire; edit to match your environment
├── lib                  - place openmap.jar here
├── src                  - source code for the example
└── xml                  - config for the example

Next, edit the scripts/gf.config script to match the locations where you installed the prerequisites. Here are the basic steps:

  1. Open the example project in your IDE, and review the code and configurations.
  2. Set the project properties to point to the proper locations of openmap.jar and gemfire.jar.
  3. Make sure that every host that will run GemFire is listed in the /etc/hosts file (or equivalent) of each participating machine, even if the example will be self-contained on a single machine.
  4. From a command line in the scripts directory, run startLocator.sh.
  5. Run startServer1.sh.
  6. From your IDE, run SpatialExampleClient.
  7. Run startServer2.sh.
  8. Run SpatialExampleClient again, and confirm that the results are the same after the rebalance.
  9. Stop the system.

Thank you for reading. Comments and questions are welcome! If this was helpful, we’d love to hear about it, and please let us know what other types of articles you’d like to see.

About the Author: Catherine Johnson is a strategist with VMware and works with customers who are implementing Fast Data solutions. She has more than 15 years of experience in distributed systems and holds a Master’s degree in Computer Science focused on distributed systems design. Her grad school research focused on spatial data in distributed systems. Catherine has spent most of her career at Oracle and VMware, working across most organizations, including consulting, engineering, education, and pre-sales.
