Advanced GemFire + Lucene + Spring Data for Text Searching

This article shows how to integrate Lucene text search with GemFire, using Spring Data, to provide a flexible, fast, parallel search engine. Combining the two independent products lets each one do what it does best. The result is an elastic search capability with the in-memory speed and high availability of a distributed cache platform.

Motivation—Why Combine These?

The project set out to give GemFire an alternative search capability and to give users a natural way to mark domain object attributes as searchable. Performance was also a key driver: search times should stay constant irrespective of scale. The solution outlined below is a baseline for developers to build upon.

Goals and Requirements

Learn More

Read the full story in the Lucene-GemFire Index Integration Guide.


  • Advanced search capability using established search technology
  • Deterministic search performance irrespective of scale
  • Simple API to allow developers to define searchable attributes on domain classes
  • Abstract complexity behind clean, simple interfaces
  • Fault tolerant search process

Various business scenarios can take advantage of this integration to improve search use cases, whether in retrieval time, criteria flexibility, or high availability.

Example Business Cases

  • Fast online lookups of trading symbols or company names that narrow the results to a small set of companies, similar to Google Finance, so the user can select the desired symbol.
  • Searching large document stores to retrieve content relevant to the user. For example, a searchable legal document store that retrieves similar case notes using document abstracts as the searchable material.
  • Location-based business search services that find local businesses relative to the user's own location. For example, finding local dry cleaners for a traveling salesperson.

Architecture: Lucene + GemFire

The architecture builds Lucene search capability into GemFire cache data nodes. GemFire is a distributed caching and data management platform, commonly classed as an in-memory data grid. Embedding the Lucene search engine within each GemFire cache node JVM yields a distributed, searchable store with all the functional benefits of GemFire.

Each GemFire cluster node manages its own independent set of Lucene region indexes within the same JVM process. This provides search isolation and elasticity across the GemFire cluster.
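The per-node isolation described above suggests a scatter-gather search: the same query goes to every node, each node searches only its own index, and the caller merges the partial results. The sketch below models that flow in plain Java and is not taken from the project. NodeIndex and ScatterGatherSearch are hypothetical names, the substring match is a stand-in for a Lucene query, and a thread pool stands in for whatever GemFire mechanism distributes the search across members.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for one cache node: it holds its own slice of the
// data and answers searches from that slice only.
class NodeIndex {
    private final Map<String, String> localDocs = new TreeMap<>();

    void put(String key, String text) {
        localDocs.put(key, text);
    }

    // Case-insensitive substring match; a real node would run a Lucene query.
    List<String> search(String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> entry : localDocs.entrySet()) {
            if (entry.getValue().toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }
}

class ScatterGatherSearch {
    // Send the same term to every node in parallel, then merge and sort the keys.
    static List<String> search(List<NodeIndex> nodes, String term) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, nodes.size()));
        try {
            List<CompletableFuture<List<String>>> partials = new ArrayList<>();
            for (NodeIndex node : nodes) {
                partials.add(CompletableFuture.supplyAsync(() -> node.search(term), pool));
            }
            List<String> merged = new ArrayList<>();
            for (CompletableFuture<List<String>> partial : partials) {
                merged.addAll(partial.join()); // wait for each node's partial result
            }
            Collections.sort(merged);
            return merged;
        } finally {
            pool.shutdown();
        }
    }
}
```

Because each node searches only its local slice, adding nodes spreads both the data and the search work, which is the elasticity the architecture is aiming for.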

Figure 1 presents a high-level view of a single GemFire cache node. Each cache node has one or more data regions holding key/value objects, and each region has its own Lucene index repository. The index can be made highly available using a configuration described later in this article.

Features

  • All user data is stored in GemFire
  • Each region has its own index repository
  • Spring Index repositories contain Lucene search documents
  • Index repositories and data reside within same JVM process
  • Index repositories can be configured to be highly available
  • User domain objects are decorated with the @Searchable annotation
  • Client interfaces provided for searching and batch operations
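To make the @Searchable idea concrete, here is a minimal sketch of how such an annotation could work. The article names @Searchable but does not show its definition, so the annotation below, its field-level placement, the Company class, and the SearchableFields helper are all assumptions made for illustration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical definition: a runtime-visible marker placed on fields.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Searchable {
}

// Example domain object; only the annotated fields should reach the index.
class Company {
    @Searchable
    private final String symbol;
    @Searchable
    private final String name;
    private final double lastPrice; // not annotated, so not indexed

    Company(String symbol, String name, double lastPrice) {
        this.symbol = symbol;
        this.name = name;
        this.lastPrice = lastPrice;
    }
}

class SearchableFields {
    // Collect field name -> string value for every @Searchable field.
    static Map<String, String> extract(Object domainObject) {
        Map<String, String> fields = new TreeMap<>();
        for (Field field : domainObject.getClass().getDeclaredFields()) {
            if (field.isAnnotationPresent(Searchable.class)) {
                field.setAccessible(true); // the fields are private
                try {
                    fields.put(field.getName(), String.valueOf(field.get(domainObject)));
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException("Cannot read " + field.getName(), e);
                }
            }
        }
        return fields;
    }
}
```

An index writer could pass the resulting name/value pairs to the search engine as document fields, so the domain class stays free of any indexing code.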

Region Indexing

At the GemFire data region level, a Lucene index is applied using a Spring configuration. The configuration creates a dedicated index repository for the specified region. Alongside this setting, a cache listener, LuceneIndexWriter, is configured to capture all region events and keep the index repository up to date.

For example, the CacheListener (see the GemFire developer guide for details on cache event processing) receives an event when a region entry is updated and updates the index repository synchronously. This approach is suitable when data updates are infrequent. Figure 2 represents this structure, where the red intersection is the LuceneIndexWriter cache listener. The index itself is not held in a region but rather in an instance of a Spring-Data specialized repository.
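The listener flow can be sketched in plain Java as follows. This is a model of the pattern, not the GemFire CacheListener API or the article's LuceneIndexWriter: SimpleRegion, AfterPutListener, and IndexingListener are hypothetical names, and a small token-to-keys map stands in for the Lucene index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical callback, loosely modelled on an after-event cache listener.
interface AfterPutListener {
    void afterPut(String key, String oldValue, String newValue);
}

// Minimal stand-in for a data region: store first, then notify synchronously.
class SimpleRegion {
    private final Map<String, String> data = new HashMap<>();
    private final List<AfterPutListener> listeners = new ArrayList<>();

    void addListener(AfterPutListener listener) {
        listeners.add(listener);
    }

    void put(String key, String value) {
        String old = data.put(key, value);
        for (AfterPutListener listener : listeners) {
            listener.afterPut(key, old, value); // runs on the caller's thread
        }
    }
}

// Stand-in for the index writer: a token -> keys inverted index.
class IndexingListener implements AfterPutListener {
    private final Map<String, Set<String>> postings = new HashMap<>();

    @Override
    public void afterPut(String key, String oldValue, String newValue) {
        if (oldValue != null) {
            // Drop postings left over from the previous value of this key.
            for (String token : tokens(oldValue)) {
                Set<String> keys = postings.get(token);
                if (keys != null) {
                    keys.remove(key);
                }
            }
        }
        for (String token : tokens(newValue)) {
            postings.computeIfAbsent(token, t -> new TreeSet<>()).add(key);
        }
    }

    Set<String> search(String term) {
        Set<String> keys = postings.get(term.toLowerCase(Locale.ROOT));
        return keys == null ? Collections.emptySet() : Collections.unmodifiableSet(keys);
    }

    private static String[] tokens(String text) {
        return text.toLowerCase(Locale.ROOT).trim().split("\\s+");
    }
}
```

Since the listener runs inside put, every update pays the indexing cost before the call returns, which is why the synchronous approach fits infrequent updates best.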

For high-volume data loads, a GemFire function is used to batch updates into manageable Lucene index transactions and reduce update latency. This is the preferred method when the GemFire cluster is being primed or for intra-day batch update processing. The same technique can also be applied to high-frequency updates via asynchronous processes.
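The batching idea can be illustrated with a short plain-Java sketch. It assumes, as is typical for search indexes, that the commit is the expensive step, so committing once per batch rather than once per document reduces the overall cost. CountingIndexWriter and BatchIndexer are hypothetical names and do not reflect the project's GemFire function or the Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical index writer that counts commits, standing in for a real
// writer whose commit is the costly operation.
class CountingIndexWriter {
    private final List<String> pending = new ArrayList<>();
    private final List<String> committed = new ArrayList<>();
    private int commitCount;

    void add(String document) {
        pending.add(document);
    }

    void commit() {
        committed.addAll(pending);
        pending.clear();
        commitCount++;
    }

    int commitCount() {
        return commitCount;
    }

    int committedSize() {
        return committed.size();
    }
}

class BatchIndexer {
    // Index the documents in batches of batchSize, committing once per batch.
    static void indexInBatches(List<String> documents, int batchSize, CountingIndexWriter writer) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        for (int start = 0; start < documents.size(); start += batchSize) {
            int end = Math.min(start + batchSize, documents.size());
            for (String document : documents.subList(start, end)) {
                writer.add(document);
            }
            writer.commit(); // one index transaction per batch
        }
    }
}
```

The batch size is a tuning knob: larger batches mean fewer commits but more uncommitted work at risk if a batch fails.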

Note that index operations are performed synchronously and occur after the cache insert. An alternative, not implemented here, is to switch to a CacheWriter so that the index update occurs before the cache update. The choice of strategy is left to the user at design time.
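The difference between the two hook positions can be seen in a small plain-Java model. HookedRegion and its BeforePut and AfterPut interfaces are hypothetical and only mimic the ordering: a writer-style hook runs before the entry is stored and can abort the operation by throwing, while a listener-style hook runs once the entry is already in place.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java model of a region with a writer-style hook (runs before the
// store) and a listener-style hook (runs after it). All names are hypothetical.
class HookedRegion {
    interface BeforePut {
        void beforePut(String key, String value);
    }

    interface AfterPut {
        void afterPut(String key, String value);
    }

    private final Map<String, String> data = new HashMap<>();
    private BeforePut writer;
    private AfterPut listener;

    void setWriter(BeforePut writer) {
        this.writer = writer;
    }

    void setListener(AfterPut listener) {
        this.listener = listener;
    }

    boolean containsKey(String key) {
        return data.containsKey(key);
    }

    void put(String key, String value) {
        if (writer != null) {
            writer.beforePut(key, value); // an exception here aborts the put
        }
        data.put(key, value);
        if (listener != null) {
            listener.afterPut(key, value); // the entry is already stored
        }
    }
}
```

With the index update in the before hook, a failed index write keeps the entry out of the cache; with it in the after hook, the cache can briefly hold data that is not yet searchable. Which trade-off is acceptable depends on the application.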

Index Redundancy

Each region index can be made highly available. In the event of a failure, the index is rebuilt on the redundant cache node, provided the region has been configured with redundancy (see the GemFire developer guide). For partitioned regions, a GemFire PartitionListener implementation detects when a node has become primary; that node then builds the index from its own local data set.
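The rebuild-on-promotion step can be sketched in plain Java as below. RedundantNode is a hypothetical class, not the project's PartitionListener implementation: it keeps redundant copies of data without indexing them, and builds an index from its local copy only when told it has become primary. Tracking this per bucket is an assumption of the sketch, and the token-to-keys map again stands in for a Lucene index.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class RedundantNode {
    // bucketId -> (key -> text): local copies of the data, primary or redundant.
    private final Map<Integer, Map<String, String>> buckets = new HashMap<>();
    // bucketId -> (token -> keys): built only for buckets this node is primary for.
    private final Map<Integer, Map<String, Set<String>>> indexes = new HashMap<>();

    // Receive a copy of an entry; redundant copies are stored but not indexed.
    void storeCopy(int bucketId, String key, String text) {
        buckets.computeIfAbsent(bucketId, b -> new HashMap<>()).put(key, text);
    }

    // Called when this node is promoted to primary: rebuild from local data.
    void afterPrimary(int bucketId) {
        Map<String, Set<String>> index = new HashMap<>();
        Map<String, String> local = buckets.getOrDefault(bucketId, new HashMap<>());
        for (Map.Entry<String, String> entry : local.entrySet()) {
            String[] tokens = entry.getValue().toLowerCase(Locale.ROOT).trim().split("\\s+");
            for (String token : tokens) {
                index.computeIfAbsent(token, t -> new TreeSet<>()).add(entry.getKey());
            }
        }
        indexes.put(bucketId, index);
    }

    boolean hasIndex(int bucketId) {
        return indexes.containsKey(bucketId);
    }

    Set<String> search(int bucketId, String term) {
        Map<String, Set<String>> index = indexes.get(bucketId);
        if (index == null) {
            return new TreeSet<>(); // not primary for this bucket, nothing indexed
        }
        return index.getOrDefault(term.toLowerCase(Locale.ROOT), new TreeSet<>());
    }
}
```

The rebuild cost is proportional to the amount of local data, so searches against the promoted node are unavailable or incomplete until the rebuild finishes.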

About the Author: Lyndon Adams is a Data Strategist from the VMware Field Center of Excellence. He specializes in fast data and big data systems. Lyndon has over 15 years of IT experience, mostly from the investment banking world, where he was involved in margin, risk, trading, tick and reporting systems.

9 thoughts on “Advanced GemFire + Lucene + Spring Data for Text Searching”

  1. shreedinesh

    Hi Lyndon.. nice to see the full text search solution In-memory. Interesting to explore more :-)

    Went through the article, and what I have understood is that you are trying to create a Lucene index inside the GemFire node (in RAM), on which one can then fire Lucene queries.

    However, Lucene also provides a similar feature to create an index in RAM and execute queries against it. Are you saying that integrating with GemFire gives better search performance? Could you highlight what other advantages we get beyond full-text search capabilities?

    Another point, on the parallel fast search engine: does that mean the Lucene index documents will be distributed and the search performed in parallel? Do we deal with Lucene queries or GemFire OQL queries?

    Dinesh

    1. Lyndon Adams

      Hi Dinesh,

      Thanks for your comments and questions.

      My driver for this project was to show how you can integrate third-party products into GemFire and to demonstrate Spring-Data, GemFire and Lucene all working together.

      Integrating GemFire and Lucene provides a distributed search process using GemFire technology. The search is driven by the Lucene search syntax rather than GemFire OQL.

      Regards

      1. shreedinesh

        Sorry Lyndon, I haven't received any email notification of updates to this post. I only read the reply today.

        Thanks for the reply.

        However, is there any way I could get a better understanding of the integration of GemFire and Lucene? I tried the same using Compass, but the results were not good: the query response time was high compared to execution with Lucene alone.

        Just trying to evaluate whether the approach has any loophole.

  2. YC Liu

    Lyndon,
    Actually, we implemented GemFire + Lucene for text searching in our project a year ago. It works quite well, with a 30%-40% performance improvement. Thanks for sharing. Performance using OQL is slow.

  3. zhang,wei

    Hi Lyndon Adams,

    Could you please provide the API used in this article? My email is above; could you please share it with me? Our GemFire project needs this function urgently.

    Regards
    Wei

  4. Manish

    The index itself is not held in a region but rather in an instance of a Spring-Data specialized repository.

    I was trying to understand the above sentence. Does it mean the index is held on disk? Do you use Lucene’s MMapDirectory to store the index?

  5. Barbara

    What is the difference between VMware Gemfire and Pivotal Gemfire?
    What is the relationship between VMware and Pivotal and will VMware continue to produce a ‘gemfire’ application or is it a Pivotal product now?

    What’s the support model?
