Home > Blogs > VMware vFabric Blog


Map-Reduce Style: Data Aware Distributed Querying in vFabric GemFire

vFabric GemFire ArchitectureBig, fast data is powering some of the most interesting computing opportunities in today’s market. But in order to get there, we need to change our approach to the data tier. Enterprises are trying to move from costly mainframe architectures to virtualized datacenters and utilize commodity hardware more efficiently. With the data tier, this means an architecture that scales horizontally by adding more commodity-based computing and storage at runtime.

To scale the data tier horizontally, companies use systems like vFabric GemFire, a distributed data system that is designed to specifically accommodate large data sets across commodity hardware nodes. In GemFire, data is spread across members of a cluster with members referred to as “nodes,” and the distribution of data across those nodes is called “partitioning.” vFabric GemFire then allows developers to query the data that resides across many nodes while retaining core values of very high performance at scale. How? In short, the answer is “Data Aware Querying” – a query API that allows a query to execute on selective nodes instead of all nodes (i.e. execute in a map-reduce style).

To answer this question, this article covers the following:

  • Understanding Data Partitioning
  • Understanding Basic Data Querying
  • Using Custom Partitioning to Achieve Data-Aware Querying
  • Implementing Function Execution with Custom Partitioning

Understanding Data Partitioning

First, we should understand how data is mapped out in order to understand how we can store and access a lot of it quickly and in dynamic ways.

GemFire partitions data using keys, and, hence, a subset of keys and corresponding values are stored on a single node. This approach facilitates concurrent access to a large data set with very high throughput without affecting store/access latency in a cluster of nodes. A key is unique entity making store/access an O(1) operation (i.e. something that always takes the same amount of time and does not depend on the size of the input) and allows for the storage of duplicate values. Also, a key can either be an independent entity like a sequence number or be composed of references to multiple attributes in a value, letting the partitioning to be based on a composite key.

Partitioning data can increase query performance because it uses a partial scan of a large data set and avoids using a full data store scan or multiple random reads scattered across the whole data store.

Within GemFire, data is partitioned using a PartitionRegion. One partition, or node, consists of multiple buckets that are configured at startup. Buckets are distributed across multiple nodes deterministically based on the key. To add one additional piece of background information on buckets, they are the smallest unit of data in a partitioned region that can be moved from one partition (JVM) to another during rebalancing.

Understanding Basic Data Querying

GemFire provides a modern way to query the distributed data. A query is executed in a scatter-gather fashion – starting from a coordinator node and gathering results from other concerned nodes to the coordinator, and, finally, providing results to the application. All nodes where query is executed are considered data nodes, and the first node, which starts a query (or receives it from a client), becomes the coordinator. This allows the query to run in parallel on concerned data nodes and gather results on the coordinator node for final processing. For example, the coordinator for an ORDER BY query performs the merge sort of ordered result-sets chunks.

Before we get more advanced, let’s begin with a basic example. Again, GemFire distributes the data using keys in a key-value pair. Querying this data involves the usage of a SQL-like query language known as Object Query Language, or OQL. Without using any special partitioning in GemFire (as discussed later), keys end up having no relation with value. OQL queries are executed on values without specifying the distribution of data across nodes (i.e. the scatter phase). Without specifying distribution, all nodes must be queried. This is both inefficient and expensive to do across the network.

For an example, let’s say we have a Passenger object with a Flight field.

Passenger {

String name,
Date travelDate,
int age,
Flight flt,

}
Flight {

int flightId,
String origin,
String dest,

}

Let’s say 100 million Passenger objects are stored in the “Travelers” datastore (i.e. a datastore is called “Region” in GemFire), which is partitioned across 3 nodes, and we want to run the following query for all Passengers in the Region.

"SELECT p.travelDate, p.age,

FROM /Travelers p, p.flt f

WHERE

f.origin IN ('Boston', 'Chicago')

AND f.dest = 'Seattle'

AND p.age < 35"

(Note: The above, example query can be used by an airline to determine what movies to offer on flights from Boston OR Chicago to Seattle based on Passenger age being under 35.  The idea is that young travelers or families with children will want more family oriented movies, while adults may want news or recent episodes of a popular television series.)

This query would basically create a full table scan of 100 million records, which is highly inefficient. While GemFire supports the creation of indexes, we are omitting the discussion of indexes here to clarify the improvement due to data-aware partitioning alone.

Using Custom Partitioning to Achieve Data-aware Querying

Logically, a query will be more efficient if it is targeted to a specific location. Custom-partitioning or fixed partitioning (also, sometimes referred to as “column-based partitioning” in relational database terms) is a GemFire option used to deterministically distribute data. In GemFire 6.6.2, we can query the distributed (i.e. partitioned) data based on a column in a selective manner.

Using the same example as above, all Passenger data is partitioned across multiple GemFire nodes. In the Passenger object, Flight has an “origin” field. Data can be partitioned to certain buckets (i.e. sections within a partition) based on the origin city if we make it part of the key. This means the routing of Passenger data to a particular node would be based on “origin” provided in the Flight field. So, within a partition, only certain buckets would be queried instead of many nodes, a single node, or a partition. So, there would NOT be 100 million Passenger objects to iterate over. With a data-aware querying set up, the above query is executed to a limited data set. If we assume there are 100 million passengers, 50% of Passengers are below 35, there are 9 total origination cities, and no indexes exist, GemFire’s query engine iterates over ~11 million [100 * (50%) * (2/9) =11.11] passenger objects.

To custom-partition the data an application developer has to implement the PartitionResolver to plug-in their partition strategy for GemFire. A PartitionResolver might look like as follows,

/**
* This resolver stores all Passengers based on their location in one bucket.
* The region is configured to contain 9 buckets.
*
*/
public class MyPartitionResolver implements PartitionResolver {

//Know no of buckets in the partition region which is configurable for a partition region.

//9 different locations possible.

Map keyToRoutingObject = new HashMap();
keyToRoutingObject.put("Seattle", 8);
keyToRoutingObject.put("Chicago", 4);

  @Override
public Serializable getRoutingObject(EntryOperation opDetails) {
- - - - - -
//opDetails.getKey() returns key, used in region.put(key, value);
return keyToRoutingObject.get(opDetails.getKey().getOrigin()); //Could be "seq_num+origin"
}
}

All Passengers having same origin in Flight field will be routed to the same bucket on the same node as shown in the diagram below.

GemFire Function Execution Service

Implementing Function Execution with Custom Partitioning

GemFire’s Function Execution Service can then be used on this partitioned data to achieve a map-reduce way of operating on distributed data and query data where it is located. This is known as data-aware querying. The Function Execution Service task can be executed on a particular node or set of nodes. Functions are dropped on filtered nodes (in above diagram, Partition B for “Chicago” and Partition C for “Seattle”) and execute the code locally on each node. Query execution also happens ONLY locally using the new API. No remote or distributed querying is performed on a node. The difference between querying without function context and with function context is that in former case, the query runs all local buckets but in later it only runs on Buckets C and S.

To query through the new Query API inside a Function:

//EmpFunction
Class EmpFunction extends FunctionAdapter {

- - - - -
void execute(FunctionContext context) {

- - - - -
Query query = new Query(context.getArguments());
SelectResults results = query.execute(context); //New API
- - - - -

}
- - - - -

}

Execute above Function as follows in your Application code to execute the query:

// Application Client code.

Set filter = new HashSet();
filter.add("Seattle");
filter.add("Chicago");

Function empFunc = new EmpFunction("NAZFunction");
//Execute Function
ResultCollector rColl = FunctionService

.onRegion(getRegion("employee"))
.withArgs(query)
.withFilter(filter)

.execute(empFunc);

//Get Results
Object result = rColl.getResults();
SelectResults queryResults = getResults(result);

This approach provides a sophisticated way to effectively query distributed data while retaining very predictable performance.

About the Author: Shobhit Agarwal is a member of VMware’s Technical Staff, working on high-availability, low-latency, in-memory data management systems for virtual environments for the past two years. Agarwal graduated from Northeastern University with a MS in Computer Science specializing in Systems Engineering. His specialties include java development, distributed systems and data structures.

120 thoughts on “Map-Reduce Style: Data Aware Distributed Querying in vFabric GemFire

  1. Pingback: VMware vFabric Blog: 3 Game Changing Capabilities in SQLFire | Strategic HR

  2. Pingback: Understanding Speed and Scale Strategies for Big Data Grids and In-Memory Colocation | VMware vFabric Blog - VMware Blogs

  3. commercial moving company Chicago

    I write a leave a response whenever I like a article on a website or
    if I have something to add to the discussion. It is caused by the fire displayed in the post I looked at.

    And on this article Map-Reduce Style: Data Aware Distributed Querying in vFabric GemFire |
    VMware vFabric Blog – VMware Blogs. I was actually moved enough to post a thought 🙂 I do have a couple of questions for you if it’s okay. Could it be only me or do some of the comments appear as if they are coming from brain dead people? 😛 And, if you are posting at additional social sites, I’d like to follow anything new you have to post.
    Could you make a list the complete urls of your social pages like
    your Facebook page, twitter feed, or linkedin profile?

    my web page – commercial moving company Chicago

    Reply
  4. bulk mail provider

    Hi there, all is going sound here and ofcourse every one is sharing facts, that’s in fact excellent, keep up writing.

    Reply
  5. Sherryl

    Fantastic web site. Lots of helpful info here. I’m sending it to
    some friends ans additionally sharing in delicious.
    And of course, thanks to your effort!

    Reply
  6. tigaraelectronica

    Thanks for finaally writing about > Map-Reduce Style: Data Aware Distributed Querying in vFabric GemFire | VMware vFabric Blog – VMware Blogs < Liked it!

    Reply
  7. ecigs

    Hi there, You’ve done an excellent job. I’ll definitely digg it and personally suggest to my friends.
    I’m confident they’ll be benefited from this web site.

    Also visit my web blog … ecigs

    Reply
  8. poussoirs-Ressort.fr

    poussoirs ressort
    You can certainly see your enthusiasm in the work you write.
    The world hopes for even more passionate writers such as you who are not afraid to
    mention how they believe. At all times go after your heart.

    Fabriquant de poussoirs à ressorts industriels

    Here is my web-site: interrupteur à bouton poussoir (poussoirs-Ressort.fr)

    Reply
  9. pulvérisation huile

    Howdy, i read your blog from time to time and i own a similar one and i was just wondering if
    you get a lot of spam comments? If so how do you protect against it, any plugin
    or anything you can suggest? I get so much lately it’s driving me mad so any help
    is very much appreciated.

    Here is my homepage: pulvérisation huile

    Reply
  10. Déménagement presse

    Fantastic site you have here but I was curious if you knew of any
    discussion boards that cover the same topics talked about here?
    I’d really love to be a part of online community where I can get responses
    from other knowledgeable individuals that share the same
    interest. If you have any suggestions, please let me know.
    Thank you!

    Here is my blog post – Déménagement presse

    Reply
  11. Banderoles Publicitaires

    faire son drapeau – Borney.fr – affiche impression
    – fabrication en france Achat drapeaux, fabricant de drapeaux français banderolles publicitaires
    My relatives all the time say that I am killing my time here at net, except I know I am getting experience everyday by
    reading such pleasant articles or reviews.

    Here is my website: Banderoles Publicitaires

    Reply
  12. système automatisé industriel

    It’s remarkable in favor of me to have a web site, which is beneficial designed for my knowledge.
    thanks admin

    my website: système automatisé industriel

    Reply
  13. http://www.Mandaley.fr/

    travel
    I have read several excellent stuff here. Definitely price bookmarking for revisiting.
    I wonder how a lot attempt you set to make one of these
    magnificent informative web site.

    Here is my webpage :: http://www.Mandaley.fr/

    Reply
  14. Derrick

    What’s Going down i’m new to this, I stumbled upon this I’ve discovered It
    positively useful and it has helped me out loads.
    I hope to contribute & assist other users like its aided me.

    Good job.

    Reply
  15. plan cul colombes

    Thanks for the good writeup. It if truth
    be told used to be a amusement account it. Glance complex to far introduced agreeable
    from you! However, how can we keep in touch?

    Reply
  16. mobile games

    I blog often and I seriously appreciate your information. This article has truly peaked my interest.
    I’m going to bookmark your site and keep checking for new information about once a
    week. I opted in for your RSS feed as well.

    Reply
  17. http://www.allungareilpene.hotfacts.eu

    Greetings! Very useful advice within this article!
    It’s the little changes that produce the biggest changes.
    Thanks a lot for sharing!

    Reply
  18. badge personnalis竢adge personnalises

    I’ll right away take hold of your rss feed as I
    can’t find your e-mail subscription link or newsletter service.
    Do you have any? Kindly permit me realize so that I may subscribe.
    Thanks.

    Reply
  19. خبرهلی بین المللی

    very nice

    Reply
  20. خبرهاي روز دنيا

    good

    Reply
  21. خبرهاي ورزشي

    very well post

    Reply
  22. حوادث

    follow post

    Reply
  23. dapna

    My programmer is trying to convince me to move to .net from PHP.
    I have always disliked the idea because of the expenses. But he’s
    tryiong none the less. I’ve been using Movable-type on numerous websites for about a
    year and am nervous about switching to another platform. I have heard good things about blogengine.net.
    Is there a way I can import all my wordpress content into it?
    Any help would be really appreciated!

    Reply
  24. importador express

    for sharing

    Reply
  25. ninite

    I liked the very good article

    Reply
  26. active directory

    Very useful article congratulations

    Reply
  27. Curso de Porcelanato Líquido

    My congratulations, an excellent article. Thank you for sharing with us!

    Reply
  28. Supere a Ansiedade

    Good article! Very nice!

    Reply
  29. Curso de Cavaquinho 2.0 com Dudu Nobre

    Nice Article

    Reply
  30. jhon

    Very nice! Good article!

    Reply
  31. PARE

    VERY GOOD ITEM!

    Reply
  32. Azenka

    Really Great Article, Keep Updating your site and just bookmark it.

    Thanks a lot;)

    Reply
  33. Carla Beatriz

    A LITTLE FULL, BUT WAS THE READING

    Reply
  34. andrer

    Thank you very much for the information of this site, I really liked it.

    Reply
  35. gabriel

    What an incredible article, congratulations, thank you very much for the information.

    Reply
  36. vencendo a insônia

    tank you

    Reply
  37. Arshiya

    Hi Thank You .
    http://arshiyagroup.ir

    Reply
  38. José

    I am very pleased with the information in this article, note 10!
    Good article! Very nice!

    Reply
  39. Vencendo a azia

    I’ve been using vmware for quite some time now. Some of the best of my opni…

    Reply
  40. Marco Paulo de Morais

    Thank you for sharing this great article

    Reply
  41. Aline Queiroz

    Eiiita que tem comentario hahah

    Reply
  42. طراحی سایت

    Though withdrawal from your lover can be as powerful as withdrawal from cocaine, there are ways to deal with heart-breakers other than by kissing your existence goodbye:

    Reply
  43. Magda Saúde e Bem Estar

    Fantastic web site. Lots of helpful info here. Thank you for sharing this content.

    Reply
  44. Supere a Ansiedade

    My congratulations, an excellent article. Thank you for sharing with us!

    My congratulations, Good article! Very nice!

    Reply
  45. OWL Mídia

    Acesse o meu site:
    http://www.owlmidia.com.br

    Reply
  46. daniela dos santos

    I am very pleased with the information in this article,
    Congratulations, you’ve made a great article.

    Reply
  47. Beatriz diniz

    My congratulations you did a great job.

    Reply
  48. gabriel

    Very good this site, congratulations

    Reply
  49. Joao Vitor

    Thank for sharing, very good!

    Reply
  50. Claudio Varques

    Fantastic web site.

    Reply
  51. Como cuidar de orquideas

    Thank you very much for the information. This article really helped me a lot. 😀

    Reply
  52. emagrecer urgente

    Very good text. Thank you for the informations

    Your design was very good, congratulations

    Reply
  53. آهنگ ایرانی

    آهنگ ایرانی

    Reply
  54. Ana Reborn

    Very good, this article lots of quality of the best

    Reply
  55. Joao Vitor

    Very interesting! Congratulations on the excellent work

    Reply
  56. matrizes de bordados

    I really enjoyed your tips. Thanks for sharing

    Reply
  57. Fidget cube

    I blog often and I seriously appreciate your information.

    Reply
  58. Rosangela

    Thank for sharing this excellent post!

    Reply
  59. John

    I’ve seen a lot of good stuff here, your site is very informative.

    Very Nice

    Reply
  60. Americas Marketing

    Américas Marketing

    Reply
  61. Diabetes tem Cura

    Tratamento de 21 dias

    Reply
  62. Salgados Congelados

    ganhe dinheiro com salgados congelados

    Reply
  63. Cura Quantica

    A co-criação de uma nova realidade

    Reply
  64. Casamento em Crise

    Volte a ter um casamento de sucesso

    Reply
  65. Ganhar Massa Muscular

    Ganhe massa muscular 4x mais rápido

    Reply
  66. Detox para emagrecer

    Emagreça de forma saudável

    Reply
  67. Pronta para o Romance

    Pronta para se apaixonar

    Reply
  68. Como Montar um Salão de Beleza

    Salão de beleza de sucesso

    Reply
  69. Como Aumentar os Seios

    Aumente os seios naturalmente

    Reply
  70. Despertar da Consciência

    O despertar do EU Superior

    Reply
  71. Aula de Violão para Iniciantes

    Curso de violão online

    Reply
  72. Disfunção Erétil

    O fim da impotência masculina

    Reply
  73. Brigadeiro para Vender

    Ganhe dinheiro com brigadeiros

    Reply
  74. Manutenção de Celular

    I have read several excellent stuff here. Definitely price bookmarking for revisiting.
    I wonder how a lot attempt you set to make one of these

    Reply
  75. Como acabar com zumbido no ouvido

    Great

    Reply
  76. Paulo Ferreira

    Americas Marketing

    Reply
  77. Genésio Lima

    Saiba como passar na OAB

    Reply
  78. Gertrudes

    Musical marketing

    Reply
  79. Gertrudes

    Games marketing

    Reply
  80. Gertrudes

    Tecnologia marketing

    Reply
  81. Gertrudes

    Fotografia

    Reply
  82. Como vender hinode

    Aprenda como Vender Hinode

    Reply
  83. Luma

    Plano de Saúde Para Cachorro
    plano de saúde para cachorro

    Reply
  84. Como cuidar das orquídeas

    Great article, I will help a lot.

    Reply
  85. Auto Eletrica Unitec

    Obrigado por compartilhar esta excelente postagem!

    Reply
  86. Helenice Honorio

    Meus parabéns, um excelente artigo. Obrigado por compartilhar conosco!

    Reply
  87. Fabrica de laços e tiaras

    Very useful article congratulations

    Reply
  88. como fazer sapatinho de bebe

    Fantastic web site. Lots of helpful info here. I’m sending it to
    some friends ans additionally sharing in delicious.

    Thanks

    Reply
  89. Ralf

    Virtualization is of great importance to the branch of Ti.Belo article.

    Reply
  90. Como Fazer Sabonete

    This article is very cool, very relevant content.

    Reply
  91. Cristiano Lopes

    Great article. As it is very complex I am going back to learn to apply in my projects. Strong hug

    Reply
  92. مهر برجسته

    This article is very cool, very relevant content.
    http://fahamand.com/

    Reply
  93. thais

    This article is very cool, very good

    Reply
  94. felipe barcellos

    Very useful article congratulations, very good.

    Reply
  95. Luna

    Obrigado ajudou muito.

    Reply
  96. orquidea

    I loved the post and I’m going to share it with my team at Thursday’s meeting. I believe we are going to correct some important errors.

    Reply
  97. Pet Shop Em Porto Velho

    Thank you for sharing this information!!!

    You have a lot of useful and important information.

    Gratitude!

    Reply
  98. Storage Site no RJ

    gostei do post

    Reply
  99. site de moda infantil

    Post muito útil

    Reply
  100. QUEBRAS CONFIA

    BORA EMAGRECER COM QUALIDADE E DEFINIÇÃO? COM QUEBRAS VOCE TEM RESULTADO!

    Reply
  101. Nei de Souza

    Yes, fantastic !! Thank you for sharing this information.

    Reply
  102. Sara Campos

    Muito legal as informações, irei compartilhar em minhas redes sociais.

    Reply
  103. desempenho

    MACHO MAN AUMENTA O DESEMPENHO CONFIRA

    Reply
  104. آب شیرین کن

    Thank you for sharing this article!

    Reply
  105. خواص روغن خراطین

    خواص روغن خراطین اصل چیست ؟
    خواص روغن خراطین اصل با توجه به مواد غنی همچون پروتئین ، الانتئین ، امگا ۶ ، امگا ۹ و امگا ۳ بی شمار است.
    ولی در این بخش به مهمترین خواص روغن خراطین اصل اشاره می نماییم و سپس توضیحات کاملی ارائه می نماییم .

    Reply
  106. Como parar de fumar naturalmente

    very complicate to me

    Reply
  107. como falar ingles fluente

    i love this article. Congratulations

    Reply
  108. pilares do canto

    I don´t understand nothing. HELP ME

    Reply
  109. piscinologo

    Nothing anu matter, anu one can see. Nothing really matter too me

    Reply
  110. Melissa

    como fazer reeducação alimentar

    Reply
  111. como fazer reeducação alimentar

    Muito bom

    Reply
  112. Candidíase Como Curar

    sim eu gostei do post vou salvar em favoritos

    Reply
  113. Ed

    Muito interessasnte esse conteúdo valeu

    Reply
  114. روغن خراطین

    روغن خراطین و خواص روغن خراطین
    خراطین اصل

    Reply
  115. روغن خراطین

    همه چیز درباره روغن خراطین اصل
    روغن خراطین اصل

    Reply
  116. Doces para vender

    Parabéns pelo artigo, informaćões de grande importancia.
    Obrigada.

    Reply
  117. Mark

    Obrigado pelas informações!

    Reply
  118. danilo smith

    very good, thank you for nice post!

    Reply
  119. danilo silva

    Congratulations for nice post!!!

    Reply
  120. danilo silva

    very good informations

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

*