

Spring and RabbitMQ – Behind India’s 1.2 Billion Person Biometric Database

Aadhaar was conceived as a way to provide a unique, online, portable identity so that every resident of India can access and benefit from government and private services. The Aadhaar project has received coverage from all possible media – television, press, articles, debates, and the Internet – and is seen as an audacious use of technology, albeit for a social cause. UIDAI, the authority responsible for issuing Aadhaar numbers, has published white papers, data, and newsletters on the progress of the initiative.

A common question to the UIDAI technology team at conferences, events, and over coffee is: what technologies power this important nationwide initiative? In this blog post, we want to give a sense of several significant technologies and approaches.

Fundamental Principles

While the deployment footprint of the systems has grown from half-a-dozen machines to a few thousand CPU cores processing millions of Aadhaar related transactions, the fundamental principles have remained the same:

  • Simplicity in design & development, commodity hardware for deployment
  • Leverage best of breed solutions
  • Use of open standards and open source where prudent to avoid vendor lock-in

Categorizing Workloads

We made unbiased technology choices by precisely listing the various workloads on the system and then mapping the solutions best suited to process them. The Aadhaar workloads were categorized into the following distinct types:

  • Batch-oriented, asynchronous tasks that may be processed in parallel. Workloads are assigned using a scheduler; transaction throughput and data integrity are non-negotiable.
  • Synchronous, OLTP-style API gateway. Workloads are user-triggered; high availability and low-latency data reads are critical needs.

Solutions to handle the above types of workloads needed to scale linearly and handle millions of transactions per day. A distinct characteristic of the Aadhaar systems with respect to resource utilization is that most transactions are I/O bound.

Principles and Patterns

Based on the fundamental principles above, we developed more concrete ones and adopted several architecture patterns:

  • Avoid container bloat and J2EE application server features that we don’t need. Instead, we used a custom-built, J2SE-based runtime that enables POJO-based applications.
  • Use technologies that could help us distribute work across a number of nodes. SEDA (staged event-driven architecture) was a natural fit, with high-speed messaging providing the transport.
  • Ease of integration across the stack.
  • Use of a distributed file system to store and serve terabytes of biometric data. Adopt data-locality compute patterns to move compute closer to the data.
  • Data Sharding as a technique to distribute data on both SQL and NoSQL data stores.

Processing, messaging, and data-storage nodes will fail. The system should support recovery and replay of failed transactions, using techniques like check-pointing execution state.
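The check-pointing idea can be sketched in a few lines of plain Java. This is an illustration of the technique, not the Aadhaar implementation: a job persists the index of its last completed step, so a restarted run resumes from the checkpoint instead of re-executing finished work. In a real system the checkpoint would live in a durable store, not an in-memory array.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of checkpoint-and-replay: record the last completed step so a
// restarted worker resumes from the checkpoint rather than from scratch.
class CheckpointSketch {
    static List<Integer> run(int totalSteps, int failAtStep, int[] checkpoint) {
        List<Integer> executed = new ArrayList<>();
        for (int step = checkpoint[0] + 1; step <= totalSteps; step++) {
            if (step == failAtStep) return executed; // simulate a node failure
            executed.add(step);                      // do the work for this step
            checkpoint[0] = step;                    // "persist" progress
        }
        return executed;
    }
}
```

A failed run leaves the checkpoint at the last completed step, so the replay picks up exactly where the failure occurred and no step is duplicated.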

While this article only covers a few key technologies, the overall solution included:

  • Hadoop: HDFS, HBase, Hive, Pig, Zookeeper
  • MySQL: sharded, partitioned, distributed
  • SEDA: Mule, RabbitMQ
  • Search: MongoDB, sharded Solr
  • Compute Grid: Spring, GridGain
  • Monitoring: Custom built, Nagios
  • Analytics & Visualization
  • Deployment footprint: thousands of CPU cores
  • Extensive data archival, DR

Spring Application Runtime

All Aadhaar application runtimes were custom built using the Spring framework. We created various runtime profiles to suit the workload characteristics described above:

  • Basic profile – supports application bootstrapping, management, and loading application extensions defined as Spring Application Contexts.
  • Batch profile – extension of Basic profile using Spring Batch and enriched with administration and deployment capabilities.
  • Service profile – extension of the Basic profile to support service orientation, with a registry of deployed services and an invocation broker.
  • SEDA profile – extension of the Service profile, using the Mule framework for orchestration with RabbitMQ as the messaging layer.

The application runtime makes extensive use of Spring Framework modules, namely Core, AOP, and JEE.
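The profile layering above can be illustrated with a minimal Spring XML application context. The bean names, classes, and file paths here are hypothetical, shown only to convey how a Basic-profile runtime might compose extensions as child contexts:

```xml
<!-- Hypothetical sketch, not the actual Aadhaar runtime configuration:
     a bootstrap bean loads application extensions, each packaged as its
     own Spring application context file. -->
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
           http://www.springframework.org/schema/beans/spring-beans.xsd">

    <!-- Loads extension contexts, e.g. a Batch- or Service-profile module -->
    <bean id="extensionLoader" class="example.runtime.ExtensionContextLoader">
        <property name="extensionContexts">
            <list>
                <value>classpath:spring/batch-profile-context.xml</value>
            </list>
        </property>
    </bean>
</beans>
```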

RabbitMQ Messaging

RabbitMQ was a perfect fit in the Aadhaar application runtime system for these reasons:

  • Low process footprint of the server (i.e. broker).
  • Great quality AMQP Java client libraries.
  • Ease of integration with the rest of the stack – we wrote an AMQP transport for Mule that could be configured and managed using Spring.

P2P messaging in RabbitMQ helped us distribute work across the various SEDA runtimes. A single node RabbitMQ instance could easily scale to deliver millions of messages per day while exhibiting high degrees of system availability.
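The point-to-point semantics behind this work distribution can be sketched with only the JDK, using an in-memory queue to stand in for a RabbitMQ queue. This is an analogy for the pattern, not the actual transport: a pool of workers competes for messages, and each message is delivered to exactly one consumer.

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal sketch of P2P work distribution: competing consumers drain a
// shared queue, so each message is processed exactly once by one worker.
class SedaStageSketch {
    static int drain(int workers, int messages) {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < messages; i++) queue.add(i);

        AtomicInteger processed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int w = 0; w < workers; w++) {
            pool.submit(() -> {
                // Workers race on poll(); a message dequeued here is
                // invisible to every other worker.
                while (queue.poll() != null) processed.incrementAndGet();
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return processed.get();
    }
}
```

With RabbitMQ the same exactly-once-per-queue delivery comes from multiple consumers subscribing to one queue, with the broker additionally providing durability and acknowledgements.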

SQL and NoSQL Data Stores

The Aadhaar systems used a number of data stores that may be broadly classified as SQL and NoSQL. We adopted data-sharding techniques to distribute data across clusters. To support this, we implemented a JPA-like persistence framework, used Spring DAO implementations for transaction management, and created routing data sources (each pointing to a data shard). Spring’s support for AOP and proxying was used extensively in building the persistence framework.
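The routing idea can be sketched without Spring. The names and the modulo scheme here are illustrative, not the Aadhaar persistence framework: a stable hash of the entity key deterministically picks one of N shards, and a routing layer resolves that index to a concrete data source (Spring users would typically plug this logic into an AbstractRoutingDataSource).

```java
import java.util.Map;

// Illustrative key-based shard routing: same key always lands on the
// same shard, so reads and writes for an entity stay co-located.
class ShardRouterSketch {
    static int shardFor(String entityKey, int shardCount) {
        // floorMod keeps the index non-negative even for negative hash codes
        return Math.floorMod(entityKey.hashCode(), shardCount);
    }

    static String dataSourceFor(String entityKey, Map<Integer, String> shardUrls) {
        return shardUrls.get(shardFor(entityKey, shardUrls.size()));
    }
}
```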

Monitoring

Managing a deployment footprint of a thousand-plus CPU cores required extensive monitoring to maintain business SLAs. We implemented an agentless, custom monitoring solution in which applications emit metrics as Spring ApplicationEvents published to the runtime’s ApplicationContext. Metrics are aggregated using timers and published to the monitoring server via Spring Remoting HTTP endpoints. Metrics are cached in memory and also published to RabbitMQ queues, from which the SEDA profile runtime persists them to an RDBMS data store.
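The in-process aggregation step can be sketched with the JDK alone. This is a stand-in for the Spring event mechanism, not the actual monitoring code: application threads publish named metric events, and a timer-driven flush would ship the rolled-up snapshot to the monitoring server.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Stdlib-only sketch of agentless metric aggregation: counters are
// rolled up in memory and periodically snapshotted for transmission.
class MetricAggregatorSketch {
    private final ConcurrentHashMap<String, LongAdder> counters = new ConcurrentHashMap<>();

    void publish(String metricName) { // analogous to firing an ApplicationEvent
        counters.computeIfAbsent(metricName, k -> new LongAdder()).increment();
    }

    Map<String, Long> snapshot() { // what a periodic flush timer would send
        Map<String, Long> out = new TreeMap<>();
        counters.forEach((name, count) -> out.put(name, count.sum()));
        return out;
    }
}
```

In the real pipeline the snapshot leaves the process over HTTP and RabbitMQ rather than being read in memory, but the aggregate-then-flush shape is the same.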

See Related Slides from the Fifth Elephant Conference on Big Data

About the Author: Regunath Balasubramanian is the Principal Architect of the Aadhaar project and works for MindTree. He has over 15 years of experience in technology consulting and implementation. He is passionate about using and contributing to open source. Regunath is presently part of the Flipkart CTO organization, working on the Customer Platform.
He was the Principal Architect of the Government of India UID project – the world’s largest identity database. Regunath blogs frequently and is an occasional guest columnist for CIOUpdate.com.

13 thoughts on “Spring and RabbitMQ – Behind India’s 1.2 Billion Person Biometric Database”

  1. David Mytton

    Do you have any information about the kind of hardware infrastructure? You mentioned the software components and that i/o is important so it’d be interesting to understand how this was deployed. Did you use dedicated or virtualised instances? SSDs? What kind of capacity is used to handle that 5TB of replication traffic?

  2. Regunath B

    All instances were physical, on blade servers using Intel chips – standard ones from the likes of Dell, HP, and IBM.
    Storage for the 5 TB of daily data was on FC disks. Data is moved to SATA disks after processing. All storage is managed off SANs. SSDs were used only for storing indexes, bin logs, and the like.

  3. biometric systems

    This is quite interesting. I need the hardware configuration for the application; I want to know more about this. Actually, I want to go one more step ahead, as you mention in your blog.

  7. Jeff Ostrin

    Regunath -

    Thanks for the article. I found this article while searching “Data-Locality compute patterns”. Do you have any more information about what patterns you used here?

    Thanks

