Aadhaar was conceived as a way to provide a unique, online, portable identity so that every resident of India can access and benefit from government and private services. The Aadhaar project has received coverage across all media: television, press, articles, debates, and the Internet. It is seen as an audacious use of technology, albeit for a social cause. UIDAI, the authority responsible for issuing Aadhaar numbers, has published white papers, data, and newsletters on the progress of the initiative. A common question to the UIDAI technology team at conferences, events, and over coffee is: what technologies power this important nationwide initiative? In this blog post, we want to give a sense of several significant technologies and approaches.
While the deployment footprint of the systems has grown from half-a-dozen machines to a few thousand CPU cores processing millions of Aadhaar related transactions, the fundamental principles have remained the same:
- Simplicity in design & development, commodity hardware for deployment
- Leverage best of breed solutions
- Use of open standards and open source where prudent to avoid vendor lock-in
We made unbiased technology choices by precisely listing the various workloads on the system and then mapping the solutions best suited to process them. The Aadhaar workloads were categorized into the following distinct types:
- Batch oriented, asynchronous tasks that may be parallel-processed. Workloads are assigned using a scheduler. Transaction throughput and data-integrity are non-negotiable.
- Synchronous, OLTP style API gateway. Workloads are user triggered. High availability and low latency data reads are critical needs.
Solutions to handle the above types of workloads needed to scale linearly and handle millions of transactions per day. A distinct characteristic of the Aadhaar systems with respect to resource utilization is that most transactions are I/O bound.
Principles and Patterns
Based on the fundamental principles above, we developed more concrete ones and adopted several architecture patterns:
- Avoid container bloat and J2EE application server features that we don’t need. Instead, we used a custom-built J2SE-based runtime that enables POJO-based applications.
- Use technologies that could help us distribute work across a number of nodes. SEDA was a natural fit with high-speed messaging providing the transport.
- Ease of integration across the stack.
- Use of a Distributed File System to store and serve terabytes of biometric data. Adopt data-locality compute patterns to move computation closer to the data.
- Data Sharding as a technique to distribute data on both SQL and NoSQL data stores.
- Processing, messaging, and data storage nodes will fail. The system should support recovery and replay of failed transactions using techniques like check-pointing execution state.
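The SEDA pattern above can be illustrated with a minimal sketch in plain Java. This is not the production runtime (which uses Mule for orchestration and RabbitMQ as the transport); it is a hypothetical two-stage pipeline where each stage owns a bounded queue and a worker, so a slow stage applies back-pressure instead of stalling the whole application:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Minimal SEDA sketch: stages are decoupled by bounded queues.
public class SedaSketch {
    static final String POISON = "__STOP__"; // sentinel to shut a stage down

    // Runs a stage: drains `in`, applies `step`, pushes results to `out`
    // (out may be null for the terminal stage).
    static Thread stage(BlockingQueue<String> in, BlockingQueue<String> out,
                        java.util.function.Function<String, String> step) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    String msg = in.take();
                    if (POISON.equals(msg)) break;
                    String result = step.apply(msg);
                    if (out != null) out.put(result);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();
        return t;
    }

    public static List<String> run(List<String> inputs) throws InterruptedException {
        BlockingQueue<String> q1 = new LinkedBlockingQueue<>(100); // bounded => back-pressure
        BlockingQueue<String> q2 = new LinkedBlockingQueue<>(100);
        List<String> sink = java.util.Collections.synchronizedList(new ArrayList<>());

        Thread enrich = stage(q1, q2, msg -> msg + ":enriched");
        Thread persist = stage(q2, null, msg -> { sink.add(msg); return msg; });

        for (String m : inputs) q1.put(m);
        q1.put(POISON);
        enrich.join();
        q2.put(POISON);
        persist.join();
        return sink;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(List.of("pkt-1", "pkt-2")));
    }
}
```

In the real system the in-memory queues are replaced by durable RabbitMQ queues, which is what makes check-pointing and replay of failed transactions possible across node failures.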
While this article only covers a few key technologies, the overall solution included:
- Hadoop: HDFS, HBase, Hive, Pig, Zookeeper
- MySQL: sharded, partitioned, distributed
- SEDA: Mule, RabbitMQ
- Search: MongoDB, sharded Solr
- Compute Grid: Spring, GridGain
- Monitoring: Custom built, Nagios
- Analytics & Visualization
- Deployment footprint: thousands of CPU cores
- Extensive data archival and disaster recovery (DR)
Spring Application Runtime
All Aadhaar application runtimes were custom built using the Spring framework. We created various runtime profiles to suit the workload characteristics described above:
- Basic profile – supports application bootstrapping, management, and loading application extensions defined as Spring Application Contexts.
- Batch profile – extension of Basic profile using Spring Batch and enriched with administration and deployment capabilities.
- Service profile – extension of the Basic profile to support service orientation, with a registry of deployed services and an invocation broker.
- SEDA profile – extension of the Service profile using the Mule framework to support orchestration, with RabbitMQ as the messaging layer.
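The layering of these profiles can be sketched as a simple class hierarchy. The names below are illustrative, not the real classes; in the actual runtimes each profile loads its extensions as Spring Application Contexts rather than hard-coded strings:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the runtime-profile layering.
class BasicProfile {
    protected final List<String> loaded = new ArrayList<>();

    // Bootstraps the core context, then lets subclasses add their extensions.
    public void bootstrap() { loaded.add("core-context"); loadExtensions(); }

    protected void loadExtensions() { /* Basic profile loads no extra contexts */ }

    public List<String> loadedContexts() { return loaded; }
}

class BatchProfile extends BasicProfile {
    @Override protected void loadExtensions() {
        loaded.add("batch-jobs-context");   // Spring Batch jobs
        loaded.add("admin-console");        // administration/deployment support
    }
}

class ServiceProfile extends BasicProfile {
    @Override protected void loadExtensions() {
        loaded.add("service-registry");
        loaded.add("invocation-broker");
    }
}

class SedaProfile extends ServiceProfile {
    @Override protected void loadExtensions() {
        super.loadExtensions();             // SEDA builds on the Service profile
        loaded.add("mule-orchestration");
        loaded.add("rabbitmq-transport");
    }
}
```

The point of the sketch is the composition: each richer profile reuses everything beneath it, so an application only pays for the runtime features its workload actually needs.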
The application runtime makes extensive use of Spring framework modules, namely Core, AOP, and JEE.
RabbitMQ was a perfect fit in the Aadhaar application runtime system for these reasons:
- Low process footprint of the server (i.e. broker).
- Great quality AMQP Java client libraries.
- Ease of integration with the rest of the stack – we wrote an AMQP transport for Mule that could be configured and managed using Spring.
Point-to-point (P2P) messaging in RabbitMQ helped us distribute work across the various SEDA runtimes. A single-node RabbitMQ instance could easily scale to deliver millions of messages per day while remaining highly available.
SQL and NoSQL Data Stores
The Aadhaar systems used a number of data stores that may be broadly classified as SQL and NoSQL. We adopted data sharding techniques to distribute data across clusters. We implemented a JPA-like persistence framework for this, used Spring DAO implementations for transaction management, and created routing data sources (each pointing to a data shard). Spring’s support for AOP and proxying was used extensively in building the persistence framework.
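The core of a routing data source can be sketched in a few lines. This is a hypothetical modulo-on-hash router, not the actual framework (which hides routing behind a JPA-like API and Spring AOP proxies); the shard URLs are placeholders:

```java
import java.util.List;

// Minimal sketch of a routing data source: pick a shard from a
// stable hash of the entity key, so reads and writes for a given
// key always land on the same shard.
public class ShardRouter {
    private final List<String> shardUrls; // one JDBC URL per shard

    public ShardRouter(List<String> shardUrls) {
        this.shardUrls = shardUrls;
    }

    // floorMod keeps the bucket non-negative even for negative hash codes.
    public String shardFor(String entityKey) {
        int bucket = Math.floorMod(entityKey.hashCode(), shardUrls.size());
        return shardUrls.get(bucket);
    }
}
```

A real routing data source would return a pooled `DataSource` rather than a URL, and a modulo scheme like this makes re-sharding expensive; it is shown only to make the routing idea concrete.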
Managing a deployment footprint of a thousand-plus CPU cores required extensive monitoring to maintain business SLAs. We implemented an agent-less, custom monitoring solution where applications emit metrics as Spring ApplicationEvents published to the runtime’s Application Context. Metrics are aggregated using timers and published to the monitoring server using Spring Remoting HTTP endpoints. Metrics are cached in memory and also published to RabbitMQ queues, from which they are persisted to an RDBMS data store by the SEDA profile runtime.
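The emit-then-aggregate step above can be sketched as follows. This is a simplified stand-in, not the production code: the real system publishes Spring ApplicationEvents and ships flushed aggregates to the monitoring server over Spring Remoting HTTP; here the event bus and the flush timer are elided:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Sketch of in-process metric aggregation: application code emits named
// events; a periodic flush snapshots and resets the counters, producing
// the aggregates that would be published to the monitoring server.
public class MetricsAggregator {
    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    // Called from application code on each event (cheap, contention-friendly).
    public void emit(String metric) {
        counters.computeIfAbsent(metric, k -> new LongAdder()).increment();
    }

    // Called by a flush timer: returns current aggregates and resets them.
    public Map<String, Long> flush() {
        Map<String, Long> snapshot = new ConcurrentHashMap<>();
        counters.forEach((name, adder) -> snapshot.put(name, adder.sumThenReset()));
        return snapshot;
    }
}
```

Aggregating in memory and flushing periodically keeps the per-event cost negligible, which matters when the monitored transactions number in the millions per day.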