
RabbitMQ Hits One Million Messages Per Second on Google Compute Engine

 

Google Cloud Platform can rapidly provision and scale virtual infrastructure on which to build web and enterprise applications. Recently, Google and Pivotal engineers collaborated on a performance study of RabbitMQ running on Google Compute Engine. The RabbitMQ message broker was deployed atop Google Compute Engine where it demonstrated the ability to receive and deliver more than one million messages per second (a sustained combined ingress/egress of over two million messages per second).

To put this volume in context, one million messages per second translates to more than 86 billion messages per day. U.S. text messages reached 6 billion per day in 2012. Apple processes about 40 billion iMessages per day, and WhatsApp set a new daily record in December, when it sent 20 billion messages in a single day.

The joint RabbitMQ on Google Compute Engine performance test demonstrates how one of the world’s most widely adopted open source message brokers can sustain a combined ingress/egress of over two million messages per second. Sustained over a day, that volume exceeds the combined daily total of U.S. text messages, Apple iMessages, and WhatsApp messages cited above.
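The comparison above is simple arithmetic, and a short sketch makes it easy to check. The rates and daily volumes are the figures quoted in this article; the variable and function names are only illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def per_day(rate_per_second: int) -> int:
    """Convert a sustained per-second rate into a daily message count."""
    return rate_per_second * SECONDS_PER_DAY


# Rates quoted in the article (rounded down to "one million" and "two million").
delivered_per_day = per_day(1_000_000)  # received and delivered
combined_per_day = per_day(2_000_000)   # combined ingress + egress

# Daily volumes quoted in the article, in messages per day.
quoted_daily = {
    "U.S. text messages (2012)": 6_000_000_000,
    "Apple iMessages": 40_000_000_000,
    "WhatsApp (December record)": 20_000_000_000,
}
quoted_total = sum(quoted_daily.values())

print(f"{delivered_per_day:,}")         # 86,400,000,000
print(f"{combined_per_day:,}")          # 172,800,000,000
print(f"{quoted_total:,}")              # 66,000,000,000
print(combined_per_day > quoted_total)  # True
```

Note that the daily figure only holds if the rate is sustained for a full 24 hours; the test described here ran for multiple hours.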

Technology Overview

Google Cloud Platform allows developers and IT operations teams to rapidly provision and scale virtual infrastructure on which to build web, mobile, and enterprise applications. RabbitMQ was designed to be easy to set up, deploy, integrate, and scale in public, private, and hybrid cloud applications. For example, RabbitMQ can connect a Google Compute Engine app with a geographically separated analytics platform, provide an Enterprise Service Bus model within Google Compute Engine, or help scale an on-premise micro-services architecture by connecting to additional Google Compute Engine nodes.

RabbitMQ also provides extensive options for integration with Google Compute Engine and other runtimes by supporting AMQP, HTTP, HTTPS, WebSocket, MQTT, and STOMP protocols as well as virtually all modern development languages including Java, Python, .NET, C/C++, Go, PHP, Ruby, Node.js, and others. Companies like Huffington Post, Instagram, Nokia, SoundCloud, the New York Times and thousands of others use RabbitMQ in a wide variety of messaging topologies and scenarios. According to some experts, it has become the leading middleware platform in financial services, the industry it was hatched from.

The Experiment

We provisioned a cluster of virtual machines, each with 8 vCPUs and 30GB of RAM in Google Compute Engine. Each virtual machine ran Debian with a stock installation of RabbitMQ. When deployed in a clustered configuration, multiple instances of RabbitMQ, known as nodes, collaborate to form a single, virtual messaging broker, which can be larger than could be run on a single host.

The RabbitMQ cluster was deployed across Google Compute Engine with just a few clicks from the web administration console. The resulting RabbitMQ cluster consisted of:

  • 30 RabbitMQ RAM nodes (where RabbitMQ broker metadata and definitions are held only in RAM)
  • 1 RabbitMQ Disc node (where RabbitMQ broker metadata and definitions are also held on disc, in this case, a Google Compute Engine Persistent Disk).
  • 1 RabbitMQ Stats node (to run the RabbitMQ management statistics database without any queues).

In front of the RabbitMQ cluster nodes sits a Google Compute Engine Load Balancer. The load balancer was configured with a target pool that included all nodes other than the dedicated stats node. The requests from the connecting AMQP clients were balanced between the nodes in the target pool. Excluding the stats node from the target pool helps ensure that message queueing and delivery work doesn’t end up competing with the management stats database for resources. In lower-throughput scenarios, users could choose to include the stats node in the load balancer target pool. This is very straightforward to do on Google Compute Engine.
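As a rough illustration of the topology just described, the sketch below models the cluster and derives the load balancer's target pool by leaving out the stats node. It is a plain data model, not a call into any Google Compute Engine or RabbitMQ API; the node names and the `Node` class are hypothetical, while the node counts and per-VM sizes come from the description above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Node:
    name: str        # hypothetical host name, for illustration only
    role: str        # "ram", "disc", or "stats"
    vcpus: int = 8   # per-VM size stated in the article
    ram_gb: int = 30


def build_cluster() -> List[Node]:
    """30 RAM nodes, 1 disc node, and 1 stats node, as described above."""
    nodes = [Node(f"rabbit-ram-{i:02d}", "ram") for i in range(1, 31)]
    nodes.append(Node("rabbit-disc-01", "disc"))
    nodes.append(Node("rabbit-stats-01", "stats"))
    return nodes


def target_pool(nodes: List[Node]) -> List[Node]:
    """Every node except the dedicated stats node receives client traffic."""
    return [n for n in nodes if n.role != "stats"]


cluster = build_cluster()
pool = target_pool(cluster)

print(len(cluster))                 # 32
print(len(pool))                    # 31
print(sum(n.vcpus for n in pool))   # 248
```

In a lower-throughput deployment, including the stats node would amount to returning the full node list from `target_pool`.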

The RabbitMQ PerfTest tool that ships as part of the RabbitMQ Java client library was used to generate load for the cluster. PerfTest was simultaneously run on a set of Google Compute Engine virtual machines disjoint from the ones housing the RabbitMQ cluster nodes.

For the test environment, 186 queues were created and spread across the cluster’s nodes. Almost 13,000 simultaneous connections were made from PerfTest clients, each of which was configured to initially publish messages on two threads, while consuming on eight threads.

After the traffic generating clients warmed up, we reached a steady state as shown in RabbitMQ’s web management console below:

[Image: Rabbit-GCE, screenshot of the RabbitMQ web management console]

At this high combined rate of ingress (1,345,531 messages per second) and egress (1,413,840 messages per second), RabbitMQ was keeping up with the load, and only 2,343 messages were temporarily accumulated in its queues awaiting delivery.
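To get a feel for what these steady-state figures mean per queue and per node, here is a back-of-the-envelope sketch. It assumes the 186 queues and the load were spread evenly across the 31 nodes doing queueing work, which the test description does not state explicitly, so the per-queue and per-node numbers are averages under that assumption rather than measurements.

```python
# Figures reported in the article.
INGRESS = 1_345_531   # messages per second published
EGRESS = 1_413_840    # messages per second delivered
BACKLOG = 2_343       # messages waiting in queues
QUEUES = 186
QUEUE_NODES = 31      # 30 RAM nodes + 1 disc node (stats node excluded)

combined = INGRESS + EGRESS

# Averages, ASSUMING an even spread of queues and traffic (not stated in the source).
queues_per_node = QUEUES / QUEUE_NODES
ingress_per_queue = INGRESS / QUEUES
ingress_per_node = INGRESS / QUEUE_NODES

# Time the delivery side would need to drain the backlog at the reported egress rate.
backlog_ms = BACKLOG / EGRESS * 1000

print(f"{combined:,}")               # 2,759,371
print(queues_per_node)               # 6.0
print(round(ingress_per_queue))      # 7234
print(round(ingress_per_node))       # 43404
print(f"{backlog_ms:.2f} ms")        # 1.66 ms
```

Seen this way, the backlog of 2,343 messages corresponds to under two milliseconds of delivery work at the reported egress rate.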

Under multiple hours of such load, the RabbitMQ cluster and its underlying Google Compute Engine VMs remained stable, with average throughput staying in the ballpark pictured. In occasional cases where a single queue fell behind its egress rate and began to grow, additional PerfTest instances were started up and configured to consume only, just as an operator would do in practice on a production system.

Even under such loads, the individual Google Compute Engine VMs were sufficiently well provisioned and performant that the RabbitMQ nodes never experienced memory pressure, or needed to fall back on their resource-limit-based flow control mechanisms to maintain their health.

Lessons Learned

Many present-day users of RabbitMQ enjoy excellent results with clusters made up of anywhere from three to seven RabbitMQ nodes. In light of this fact, the 30-node RabbitMQ cluster used in our experiments is rather large compared to what is commonly seen in current practice. Of course, the growth of big data, Internet of Things, real-time analysis, and other applications will likely increase the size of many RabbitMQ clusters in the future.

On such large clusters, there are a few key points to be kept in mind as systems are designed, built and scaled. Some hold true even for smaller clusters, while others become more important as our systems grow.

First, RabbitMQ’s fundamental unit of parallelism for message delivery work is the queue. Other than a partial exception that applies to certain high availability configurations, a RabbitMQ queue is backed by a single Erlang process (a lightweight thread abstraction). As such, a single queue can do no more work per unit time (primarily accepting and delivering messages) than it gets scheduled upon a CPU core to do. Thus, taking full advantage of a large cluster (in our case 31 nodes doing queuing work, with eight virtual CPUs on each, for a total of 248 cores) requires architects to consider how messages are routed within RabbitMQ and the design of the application’s so-called message fabric. While this situation exists on even small clusters, it becomes more important to consider explicitly when one is building a cluster to face truly massive load. By doing so, all potentially exploitable parallelism can be used.
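The sizing consequence of "one queue, one Erlang process" can be sketched with a deliberately simplified model. It considers only queue processes (channel and connection processes also need CPU time), and the per-queue rate of 10,000 messages per second used in the example is a hypothetical figure chosen for illustration, not a measurement from this test.

```python
import math


def busy_core_upper_bound(num_queues: int, total_cores: int) -> int:
    """Upper bound on cores that queue processes can occupy at one instant.

    Each queue is backed by a single process, so it can run on at most one
    core at a time; extra cores beyond the number of queues stay idle as far
    as queue work is concerned.
    """
    return min(num_queues, total_cores)


def queues_needed(target_rate: float, per_queue_rate: float) -> int:
    """Minimum number of queues needed to reach target_rate, given the
    rate a single queue can sustain (an assumed, workload-specific value)."""
    if per_queue_rate <= 0:
        raise ValueError("per_queue_rate must be positive")
    return math.ceil(target_rate / per_queue_rate)


TOTAL_CORES = 31 * 8  # 248 cores on the nodes doing queueing work

print(busy_core_upper_bound(1, TOTAL_CORES))    # 1
print(busy_core_upper_bound(186, TOTAL_CORES))  # 186
print(queues_needed(1_000_000, 10_000))         # 100 (hypothetical per-queue rate)
```

The first line is the point of the paragraph above: routing everything through one queue leaves all but one core unused for queue work, no matter how large the cluster is.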

While queues in RabbitMQ are each supported by a single, schedulable Erlang process, the work of publishers sending messages into exchanges is performed by an Erlang process associated with the channel used by the connected client. Thus, by being prudent in how we distribute our publishers, relative to the queues to which their messages are ultimately destined, we can work around the potential bottleneck posed by a single queue. The RabbitMQ Tutorials demonstrate the construction of message fabrics supporting a variety of scenarios and routing schemes. Our initial experiments with RabbitMQ on Google Compute Engine used very elementary examples. Beyond the scenarios covered in the tutorials, more exotic possibilities exist, including using a consistent hash exchange type for dynamic load balancing scenarios. Thus, by using clustered RabbitMQ, and a message fabric built up from sensibly chosen publish/subscribe, routing, and topic messaging patterns, one can:

  • Absorb load by adding new queues
  • Load balance by adjusting the distribution of publishers and consumers
  • Create and maintain a scalable, distributed delivery system free of single points of failure.
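To illustrate the idea behind consistent hashing mentioned above, here is a minimal, self-contained sketch of a hash ring that maps routing keys to queues. It is a generic textbook version of the technique, not the implementation used by RabbitMQ's consistent hash exchange; the `HashRing` class, the queue names, and the routing keys are all hypothetical.

```python
import bisect
import hashlib
from collections import Counter


def _hash(value: str) -> int:
    """Stable 64-bit hash of a string (first 8 bytes of its SHA-256 digest)."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Maps routing keys to queues; adding a queue moves only some keys."""

    def __init__(self, queues, points_per_queue=100):
        if not queues:
            raise ValueError("at least one queue is required")
        # Place several points per queue on the ring to even out the spread.
        self._ring = sorted(
            (_hash(f"{queue}#{i}"), queue)
            for queue in queues
            for i in range(points_per_queue)
        )
        self._points = [point for point, _ in self._ring]

    def queue_for(self, routing_key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect_right(self._points, _hash(routing_key)) % len(self._ring)
        return self._ring[idx][1]


keys = [f"order.{n}" for n in range(10_000)]
ring4 = HashRing(["q1", "q2", "q3", "q4"])
ring5 = HashRing(["q1", "q2", "q3", "q4", "q5"])  # absorb load with a new queue

before = {k: ring4.queue_for(k) for k in keys}
after = {k: ring5.queue_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]

print(Counter(before.values()))   # roughly even spread over q1..q4
print(len(moved) / len(keys))     # fraction of keys remapped by adding q5
```

The property that matters for the first bullet is that every key that moves lands on the new queue, while the rest stay where they were, so adding a queue absorbs load without reshuffling all traffic.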

Second, it is important to pay some attention to how the duties of individual nodes are apportioned in very large clusters with very high loads. The RabbitMQ management plugin, from which the performance screenshot above was captured, can be enabled or disabled on any node in a cluster. Its functions are several. It serves the web-based management UI depicted above, as well as a corresponding HTTP-based management API, and can also serve as the home for a statistics database to which performance metrics are reported by other cluster nodes. In a large cluster, where many nodes are reporting metrics, the statistics database in its current form can become a bottleneck. Thus, by default, Google Compute Engine will create RabbitMQ clusters with a single node reserved for the statistics database and keep that node out of the target pool of the load balancer in front of the cluster, so that regular publishers and consumers aren’t sent directly to it. Such a segregation of the management and stats node is seldom done in smaller clusters that handle lighter workloads.

Conclusion

Messaging-based middleware, as exemplified by Pivotal RabbitMQ, can be a key component of many web and enterprise distributed applications. Google Compute Engine allows one to quickly create performant and well provisioned RabbitMQ clusters, and run them at impressive scale, with greater ease than has previously been available to users. Using Google Compute Engine we were able to easily deploy the largest RabbitMQ clusters we have worked with to date, and push more than a million messages per second through them.
