This blog post demonstrates that native AMQP 1.0 in RabbitMQ 4.0 provides significant performance and scalability improvements compared to AMQP 1.0 in RabbitMQ 3.13.
Additionally, this blog post suggests that AMQP 1.0 can perform slightly better than AMQP 0.9.1 in RabbitMQ 4.0.
Setup
The following setup applies to all benchmarks in this blog post:
- Intel NUC 11
- 8 CPU cores
- 32 GB RAM
- Ubuntu 22.04
- Single node RabbitMQ server
- Server runs with (only) 3 scheduler threads (set via the runtime flag +S 3)
- Erlang/OTP 27.0.1
- Clients and server run on the same box
We use the latest RabbitMQ 4.0 and 3.13 releases available at the time of writing.
The following advanced.config is applied:
[
{rabbit, [
{loopback_users, []}
]},
{rabbitmq_management_agent, [
{disable_metrics_collector, true}
]}
].
Metrics collection is disabled in the rabbitmq_management_agent
plugin.
For production environments, Prometheus is the recommended option.
RabbitMQ server is started as follows:
make run-broker \
  TEST_TMPDIR="$HOME/scratch/rabbit/test" \
  RABBITMQ_CONFIG_FILE="$HOME/scratch/rabbit/advanced.config" \
  PLUGINS="rabbitmq_prometheus rabbitmq_management rabbitmq_amqp1_0" \
  RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS="+S 3"
The rabbitmq_amqp1_0 plugin is a no-op plugin in RabbitMQ 4.0.
The AMQP 1.0 benchmarks run quiver in a Docker container:
$ docker run -it --rm --add-host host.docker.internal:host-gateway ssorj/quiver:latest
bash-5.1# quiver --version
quiver 0.4.0-SNAPSHOT
Classic Queues
This section benchmarks classic queues.
We declare a classic queue called my-classic-queue:
deps/rabbitmq_management/bin/rabbitmqadmin declare queue \
    name=my-classic-queue queue_type=classic durable=true
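If you prefer declaring the queue from application code rather than rabbitmqadmin, any client that can pass queue arguments works. A minimal sketch with the pika Python client (an assumption; not the tool used in this post):

```python
import pika

# Hypothetical alternative to the rabbitmqadmin command above:
# declare the same durable classic queue over AMQP 0.9.1.
connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(
    queue="my-classic-queue",
    durable=True,
    arguments={"x-queue-type": "classic"},  # "quorum" for the quorum queue used later
)
connection.close()
```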
AMQP 1.0 in 4.0
The client sends and receives 1 million messages.
Each message contains a payload of 12 bytes.
The receiver repeatedly tops up 200 link credits at a time.
# quiver //host.docker.internal//queues/my-classic-queue \
    --durable --count 1m --duration 10m --body-size 12 --credit 200
RESULTS
Count ............................................. 1,000,000 messages
Duration ............................................... 10.1 seconds
Sender rate .......................................... 99,413 messages/s
Receiver rate ........................................ 99,423 messages/s
End-to-end rate ...................................... 99,413 messages/s
Latencies by percentile:
0% ........ 0 ms 90.00% ........ 1 ms
25% ........ 1 ms 99.00% ........ 2 ms
50% ........ 1 ms 99.90% ........ 2 ms
100% ........ 9 ms 99.99% ........ 9 ms
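Outside of quiver, the same messaging pattern (durable 12-byte messages to /queues/my-classic-queue, with the receiver keeping roughly 200 link credits topped up) could look as follows with the python-qpid-proton client. This is only a sketch, not the benchmark code; the URL, credentials, and SASL settings are assumptions that depend on your broker configuration:

```python
from proton import Message
from proton.handlers import MessagingHandler
from proton.reactor import Container

class SendReceive(MessagingHandler):
    def __init__(self, url, address, count):
        # prefetch=200 keeps roughly 200 link credits granted to the broker,
        # similar to quiver's --credit 200; auto_accept (the default) settles
        # each delivery with the accepted outcome.
        super().__init__(prefetch=200)
        self.url, self.address, self.count = url, address, count
        self.sent = self.received = 0

    def on_start(self, event):
        conn = event.container.connect(self.url)
        event.container.create_receiver(conn, self.address)
        event.container.create_sender(conn, self.address)

    def on_sendable(self, event):
        # Send only while the broker has granted link credit to the sender.
        while event.sender.credit and self.sent < self.count:
            event.sender.send(Message(body=b"x" * 12, durable=True))
            self.sent += 1

    def on_message(self, event):
        self.received += 1
        if self.received == self.count:
            event.connection.close()

# Assumed URL; adjust host and credentials for your environment.
Container(SendReceive("amqp://localhost:5672", "/queues/my-classic-queue", 1_000_000)).run()
```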
AMQP 1.0 in 3.13
# quiver //host.docker.internal//amq/queue/my-classic-queue \
    --durable --count 1m --duration 10m --body-size 12 --credit 200
RESULTS
Count ............................................. 1,000,000 messages
Duration ............................................... 45.9 seconds
Sender rate .......................................... 43,264 messages/s
Receiver rate ........................................ 21,822 messages/s
End-to-end rate ...................................... 21,790 messages/s
Latencies by percentile:
0% ....... 67 ms 90.00% .... 24445 ms
25% .... 23056 ms 99.00% .... 24780 ms
50% .... 23433 ms 99.90% .... 24869 ms
100% .... 24873 ms 99.99% .... 24873 ms
The same benchmark against RabbitMQ 3.13 results in 4.5 times lower throughput.
Detailed test execution
---------------------- Sender ----------------------- --------------------- Receiver ---------------------- --------
Time [s] Count [m] Rate [m/s] CPU [%] RSS [M] Time [s] Count [m] Rate [m/s] CPU [%] RSS [M] Lat [ms]
----------------------------------------------------- ----------------------------------------------------- --------
2.1 130,814 65,342 8 79.1 2.1 3,509 1,753 1 7.5 777
4.1 206,588 37,849 6 79.1 4.1 5,995 1,242 0 7.5 2,458
6.1 294,650 43,987 6 79.1 6.1 9,505 1,753 1 7.5 5,066
8.1 360,184 32,734 5 79.4 8.1 13,893 2,194 0 7.5 6,190
10.1 458,486 49,102 6 79.4 10.1 15,793 950 1 7.5 9,259
12.1 524,020 32,734 5 79.4 12.1 21,644 2,923 1 7.5 11,163
14.1 622,322 49,102 5 79.4 14.1 25,154 1,753 1 7.5 13,451
16.1 687,856 32,734 4 79.4 16.1 27,639 1,241 1 7.5 15,246
18.1 786,158 49,102 6 81.0 18.1 30,124 1,241 1 7.5 17,649
20.1 884,460 49,102 6 81.0 20.1 32,610 1,242 1 7.5 19,408
22.1 949,994 32,734 4 81.0 22.1 35,535 1,462 0 7.5 21,293
24.1 999,912 24,934 4 81.8 24.1 38,167 1,315 1 7.5 23,321
26.1 999,974 31 2 0.0 26.1 117,745 39,749 11 7.5 24,475
- - - - - 28.1 202,589 42,380 11 7.5 24,364
- - - - - 30.1 292,554 44,938 13 7.5 24,244
- - - - - 32.1 377,691 42,526 15 7.5 23,955
- - - - - 34.1 469,704 45,961 14 7.5 23,660
- - - - - 36.1 555,719 42,965 12 7.5 23,463
- - - - - 38.1 649,048 46,618 12 7.5 23,264
- - - - - 40.1 737,696 44,280 15 7.5 23,140
- - - - - 42.1 826,491 44,353 15 7.5 23,100
- - - - - 44.1 917,187 45,303 16 7.5 23,066
- - - - - 46.1 999,974 41,394 14 0.0 22,781
AMQP 0.9.1 in 4.0
For our AMQP 0.9.1 benchmarks, we use PerfTest.
We try to make this a reasonably fair comparison with our previous AMQP 1.0 benchmark.
Since an AMQP 1.0 /queues/:queue target address sends to the default exchange, we also send to the default exchange via AMQP 0.9.1.
Since we used durable messages with AMQP 1.0, we set the persistent flag in AMQP 0.9.1.
Since RabbitMQ settles with the released outcome when a message cannot be routed, we set the mandatory flag in AMQP 0.9.1.
Since RabbitMQ 4.0 uses a default rabbit.max_link_credit of 128, granting 128 more credits to the sending client when the remaining credit falls below 0.5 * 128, we configure the AMQP 0.9.1 publisher to have at most 1.5 * 128 = 192 messages unconfirmed at a time.
Since we used 200 link credits in the previous run, we configure the AMQP 0.9.1 consumer with a prefetch of 200.
$ java -jar target/perf-test.jar \
    --predeclared --exchange amq.default \
    --routing-key my-classic-queue --queue my-classic-queue \
    --flag persistent --flag mandatory \
    --pmessages 1000000 --size 12 --confirm 192 --qos 200 --multi-ack-every 200
id: test-151706-485, sending rate avg: 88534 msg/s
id: test-151706-485, receiving rate avg: 88534 msg/s
id: test-151706-485, consumer latency min/median/75th/95th/99th 99/975/1320/1900/2799 µs
id: test-151706-485, confirm latency min/median/75th/95th/99th 193/1691/2113/2887/3358 µs
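For readers who want to see these flags spelled out in client code rather than PerfTest options, here is a rough pika sketch (an assumption; PerfTest itself uses the RabbitMQ Java client and keeps up to 192 messages in flight, whereas pika's BlockingConnection confirms each publish synchronously, so this only illustrates the flags, not the pipelining):

```python
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.confirm_delivery()  # publisher confirms, as PerfTest enables with --confirm

# Publish to the default exchange (empty name); the routing key is the queue name.
channel.basic_publish(
    exchange="",
    routing_key="my-classic-queue",
    body=b"x" * 12,
    properties=pika.BasicProperties(delivery_mode=2),  # persistent (--flag persistent)
    mandatory=True,                                     # --flag mandatory
)

# Consume with a prefetch of 200, comparable to 200 AMQP 1.0 link credits (--qos 200).
channel.basic_qos(prefetch_count=200)
for method, properties, body in channel.consume("my-classic-queue", inactivity_timeout=1):
    if method is None:  # no message arrived within the timeout
        break
    channel.basic_ack(method.delivery_tag)

connection.close()
```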
Summary
In RabbitMQ 4.0, AMQP 1.0 delivers roughly 99,000 messages per second end-to-end against this classic queue, slightly ahead of AMQP 0.9.1 at roughly 88,500 messages per second, while AMQP 1.0 in RabbitMQ 3.13 reaches only about 21,800 messages per second.
Quorum Queues
This section benchmarks quorum queues.
We declare a quorum queue called my-quorum-queue:
deps/rabbitmq_management/bin/rabbitmqadmin declare queue \
    name=my-quorum-queue queue_type=quorum durable=true
Flow Control Configuration
For highest data safety, quorum queues fsync all Ra commands including:
- enqueue: sender enqueues a message
- settle: receiver accepts a message
- credit: receiver tops up link credit
Before a quorum queue confirms receipt of a message to the publisher, it ensures that any file modifications are flushed to disk, making the data safe even if the RabbitMQ node crashes shortly after.
The SSD of my Linux box is slow, taking 5-15 ms per fsync.
Since we want to compare AMQP protocol implementations without being bottlenecked by a cheap disk, the tests in this section increase flow control settings:
advanced.config
[
{rabbit, [
{loopback_users, []},
%% RabbitMQ internal flow control for AMQP 0.9.1
%% Default: {400, 200}
{credit_flow_default_credit, {5000, 2500}},
%% Maximum incoming-window of AMQP 1.0 session.
%% Default: 400
{max_incoming_window, 5000},
%% Maximum link-credit RabbitMQ grants to AMQP 1.0 sender.
%% Default: 128
{max_link_credit, 2000},
%% Maximum link-credit RabbitMQ AMQP 1.0 session grants to sending queue.
%% Default: 256
{max_queue_credit, 5000}
]},
{rabbitmq_management_agent, [
{disable_metrics_collector, true}
]}
].
This configuration allows more Ra commands to be batched before RabbitMQ calls fsync.
For production use cases, we recommend enterprise-grade, high-performance disks that fsync faster, in which case there is likely no need to increase the flow control settings.
RabbitMQ flow control settings present a trade-off:
- Low values ensure stability in production.
- High values can result in higher performance for individual connections but may lead to higher memory spikes when many connections publish large messages concurrently.
RabbitMQ uses conservative flow control default settings to favour stability in production over winning performance benchmarks.
AMQP 1.0 in 4.0
# quiver //host.docker.internal//queues/my-quorum-queue \
    --durable --count 1m --duration 10m --body-size 12 --credit 5000
RESULTS
Count ............................................. 1,000,000 messages
Duration ............................................... 12.0 seconds
Sender rate .......................................... 83,459 messages/s
Receiver rate ........................................ 83,396 messages/s
End-to-end rate ...................................... 83,181 messages/s
Latencies by percentile:
0% ........ 9 ms 90.00% ....... 47 ms
25% ....... 27 ms 99.00% ....... 61 ms
50% ....... 35 ms 99.90% ....... 76 ms
100% ....... 81 ms 99.99% ....... 81 ms
Default Flow Control Settings
The previous benchmark calls fsync 1,244 times in the ra_log_wal module (which implements the Raft write-ahead log).
The same benchmark with default flow control settings calls fsync 15,493 times, resulting in significantly lower throughput:
# quiver //host.docker.internal//queues/my-quorum-queue \
    --durable --count 1m --duration 10m --body-size 12 --credit 5000
RESULTS
Count ............................................. 1,000,000 messages
Duration .............................................. 100.2 seconds
Sender rate ........................................... 9,986 messages/s
Receiver rate ......................................... 9,987 messages/s
End-to-end rate ....................................... 9,983 messages/s
Latencies by percentile:
0% ....... 10 ms 90.00% ....... 24 ms
25% ....... 14 ms 99.00% ....... 30 ms
50% ....... 18 ms 99.90% ....... 38 ms
100% ....... 55 ms 99.99% ....... 47 ms
Each fsync took 5.9 ms on average.
(15,493 - 1,244) * 5.9 ms = 84 seconds
Therefore, this benchmark with default flow control settings is blocked for 84 seconds longer executing fsync
than the previous benchmark with increased flow control settings.
This shows how critical enterprise-grade, high-performance disks are for getting the best results out of quorum queues.
For your production workloads, we recommend using disks with lower fsync
latency rather than tweaking
RabbitMQ flow control settings.
It's worth noting that the Raft write-ahead log (WAL) is shared by all quorum queue replicas on a given RabbitMQ node.
This means that ra_log_wal
will automatically batch multiple Raft commands (operations) into a single fsync
call when there are dozens of quorum queues with hundreds of connections.
Consequently, flushing an individual Ra command to disk becomes cheaper on average when there is more traffic on the node.
Our benchmark ran somewhat artificially with a single connection as fast as possible.
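To make the batching effect concrete, here is a rough back-of-the-envelope calculation based on the numbers above. It uses the 5.9 ms average fsync latency measured in the default run as an approximation for both runs, and counts only the 1 million enqueue commands as a lower bound (settle and credit commands are excluded, and settles can themselves be batched):

```python
# Lower-bound Ra commands per fsync and total fsync time for the two runs above.
fsync_ms = 5.9          # average fsync latency measured in the default run
enqueues = 1_000_000    # one enqueue command per published message (settles excluded)

for label, fsyncs, duration_s in [("tuned flow control", 1_244, 12.0),
                                  ("default flow control", 15_493, 100.2)]:
    fsync_time_s = fsyncs * fsync_ms / 1000
    print(f"{label}: >= {enqueues // fsyncs} enqueues per fsync, "
          f"~{fsync_time_s:.1f} s in fsync out of a {duration_s} s run")

# tuned flow control:   >= 803 enqueues per fsync, ~7.3 s in fsync out of a 12.0 s run
# default flow control: >= 64 enqueues per fsync, ~91.4 s in fsync out of a 100.2 s run
```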
AMQP 1.0 in 3.13
# quiver //host.docker.internal//amq/queue/my-quorum-queue \
    --durable --count 1m --duration 10m --body-size 12 --credit 5000
---------------------- Sender ----------------------- --------------------- Receiver ---------------------- --------
Time [s] Count [m] Rate [m/s] CPU [%] RSS [M] Time [s] Count [m] Rate [m/s] CPU [%] RSS [M] Lat [ms]
----------------------------------------------------- ----------------------------------------------------- --------
2.1 163,582 81,709 11 84.2 2.1 29,548 14,759 3 7.5 840
4.1 336,380 86,356 12 185.3 4.1 29,840 146 0 7.5 2,331
6.1 524,026 93,729 14 328.0 6.1 29,840 0 0 7.5 0
8.1 687,864 81,837 11 462.3 8.1 31,302 730 1 7.5 6,780
10.1 884,470 98,303 14 605.4 10.1 31,447 72 0 7.5 7,897
12.1 999,924 57,669 7 687.5 12.1 31,447 0 0 7.5 0
14.1 999,924 0 0 687.5 14.1 31,447 0 0 7.5 0
16.1 999,924 0 0 687.5 16.1 31,447 0 1 7.5 0
18.1 999,924 0 1 688.3 18.1 31,447 0 0 7.5 0
receiver timed out
20.1 999,924 0 0 688.3 20.1 31,447 0 0 7.5 0
RabbitMQ 3.13 does not complete this benchmark: the sender finishes all 1 million messages, but the receiver stalls after roughly 31,000 messages and quiver eventually reports that the receiver timed out.