RabbitMQ - Blog

AMQP 1.0 Benchmarks

This blog post demonstrates that native AMQP 1.0 in RabbitMQ 4.0 provides significant performance and scalability improvements compared to AMQP 1.0 in RabbitMQ 3.13.

Additionally, this blog post suggests that AMQP 1.0 can perform slightly better than AMQP 0.9.1 in RabbitMQ 4.0.

Setup

The following setup applies to all benchmarks in this blog post:

  • Intel NUC 11
  • 8 CPU cores
  • 32 GB RAM
  • Ubuntu 22.04
  • Single node RabbitMQ server
  • Server runs with only 3 scheduler threads (set via the runtime flag +S 3; see the check after this list)
  • Erlang/OTP 27.0.1
  • Clients and server run on the same box
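
As a quick sanity check of the setup above, the scheduler count can be queried from the running node (a sketch, assuming rabbitmqctl is on the PATH and can reach the node; the eval command runs an Erlang expression on the node and should return 3 here):

$ rabbitmqctl eval 'erlang:system_info(schedulers_online).'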

We use the latest RabbitMQ versions available at the time of writing.

The following advanced.config is applied:

[
 {rabbit, [
   {loopback_users, []}
  ]},

 {rabbitmq_management_agent, [
   {disable_metrics_collector, true}
  ]}
].

Metrics collection is disabled in the rabbitmq_management_agent plugin.
For production environments, Prometheus is the recommended option.
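
If you enable the rabbitmq_prometheus plugin (it is included in the PLUGINS list below), its metrics endpoint is exposed on port 15692 by default and can be spot-checked with, for example:

$ curl -s http://localhost:15692/metrics | head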

RabbitMQ server is started as follows:

make run-broker \
  TEST_TMPDIR="$HOME/scratch/rabbit/test" \
  RABBITMQ_CONFIG_FILE="$HOME/scratch/rabbit/advanced.config" \
  PLUGINS="rabbitmq_prometheus rabbitmq_management rabbitmq_amqp1_0" \
  RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS="+S 3"

The rabbitmq_amqp1_0 plugin is a no-op plugin in RabbitMQ 4.0, where AMQP 1.0 is part of the core server; it only needs to be enabled for the RabbitMQ 3.13 runs.

The AMQP 1.0 benchmarks run quiver in a Docker container:

$ docker run -it --rm --add-host host.docker.internal:host-gateway ssorj/quiver:latest
bash-5.1# quiver --version
quiver 0.4.0-SNAPSHOT

Classic Queues

This section benchmarks classic queues.

We declare a classic queue called my-classic-queue:

deps/rabbitmq_management/bin/rabbitmqadmin declare queue \
  name=my-classic-queue queue_type=classic durable=true

AMQP 1.0 in 4.0

The client sends and receives 1 million messages.
Each message contains a payload of 12 bytes.
The receiver repeatedly tops up 200 link credits at a time.

# quiver //host.docker.internal//queues/my-classic-queue \
  --durable --count 1m --duration 10m --body-size 12 --credit 200

RESULTS

Count ............................................. 1,000,000 messages
Duration ............................................... 10.1 seconds
Sender rate .......................................... 99,413 messages/s
Receiver rate ........................................ 99,423 messages/s
End-to-end rate ...................................... 99,413 messages/s

Latencies by percentile:

0% ........ 0 ms 90.00% ........ 1 ms
25% ........ 1 ms 99.00% ........ 2 ms
50% ........ 1 ms 99.90% ........ 2 ms
100% ........ 9 ms 99.99% ........ 9 ms

AMQP 1.0 in 3.13

# quiver //host.docker.internal//amq/queue/my-classic-queue \
  --durable --count 1m --duration 10m --body-size 12 --credit 200

RESULTS

Count ............................................. 1,000,000 messages
Duration ............................................... 45.9 seconds
Sender rate .......................................... 43,264 messages/s
Receiver rate ........................................ 21,822 messages/s
End-to-end rate ...................................... 21,790 messages/s

Latencies by percentile:

0% ....... 67 ms 90.00% .... 24445 ms
25% .... 23056 ms 99.00% .... 24780 ms
50% .... 23433 ms 99.90% .... 24869 ms
100% .... 24873 ms 99.99% .... 24873 ms

The same benchmark against RabbitMQ 3.13 results in roughly 4.5 times lower end-to-end throughput (21,790 vs 99,413 messages/s).

Detailed test execution
------------------ Sender -------------------  ----------------- Receiver ------------------  --------
Time [s] Count [m] Rate [m/s] CPU [%] RSS [M]  Time [s] Count [m] Rate [m/s] CPU [%] RSS [M]  Lat [ms]
---------------------------------------------  ---------------------------------------------  --------
     2.1   130,814     65,342       8    79.1       2.1     3,509      1,753       1     7.5       777
     4.1   206,588     37,849       6    79.1       4.1     5,995      1,242       0     7.5     2,458
     6.1   294,650     43,987       6    79.1       6.1     9,505      1,753       1     7.5     5,066
     8.1   360,184     32,734       5    79.4       8.1    13,893      2,194       0     7.5     6,190
    10.1   458,486     49,102       6    79.4      10.1    15,793        950       1     7.5     9,259
    12.1   524,020     32,734       5    79.4      12.1    21,644      2,923       1     7.5    11,163
    14.1   622,322     49,102       5    79.4      14.1    25,154      1,753       1     7.5    13,451
    16.1   687,856     32,734       4    79.4      16.1    27,639      1,241       1     7.5    15,246
    18.1   786,158     49,102       6    81.0      18.1    30,124      1,241       1     7.5    17,649
    20.1   884,460     49,102       6    81.0      20.1    32,610      1,242       1     7.5    19,408
    22.1   949,994     32,734       4    81.0      22.1    35,535      1,462       0     7.5    21,293
    24.1   999,912     24,934       4    81.8      24.1    38,167      1,315       1     7.5    23,321
    26.1   999,974         31       2     0.0      26.1   117,745     39,749      11     7.5    24,475
       -         -          -       -       -      28.1   202,589     42,380      11     7.5    24,364
       -         -          -       -       -      30.1   292,554     44,938      13     7.5    24,244
       -         -          -       -       -      32.1   377,691     42,526      15     7.5    23,955
       -         -          -       -       -      34.1   469,704     45,961      14     7.5    23,660
       -         -          -       -       -      36.1   555,719     42,965      12     7.5    23,463
       -         -          -       -       -      38.1   649,048     46,618      12     7.5    23,264
       -         -          -       -       -      40.1   737,696     44,280      15     7.5    23,140
       -         -          -       -       -      42.1   826,491     44,353      15     7.5    23,100
       -         -          -       -       -      44.1   917,187     45,303      16     7.5    23,066
       -         -          -       -       -      46.1   999,974     41,394      14     0.0    22,781

AMQP 0.9.1 in 4.0

For our AMQP 0.9.1 benchmarks we use PerfTest.
We try to run a reasonably fair comparison with our previous AMQP 1.0 benchmark.

Since an AMQP 1.0 /queues/:queue target address sends to the default exchange, we also send to the default exchange via AMQP 0.9.1.
Since we used durable messages with AMQP 1.0, we set the persistent flag in AMQP 0.9.1.
Since RabbitMQ settles with the released outcome when a message cannot be routed, we set the mandatory flag in AMQP 0.9.1.
Since RabbitMQ 4.0 uses a default rabbit.max_link_credit of 128, granting the sending client 128 more credits whenever its remaining credit falls below 0.5 * 128 = 64, we configure the AMQP 0.9.1 publisher to have at most 1.5 * 128 = 192 unconfirmed messages at a time.
Since we used 200 link credits in the previous run, we configure the AMQP 0.9.1 consumer with a prefetch of 200.

$ java -jar target/perf-test.jar \
  --predeclared --exchange amq.default \
  --routing-key my-classic-queue --queue my-classic-queue \
  --flag persistent --flag mandatory \
  --pmessages 1000000 --size 12 --confirm 192 --qos 200 --multi-ack-every 200

id: test-151706-485, sending rate avg: 88534 msg/s
id: test-151706-485, receiving rate avg: 88534 msg/s
id: test-151706-485, consumer latency min/median/75th/95th/99th 99/975/1320/1900/2799 µs
id: test-151706-485, confirm latency min/median/75th/95th/99th 193/1691/2113/2887/3358 µs

Summary

Figure 1: Classic queue end-to-end message rate

Quorum Queues

This section benchmarks quorum queues.

We declare a quorum queue called my-quorum-queue:

deps/rabbitmq_management/bin/rabbitmqadmin declare queue \
  name=my-quorum-queue queue_type=quorum durable=true

Flow Control Configuration

For highest data safety, quorum queues fsync all Ra commands including:

  • enqueue: sender enqueues a message
  • settle: receiver accepts a message
  • credit: receiver tops up link credit

Before a quorum queue confirms receipt of a message to the publisher, it ensures that any file modifications are flushed to disk, making the data safe even if the RabbitMQ node crashes shortly after.

The SSD of my Linux box is slow, taking 5-15 ms per fsync.
Since we want to compare AMQP protocol implementations without being bottlenecked by a cheap disk, the tests in this section increase flow control settings:

advanced.config
[
 {rabbit, [
   {loopback_users, []},

   %% RabbitMQ internal flow control for AMQP 0.9.1
   %% Default: {400, 200}
   {credit_flow_default_credit, {5000, 2500}},

   %% Maximum incoming-window of AMQP 1.0 session.
   %% Default: 400
   {max_incoming_window, 5000},

   %% Maximum link-credit RabbitMQ grants to AMQP 1.0 sender.
   %% Default: 128
   {max_link_credit, 2000},

   %% Maximum link-credit RabbitMQ AMQP 1.0 session grants to sending queue.
   %% Default: 256
   {max_queue_credit, 5000}
  ]},

 {rabbitmq_management_agent, [
   {disable_metrics_collector, true}
  ]}
].

This configuration allows more Ra commands to be batched before RabbitMQ calls fsync.
For production use cases, we recommend enterprise-grade high performance disks that fsync faster, in which case there is likely no need to increase flow control settings.

RabbitMQ flow control settings present a trade-off:

  • Low values ensure stability in production.
  • High values can result in higher performance for individual connections but may lead to higher memory spikes when many connections publish large messages concurrently.

RabbitMQ uses conservative flow control default settings to favour stability in production over winning performance benchmarks.
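
If you do change these values, you can check which settings the running node actually picked up; a minimal check, assuming rabbitmqctl can reach the node:

$ rabbitmqctl environment | grep -E 'credit_flow_default_credit|max_incoming_window|max_link_credit|max_queue_credit'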

AMQP 1.0 in 4.0

# quiver //host.docker.internal//queues/my-quorum-queue \
  --durable --count 1m --duration 10m --body-size 12 --credit 5000

RESULTS

Count ............................................. 1,000,000 messages
Duration ............................................... 12.0 seconds
Sender rate .......................................... 83,459 messages/s
Receiver rate ........................................ 83,396 messages/s
End-to-end rate ...................................... 83,181 messages/s

Latencies by percentile:

0% ........ 9 ms 90.00% ....... 47 ms
25% ....... 27 ms 99.00% ....... 61 ms
50% ....... 35 ms 99.90% ....... 76 ms
100% ....... 81 ms 99.99% ....... 81 ms

Default Flow Control Settings

The previous benchmark calls fsync 1,244 times in the ra_log_wal module (that implements the Raft write-ahead log).

The same benchmark with default flow control settings calls fsync 15,493 times resulting in significantly lower throughput:

# quiver //host.docker.internal//queues/my-quorum-queue \
  --durable --count 1m --duration 10m --body-size 12 --credit 5000

RESULTS

Count ............................................. 1,000,000 messages
Duration .............................................. 100.2 seconds
Sender rate ........................................... 9,986 messages/s
Receiver rate ......................................... 9,987 messages/s
End-to-end rate ....................................... 9,983 messages/s

Latencies by percentile:

0% ....... 10 ms 90.00% ....... 24 ms
25% ....... 14 ms 99.00% ....... 30 ms
50% ....... 18 ms 99.90% ....... 38 ms
100% ....... 55 ms 99.99% ....... 47 ms

Each fsync took 5.9 ms on average.

(15,493 - 1,244) * 5.9 ms = 84,069 ms ≈ 84 seconds

Therefore, this benchmark with default flow control settings is blocked for 84 seconds longer executing fsync than the previous benchmark with increased flow control settings.
This shows how critical enterprise-grade high performance disks are to get the best results out of quorum queues.
For your production workloads, we recommend using disks with lower fsync latency rather than tweaking RabbitMQ flow control settings.
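
To get a feel for how fast a given disk can flush, you can measure its fdatasync latency directly. A minimal sketch using fio (assuming fio is installed; the job name, size, and target directory below are placeholders); fio reports fsync/fdatasync latency percentiles in its output:

$ fio --name=fsync-latency --directory=/path/to/rabbitmq/data \
  --ioengine=sync --rw=write --bs=4k --size=256m --fdatasync=1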

It’s worth noting that the Raft WAL (write-ahead log) is shared by all quorum queue replicas on a given RabbitMQ node.
This means that ra_log_wal will automatically batch multiple Raft commands (operations) into a single fsync call when there are dozens of quorum queues with hundreds of connections.
Consequently, flushing an individual Ra command to disk becomes cheaper on average when there is more traffic on the node.
Our benchmark ran somewhat artificially with a single connection sending as fast as possible.

AMQP 1.0 in 3.13

# quiver //host.docker.internal//amq/queue/my-quorum-queue \
  --durable --count 1m --duration 10m --body-size 12 --credit 5000

------------------ Sender -------------------  ----------------- Receiver ------------------  --------
Time [s] Count [m] Rate [m/s] CPU [%] RSS [M]  Time [s] Count [m] Rate [m/s] CPU [%] RSS [M]  Lat [ms]
---------------------------------------------  ---------------------------------------------  --------
     2.1   163,582     81,709      11    84.2       2.1    29,548     14,759       3     7.5       840
     4.1   336,380     86,356      12   185.3       4.1    29,840        146       0     7.5     2,331
     6.1   524,026     93,729      14   328.0       6.1    29,840          0       0     7.5         0
     8.1   687,864     81,837      11   462.3       8.1    31,302        730       1     7.5     6,780
    10.1   884,470     98,303      14   605.4      10.1    31,447         72       0     7.5     7,897
    12.1   999,924     57,669       7   687.5      12.1    31,447          0       0     7.5         0
    14.1   999,924          0       0   687.5      14.1    31,447          0       0     7.5         0
    16.1   999,924          0       0   687.5      16.1    31,447          0       1     7.5         0
    18.1   999,924          0       1   688.3      18.1    31,447          0       0     7.5         0
receiver timed out
    20.1   999,924          0       0   688.3      20.1    31,447          0       0     7.5         0

RabbitMQ 3.13 cannot sustain this workload: the sender finishes publishing all 1,000,000 messages after about 12 seconds, but the receiver stalls at roughly 31,000 messages and eventually times out.