Publishing Throughput - Asynchronous vs Synchronous

In this article we will look at the impact that asynchronous vs synchronous publishing has on throughput when using publisher confirms. First, let’s clarify what we mean by asynchronous vs synchronous publishing.

Asynchronous publishing is where an application either:

sends messages in a fire-and-forget manner (no publisher confirms)
or uses publisher confirms but does not stop to wait for each individual confirm. Such a publisher may have many messages in flight (published but not yet confirmed) at any one time, and the broker is free to set the multiple flag on its confirms, acknowledging tens to hundreds of messages in a single confirm.
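To make the role of the multiple flag concrete, here is a minimal, client-agnostic sketch of how an asynchronous publisher might track its in-flight messages. `ConfirmTracker` and its methods are hypothetical names, not part of Pika or any other client. The only protocol behavior it relies on is that delivery tags on a confirm-mode channel count up from 1, and that an ack with `multiple` set confirms every outstanding tag up to and including the one given.

```python
class ConfirmTracker:
    """Tracks published-but-unconfirmed messages on one confirm-mode channel."""

    def __init__(self) -> None:
        self.next_tag = 1                     # delivery tags start at 1 per channel
        self.pending: dict[int, bytes] = {}   # delivery tag -> message body

    def on_publish(self, body: bytes) -> int:
        """Record a message as in flight and return its delivery tag."""
        tag = self.next_tag
        self.pending[tag] = body
        self.next_tag += 1
        return tag

    def on_ack(self, delivery_tag: int, multiple: bool) -> int:
        """Drop confirmed messages and return how many this one ack confirmed.

        With multiple=True a single ack covers every outstanding tag up to and
        including delivery_tag. A broker nack carries the same fields, but the
        affected messages would then need republishing rather than dropping.
        """
        if multiple:
            confirmed = [tag for tag in self.pending if tag <= delivery_tag]
        else:
            confirmed = [delivery_tag] if delivery_tag in self.pending else []
        for tag in confirmed:
            del self.pending[tag]
        return len(confirmed)


tracker = ConfirmTracker()
for i in range(100):
    tracker.on_publish(f"message {i}".encode())
print(tracker.on_ack(60, multiple=True), len(tracker.pending))
```

Here a single ack for tag 60 with `multiple=True` clears the first 60 messages at once, which is why a busy asynchronous publisher receives far fewer confirms than it sends messages.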

Synchronous publishing is where an application sends a message and waits for the acknowledgement (publisher confirm) before sending the next message. There is one publisher confirm per message, a ping-pong of message-confirm between the publisher and the broker.
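The ping-pong pattern boils down to a loop in which every publish is followed by a blocking wait. The sketch below is client-agnostic: `publish_synchronously`, `publish` and `wait_for_confirm` are hypothetical names standing in for whatever your client library provides. With Pika, for example, a `BlockingConnection` channel on which `confirm_delivery()` has been called behaves like this, since `basic_publish` then waits for the broker's confirm before returning.

```python
from typing import Callable, Iterable


def publish_synchronously(
    messages: Iterable[bytes],
    publish: Callable[[bytes], None],
    wait_for_confirm: Callable[[], bool],
) -> int:
    """Send one message, block until its confirm arrives, then send the next.

    `publish` and `wait_for_confirm` are placeholders for client library calls;
    `wait_for_confirm` is assumed to return True for an ack, False for a nack.
    Returns the number of confirmed messages.
    """
    confirmed = 0
    for body in messages:
        publish(body)
        if not wait_for_confirm():  # costs one full round-trip per message
            raise RuntimeError("message was nacked by the broker")
        confirmed += 1
    return confirmed
```

Because the next `publish` cannot start until `wait_for_confirm` returns, every message costs at least one network round-trip, which is what makes this pattern so sensitive to latency.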

Some applications are unable to maintain a persistent connection to RabbitMQ and open and close a connection for each message they send. We’ll also see the massive impact that one connection per message has on throughput and resource usage.

The Python client, Pika, offers both a synchronous (using BlockingConnection) and an asynchronous (using SelectConnection) way of publishing. The .NET client offers an event-based API that lets you either wait for each confirm (synchronous) or wait periodically after a batch of messages has been published (asynchronous). In general, client libraries allow both synchronous and asynchronous publishing. We will be using Pika as the client and a RabbitMQ broker with 4 virtual cores and 16GB RAM.

The takeaway from this article should not be the hard throughput numbers below but the relative performance of synchronous vs asynchronous publishing. The hard numbers depend entirely on your language, hardware, network, broker/client versions and the general load on your system as a whole.

Publishing with no network latency

Network latency can play a large role in publishing throughput, and we’ll look at its effect later on. For now, as a baseline, we’ll measure throughput with 0 ms of latency.

Asynchronous Publisher - Without Confirms

This publisher sends messages in a fire-and-forget manner, not waiting for any kind of acknowledgement from the broker. One publisher is capable of sending 13,000 messages a second. As we add publishers, total throughput increases almost linearly.

With five publishers we almost saturate the broker’s CPUs.

Asynchronous Publisher - With Confirms

This asynchronous publisher continuously sends messages, allowing up to 10,000 in-flight messages at a time. In a callback, it receives publisher confirms and removes the confirmed messages from its pending confirmation list. When the pending list reaches 10,000, it pauses publishing for 100 ms so that confirms can catch up.
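The pause-and-resume strategy described above can be sketched as a simple loop. This is not the code used for the benchmark: `publish_windowed` and its parameters are hypothetical, and the sketch assumes that a confirm callback running elsewhere removes sequence numbers from the shared `pending` set as acks arrive.

```python
import time
from typing import Callable


def publish_windowed(
    total: int,
    publish: Callable[[int], None],
    pending: set[int],
    max_in_flight: int = 10_000,
    pause_s: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Publish `total` messages, pausing while `max_in_flight` are unconfirmed.

    `pending` holds sequence numbers of unconfirmed messages; a confirm
    callback (not shown) is assumed to remove entries from it.
    Returns how many pauses were taken.
    """
    pauses = 0
    for seq in range(1, total + 1):
        while len(pending) >= max_in_flight:
            sleep(pause_s)  # give confirms time to catch up
            pauses += 1
        pending.add(seq)    # register before publishing so an early ack is not missed
        publish(seq)
    return pauses
```

A blocking `time.sleep` only works if confirms are processed on another thread. With an event-loop client such as Pika's `SelectConnection`, you would instead schedule the resume on the I/O loop's timer (for example `connection.ioloop.call_later(0.1, resume)`, where `resume` is a hypothetical function of your own) so that confirms keep arriving while publishing is paused.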

In the chart below we see that the first publisher alone is able to reach a rate of 12,000 messages per second, reaching 40,000 for five publishers. Adding publishers did not quite scale throughput linearly. The CPU profile is similar to that of the asynchronous publisher without confirms, while total throughput is 20,000 messages per second lower.

At five publishers, the four virtual CPUs are maxed out.

Synchronous Publisher

The synchronous publisher sends a message, waits for the confirm, sends the next message, and so on. In the chart below we see the overall publishing throughput as we increase from one publisher to five. Each additional publisher adds roughly 400-500 messages a second to overall throughput.

The CPU load on the broker increases as we add publishers. There are four virtual cores, so the maximum is 400%. Our rate of 2,700 messages per second is therefore costing us 300% CPU in total, i.e. 75% utilization per virtual core.

This time the bottleneck is not CPU but the fact that we are synchronously waiting for each confirm. Throughput is dramatically lower than either of our asynchronous publishers.

New Connection Per Message

Opening an AMQP connection is very costly, involving multiple round-trips. The cost of connection management outweighs the cost of sending each message. This is reflected in a message rate of just over 100 messages a second for one publisher and 375 messages a second for five publishers.
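To put "multiple round-trips" into numbers, the sketch below counts the request/response exchanges a publisher has to wait on for each message. The sequence is a simplified assumption for a plain-TCP AMQP 0-9-1 connection with PLAIN authentication (TLS, or closing each channel explicitly before the connection, would add more), and the constant and function names are hypothetical.

```python
# Exchanges the client must wait on, assumed simplified sequence for a
# plain-TCP AMQP 0-9-1 connection. Tune-Ok and Open are sent back to back,
# so together they cost a single round-trip.
OPEN_ROUND_TRIPS = [
    "TCP: SYN -> SYN-ACK",
    "protocol header -> Connection.Start",
    "Connection.Start-Ok -> Connection.Tune",
    "Connection.Tune-Ok + Connection.Open -> Connection.Open-Ok",
    "Channel.Open -> Channel.Open-Ok",
]
CLOSE_ROUND_TRIPS = ["Connection.Close -> Connection.Close-Ok"]


def round_trips_per_message(new_connection: bool, confirms: bool) -> int:
    """Round-trips a publisher blocks on per message under the assumptions above."""
    trips = 0
    if new_connection:
        trips += len(OPEN_ROUND_TRIPS) + len(CLOSE_ROUND_TRIPS)
    if confirms:
        # Waiting for the ack is one round-trip; a fresh connection must also
        # put its new channel into confirm mode (Confirm.Select -> Select-Ok).
        trips += 2 if new_connection else 1
    return trips


for new_conn in (False, True):
    for confirms in (False, True):
        print(f"new connection={new_conn!s:5} confirms={confirms!s:5} "
              f"-> {round_trips_per_message(new_conn, confirms)} round-trips")
```

Under these assumptions a connection-per-message publisher waits on six round-trips per message without confirms and eight with them, against one for a synchronous publisher on a persistent connection. At 1 ms of round-trip time that is at least 6 ms of pure waiting per message instead of 1 ms.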

All the opening and closing of AMQP connections takes its toll on CPU usage.

We reach the same 75% per-core utilization as with the five synchronous publishers, but with only a small fraction of their message throughput (375 vs 2,700 messages per second).
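The CPU figures above can be turned into a rough efficiency measure: messages per second delivered for each fully busy core. The sketch below only does arithmetic on the numbers quoted in this article for five publishers. Treating "maxed out" as exactly 400% for the asynchronous publisher with confirms is an assumption, and `msgs_per_busy_core` is a hypothetical helper.

```python
def msgs_per_busy_core(msgs_per_s: float, cpu_percent: float) -> float:
    """Messages per second per fully busy core (100% = one core)."""
    return msgs_per_s / (cpu_percent / 100)


# (messages per second, total broker CPU %) for five publishers, from the text.
RESULTS = {
    "async with confirms": (40_000, 400),     # "maxed out" taken as 400%
    "synchronous": (2_700, 300),              # 75% of each of 4 cores
    "connection per message": (375, 300),     # the same 75% per core
}

for name, (rate, cpu) in RESULTS.items():
    print(f"{name:>24}: {msgs_per_busy_core(rate, cpu):8,.0f} msg/s per busy core")
```

This works out to roughly 10,000, 900 and 125 messages per second per busy core respectively, so each message published over a fresh connection costs the broker around 80 times as much CPU as one published asynchronously with confirms.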

You can see the connection churn in the management plugin.

This shows that you really, really don’t want to publish messages this way. Sometimes, however, persistent connections are not possible. For those cases there is a solution that alleviates the connection churn and the low throughput: the AMQProxy. Check out our AMQProxy article to see how you can use the “one connection per message” method without paying the high price.

Network Latency and Throughput

So far we’ve seen message throughput with 0ms of network latency. Now we’re going to see what happens when we start adding some latency. In theory, synchronous publishers should be more affected by latency and the “new connection per message” publishers even more so.

Asynchronous Publisher - No Confirms

When publishing is fully asynchronous, i.e. without publisher confirms, throughput is unaffected by higher network latencies.

Asynchronous Publisher - With Confirms

While the asynchronous publisher with confirms does have to wait for confirms periodically, it is still mostly unaffected by higher network latencies. This is due to the asynchronous nature of the publishing, coupled with the broker’s use of the multiple flag in its publisher confirms.
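One way to see why is Little's law: the number of messages in flight equals throughput multiplied by the time each message spends unconfirmed, and that time is at least one round-trip. Rearranged, it gives an upper bound on throughput for a given in-flight limit. `window_limited_rate` is a hypothetical helper for illustration, and the bound ignores broker processing time.

```python
def window_limited_rate(max_in_flight: int, rtt_s: float) -> float:
    """Upper bound on confirmed msg/s with at most `max_in_flight` unconfirmed.

    Little's law: in_flight = throughput * time_in_flight, and each message
    stays unconfirmed for at least one round-trip, so
    throughput <= max_in_flight / rtt.
    """
    return max_in_flight / rtt_s
```

With a 10,000-message window and 100 ms of round-trip time the bound is 100,000 messages per second, well above the roughly 40,000 messages per second at which the broker's CPUs were already maxed out, so latency never becomes the limiting factor. A synchronous publisher is the special case of a window of one: at 100 ms the same bound is just 10 messages per second.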

Synchronous Publisher (Confirms)

We see that even adding 1 ms of latency has a noticeable impact and 5 ms drops throughput by over 50%. By the time we reach 50-100 ms, throughput has dropped from its peak of around 700 messages per second to a trickle.
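A back-of-the-envelope model explains the shape of this curve. If each confirmed message costs a fixed processing time plus one round-trip, a single synchronous publisher can do at most 1 / (processing time + round-trip time) messages per second. The model below is an illustrative simplification, not a fit to the measured data: `sync_rate` is a hypothetical helper, it treats the added latency as extra round-trip time, and it takes the processing time from the peak of around 700 messages per second (about 1.4 ms per message).

```python
def sync_rate(service_time_s: float, rtt_s: float, round_trips: int = 1) -> float:
    """Rough msg/s for one publisher that blocks on every message.

    service_time_s: per-message cost when there is no added latency
    rtt_s: added round-trip time in seconds
    round_trips: how many round-trips the publisher waits on per message
    """
    return 1.0 / (service_time_s + round_trips * rtt_s)


PEAK_RATE = 700            # msg/s at 0 ms added latency, as quoted in the text
service = 1 / PEAK_RATE    # assumed constant per-message processing time

for added_ms in (0, 1, 5, 50, 100):
    rate = sync_rate(service, added_ms / 1000)
    print(f"{added_ms:>3} ms added -> {rate:6.0f} msg/s")
```

Under this model 1 ms of extra latency costs about 40% of the throughput, 5 ms costs nearly 80%, and at 100 ms a publisher manages only about 10 messages per second, in line with the trickle described above. Setting `round_trips` to the number of exchanges a fresh connection needs gives the equivalent, much steeper, curve for a connection-per-message publisher.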

New Connection Per Message Publisher

The first chart shows the New Connection Per Message publisher without confirms. Although the messages are sent in a fire-and-forget manner, every message requires multiple round-trips to establish the AMQP connection. So it is no surprise that even 1 ms of extra latency cuts the publishing rate by over 50%, and that the rate keeps falling with every additional step of latency.

This chart shows the same publisher with confirms. The extra round-trip for the confirm does have an impact, but the largest impact is still connection establishment.

Summary

Publishing messages asynchronously greatly increases throughput and also uses broker CPU far more efficiently. The effect of latency on asynchronous publishers is also minimal. What is interesting is that we were still able to achieve high throughput while using confirms. Allowing 10,000 in-flight messages may be on the high side, but it does give the best of both worlds: high throughput and data safety.

Synchronous publishers are slower and also more sensitive to network latency. Because each message requires its own publisher confirm, they are also much less efficient than asynchronous publishers.

Finally, we saw dismal performance when opening and closing an AMQP connection for each message, and that even very small latencies can greatly impact its throughput. Check out our article on the AMQProxy to see how publishers can use “one connection per message” publishing while avoiding most of the performance and resource consumption penalties.