RabbitMQ Streams and Replay Features, Part 3: Limits and configurations

The previous article in this series explored how to get started with RabbitMQ Streams. But beyond the default configurations that are bundled with a Stream, you might want to customize these default settings in some scenarios. This article looks at some of these configurations as well as the limitations of RabbitMQ Streams.

RabbitMQ Streams Configurations

You can configure a Stream in RabbitMQ in two ways: with queue arguments supplied when the queue is declared, or with the equivalent policy keys. Read on to learn about these configurations.

Data Retention Configurations

Since RabbitMQ Streams are append-only and consuming a message does not remove it, a Stream would grow indefinitely if left alone. As this is undesirable, a Stream can be configured to discard old messages through a retention policy. Hold on, retention policy?

Yes, through a retention policy, you can configure a Stream to truncate its messages once it reaches a given size or a specified age. Truncating messages entails deleting an entire segment file. But what is a segment file?

RabbitMQ Streams do not persist messages in a big, single file. Instead, a Stream is broken down into smaller files known as segment files. A Stream truncates its size by deleting a segment file and all its messages. To configure a Stream's retention strategy, you can adopt size or time-based retention strategies.

Size-based retention strategy - the Stream is configured to truncate its size once the total size of the stream reaches a given value.
Time-based retention strategy - the Stream is configured to truncate a segment file once that segment reaches a given age.

Setting up the size-based retention strategy requires providing the following arguments when declaring the Stream. As mentioned earlier, this can also be done through a policy:

x-max-length-bytes
x-stream-max-segment-size-bytes

On the other hand, setting up the time-based retention strategy requires providing the following arguments when declaring the Stream:

x-max-age
x-stream-max-segment-size-bytes

Notice how the x-stream-max-segment-size-bytes argument is required in both strategies? We will explain this. First, let’s make sense of these arguments.

x-max-length-bytes

This argument will control the maximum size of the RabbitMQ Stream. When this is set, RabbitMQ will delete segment files from the beginning of the Stream. The deletion happens when the Stream’s total size reaches the value of x-max-length-bytes.

For example, if the maximum size of a Stream is set to "x-max-length-bytes":100000000, the Stream will discard the oldest messages when the Stream’s disk usage hits 100000000 bytes. RabbitMQ does not provide a default value for this.

The unit could be KB, MB, GB, or TB. However, if you provide a value for this argument without a unit, it defaults to bytes.
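To make the size-based strategy concrete, here is a minimal sketch of how the two arguments could be assembled and passed at declaration time. The helper names (size_retention_args, declare_stream) and the stream name are made up for illustration, and the channel is assumed to be an already open pika channel. The sketch relies on a Stream being declared as a durable queue with "x-queue-type" set to "stream".

```python
def size_retention_args(max_length_bytes, segment_size_bytes):
    """Build queue arguments for a size-based retention strategy.

    Hypothetical helper: both values are plain integers, in bytes.
    """
    for value in (max_length_bytes, segment_size_bytes):
        if not isinstance(value, int) or value <= 0:
            raise ValueError("sizes must be positive integers (bytes)")
    return {
        "x-queue-type": "stream",
        "x-max-length-bytes": max_length_bytes,
        "x-stream-max-segment-size-bytes": segment_size_bytes,
    }


def declare_stream(channel, name, arguments):
    """Declare a Stream on an open channel (assumed to be a pika channel)."""
    # Streams must be durable; the arguments carry the retention settings.
    return channel.queue_declare(queue=name, durable=True, arguments=arguments)


# Keep roughly 100 MB on disk, split into 10 MB segment files.
args = size_retention_args(100_000_000, 10_000_000)
print(args)
```

With a real connection, the call would look like declare_stream(channel, "events", args), where "events" is only a placeholder name.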

x-max-age

This argument will control how long a message survives in a RabbitMQ Stream. The unit of this configuration can be years (Y), months (M), days (D), hours (h), minutes (m), or seconds (s). Note that the units are case-sensitive: an uppercase M means months, while a lowercase m means minutes.

For example, if the max age of a Stream is set to "x-max-age":"30D", the Stream will discard segment files that have been there for 30 days or more. RabbitMQ does not provide a default value for this.
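The time-based strategy can be sketched the same way. The helper below is hypothetical; it only checks the x-max-age string before building the arguments, assuming the case-sensitive unit letters Y, M, D, h, m, and s listed in the RabbitMQ documentation.

```python
import re

# Assumption: a positive integer followed by one unit letter.
# Uppercase M is months, lowercase m is minutes.
_MAX_AGE_PATTERN = re.compile(r"[1-9]\d*[YMDhms]")


def time_retention_args(max_age, segment_size_bytes):
    """Build queue arguments for a time-based retention strategy."""
    if not _MAX_AGE_PATTERN.fullmatch(max_age):
        raise ValueError(f"unsupported x-max-age value: {max_age!r}")
    return {
        "x-queue-type": "stream",
        "x-max-age": max_age,
        "x-stream-max-segment-size-bytes": segment_size_bytes,
    }


# Discard segment files older than 30 days, using 10 MB segments.
args = time_retention_args("30D", 10_000_000)
print(args)
```

Validating the string on the client side is a design choice of this sketch, not something RabbitMQ requires; it simply surfaces a typo such as "30 days" before the declaration is sent.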

x-stream-max-segment-size-bytes

As mentioned earlier, a RabbitMQ Stream consists of one or more segment files on disk, and this argument controls the size of each segment. For example, if the maximum size of the segment file of a Stream is set to "x-stream-max-segment-size-bytes":50000, each segment file will have a maximum size of 50000 bytes. RabbitMQ provides a default value for this: 500000000 bytes.

Now, back to your question.

Why is the x-stream-max-segment-size-bytes argument required in both retention strategies?

The x-max-age and x-max-length-bytes arguments are important for the retention of messages in RabbitMQ Streams, but the retention is evaluated on a per-segment basis. Essentially, Streams only apply the retention policies whenever an existing segment file has reached its maximum size and is closed in favor of a new one.

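As a result, retention can only kick in as often as segment files roll over. If the x-stream-max-segment-size-bytes argument is not provided, the Stream falls back to the 500000000-byte default, so a low-traffic Stream may take a very long time to close the current segment file, create a new one and, by extension, invoke the retention policy. This is why this argument is treated as required in both the size-based and time-based retention strategies.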
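A toy model can help make the per-segment evaluation tangible. The function below is not how RabbitMQ is implemented; it is a simplified simulation, under the assumption stated above that retention is checked only at the moment a full segment is closed. All names are made up for illustration.

```python
def simulate(message_sizes, max_length_bytes, segment_size_bytes):
    """Toy model of size-based retention evaluated only on segment rollover.

    Returns the remaining segment sizes (oldest first) and the peak
    total size the simulated stream reached along the way.
    """
    segments = [0]  # bytes stored per segment file, oldest first
    peak = 0
    for size in message_sizes:
        segments[-1] += size  # append to the active segment
        peak = max(peak, sum(segments))
        if segments[-1] >= segment_size_bytes:
            # The active segment is full: close it and open a new one.
            segments.append(0)
            # Only now is retention evaluated, oldest segment first.
            while sum(segments) > max_length_bytes and len(segments) > 1:
                segments.pop(0)
    return segments, peak


# 100 messages of 10 bytes, a 50-byte limit, and 40-byte segments.
print(simulate([10] * 100, 50, 40))      # ([40, 0], 80)

# Same traffic, but the segment never fills up, so nothing is discarded.
print(simulate([10] * 100, 50, 10_000))  # ([1000], 1000)
```

In this model, the first run briefly holds 80 bytes even though the limit is 50, because truncation waits for a rollover. The second run never truncates at all, which is the situation a sensible segment size is meant to avoid.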

Note: The x-max-length-bytes and x-max-age arguments can be combined, together with x-stream-max-segment-size-bytes as always. In that case, the Stream will only discard messages when both conditions are true. Not clear?

Okay, for example, if x-max-length-bytes is 100 (not an ideal value) and x-max-age is 30D, the Stream will discard segment files that have been in the Stream for more than 30 days only once the Stream's disk usage reaches 100 bytes. In essence, even if there are segment files older than the age limit, the Stream won't discard them until the maximum length is exceeded. Likewise, exceeding the maximum length does not remove segment files that are still younger than 30 days.
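The combined rule described above boils down to a logical AND. The predicate below is a hypothetical illustration of that description, not RabbitMQ code; ages are expressed in days and sizes in bytes to match the example.

```python
def should_discard(segment_age_days, stream_bytes, max_age_days, max_length_bytes):
    """Return True only when both retention conditions hold."""
    too_old = segment_age_days > max_age_days
    too_big = stream_bytes >= max_length_bytes
    return too_old and too_big


# x-max-age = 30D and x-max-length-bytes = 100, as in the example above.
print(should_discard(45, 50, 30, 100))   # old segment, small stream
print(should_discard(10, 150, 30, 100))  # young segment, large stream
print(should_discard(45, 150, 30, 100))  # old segment, large stream
```

Only the last call satisfies both conditions, so only that segment would be a candidate for removal under this reading.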

Controlling the Initial Replication Factor

Remember, Streams are persistent and replicated. When a Stream is declared, RabbitMQ creates replicas of it on randomly selected nodes in the cluster. However, the number of replicas can be controlled in two ways:

With the x-initial-cluster-size queue argument when declaring the Stream via an AMQP client.
With the initial-cluster-size queue argument when declaring the Stream via the stream plugin.

x-initial-cluster-size

This argument controls the number of nodes in the cluster on which the Stream will be replicated. Like quorum queues and replicated classic queues, Streams are affected by cluster size. The more replicas a Stream has, the more data needs to be replicated, which lowers throughput. It is recommended to use an odd cluster size, such as 1, 3, or 5, so that a quorum (a majority of replicas) can always be formed.

For example, "x-initial-cluster-size": 3
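The recommendation for odd sizes follows from simple majority arithmetic. The sketch below uses hypothetical helper names and assumes a quorum means a strict majority of replicas; it shows how many replica failures each cluster size can absorb and how the argument might be set.

```python
def majority(replicas):
    """Smallest number of replicas that forms a strict majority."""
    return replicas // 2 + 1


def tolerated_failures(replicas):
    """Replicas that can be lost while a majority is still available."""
    return replicas - majority(replicas)


for n in range(1, 6):
    print(n, "replicas: majority", majority(n), "tolerates", tolerated_failures(n))

# Queue arguments for a Stream replicated on three nodes.
arguments = {"x-queue-type": "stream", "x-initial-cluster-size": 3}
```

Going from 3 to 4 replicas adds replication work without tolerating any more failures, which is why even sizes are rarely worth it.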

RabbitMQ Stream Leader Election Configuration

Even though a Stream has replicas across nodes, there is always one leader replica. All Stream operations go through the leader replica first and are then replicated to the other nodes. Which node becomes the leader is controlled in three ways:

By passing the x-queue-leader-locator argument when declaring the Stream
By setting the queue-leader-locator policy key
By defining the queue_leader_locator in the configuration file

The supported values for leader election configuration are:

client-local - This is the default value. The client that declares the Stream is connected to some node, and the client-local value elects this node to be the leader.
balanced - If there are fewer than 1000 queues, the node hosting the fewest Stream leaders becomes the leader. Otherwise, a random node becomes the leader.
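The two strategies can be modeled in a few lines. This is a toy illustration of the rules as described above, not RabbitMQ's actual election code; the function name, node names, and counts are invented for the example.

```python
import random


def pick_leader(strategy, connected_node, leaders_per_node, total_queues, rng=random):
    """Toy model of leader placement for a newly declared Stream.

    leaders_per_node maps a node name to the number of leaders it hosts.
    """
    if strategy == "client-local":
        # The node the declaring client is connected to becomes the leader.
        return connected_node
    if strategy == "balanced":
        if total_queues < 1000:
            # Prefer the node currently hosting the fewest leaders.
            return min(leaders_per_node, key=leaders_per_node.get)
        # With many queues, fall back to a random node.
        return rng.choice(sorted(leaders_per_node))
    raise ValueError(f"unsupported leader locator: {strategy!r}")


nodes = {"rabbit@a": 12, "rabbit@b": 3, "rabbit@c": 7}
print(pick_leader("client-local", "rabbit@a", nodes, total_queues=22))
print(pick_leader("balanced", "rabbit@a", nodes, total_queues=22))

# At declaration time, the strategy is passed as a queue argument.
arguments = {"x-queue-type": "stream", "x-queue-leader-locator": "balanced"}
```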

RabbitMQ Streams Limitations

Message Encoding

Streams store messages as AMQP 1.0 encoded data. When publishing using AMQP 0.9.1, a conversion is done under the hood. While this conversion usually works well, sometimes it doesn't. For example, if the header of an AMQP 0.9.1 message contains complex values like arrays or lists, the header will not be converted. That is because headers in an AMQP 1.0 message can only contain values of simple types, such as strings and numbers.
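One way to guard against this is to inspect headers before publishing to a Stream. The check below is a hypothetical sketch: it treats Python lists, tuples, and dicts as the "complex" values mentioned above, which is an assumption about how a client library would encode them.

```python
def complex_headers(headers):
    """Return the names of headers whose values are not simple types."""
    return sorted(
        name
        for name, value in headers.items()
        if isinstance(value, (list, tuple, dict))
    )


headers = {
    "tenant": "acme",
    "retries": 3,
    "tags": ["billing", "eu"],  # array value: at risk of being dropped
}

problems = complex_headers(headers)
if problems:
    print("headers that may not survive conversion:", problems)
```

A publisher could flatten such values, for instance by joining a list into a single string, before sending the message.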

UI Metric Accuracy

When working with Streams, sometimes the Management UI does not reflect the precise message count. In streams, offset tracking information also counts as messages, making the message count artificially larger than it is. This should make no practical difference in most systems.

Wrap-Up

This series explored the fundamentals of RabbitMQ Streams, from when to use Streams in part 1 to how to get started with them in part 2. This article went a step further to cover some optional configurations that make it easier to tweak a Stream for a specific use case.

Overall, Streams weren’t created to replace queues, but to complement them. Streams open up new possibilities for RabbitMQ use cases.

Ready to start using RabbitMQ in your architecture? CloudAMQP is one of the world's largest RabbitMQ cloud hosting providers. In addition to RabbitMQ, we also created our in-house message broker, LavinMQ, with a throughput of around 1,000,000 messages/sec.

Easily create a free RabbitMQ or free LavinMQ instance on CloudAMQP. All available after a quick and easy signup.

Email us at contact@cloudamqp.com with any suggestions, questions, or feedback.