We're thrilled to announce a new RabbitMQ Diagnostic Tool! A while ago, we published a popular RabbitMQ Best Practice Guide, and we have now taken it one step further and simplified your investigation of misbehaving RabbitMQ clusters even more.
Many of our customers have already used the new RabbitMQ Diagnostic Tool. It checks your setup and suggests things you need to look into. The tool benefits both our customers and our support engineers, who can check a customer's setup and identify possible errors more efficiently.
The RabbitMQ Diagnostic Tool is available for all dedicated instances and can be used together with our RabbitMQ Best Practice guide.
You can find the RabbitMQ Diagnostic Tool in the control panel for your instance, via the Diagnostics tab in the menu.
Most use cases should aim to get a valid response to 100% of the tests. However, there are use cases where breaking some common rules is necessary.
The RabbitMQ Best Practice guide can be found here: RabbitMQ Best Practice
So, what are we actually testing in the diagnostic tool?
First of all, we check the general setup, such as the version and the RabbitMQ management settings. We also test queue sizes and queue setups, your connections and your clients.
I recommend that you open the Diagnostics tab in the control panel and check each of your dedicated instances.
RabbitMQ is improving all the time: bugs are fixed and new features are added. We recommend that you keep RabbitMQ up to date.
Erlang's performance and reliability are improving with each new version. We recommend that you always keep Erlang up to date.
RabbitMQ Management statistics rate mode
Setting the RabbitMQ Management statistics rate mode to detailed has a serious performance impact and should not be used in production.
Number of exchanges
Having lots of exchanges is not a best-practice use of RabbitMQ.
Ensure that you are not using a topic exchange as fanout
Use a fanout exchange instead of a topic exchange with only '#' bindings.
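As a rough sketch of how this check could be reproduced, the snippet below scans a list of bindings and reports topic exchanges whose bindings all use the routing key '#'. The dictionary keys (name, type, source, routing_key) are assumed to mirror what the RabbitMQ management HTTP API returns for exchanges and bindings. The function name and sample data are hypothetical, and this is not the Diagnostic Tool's actual implementation.

```python
from collections import defaultdict


def topic_exchanges_used_as_fanout(exchanges, bindings):
    """Return names of topic exchanges whose bindings all use the key '#'."""
    topic_names = {e["name"] for e in exchanges if e["type"] == "topic"}

    # Group routing keys by the topic exchange they are bound from.
    keys_by_exchange = defaultdict(list)
    for b in bindings:
        if b["source"] in topic_names:
            keys_by_exchange[b["source"]].append(b["routing_key"])

    # Exchanges without any bindings are not flagged; only '#'-only ones are.
    return sorted(
        name
        for name, keys in keys_by_exchange.items()
        if all(key == "#" for key in keys)
    )


# Hypothetical sample data for illustration.
exchanges = [
    {"name": "events", "type": "topic"},
    {"name": "logs", "type": "topic"},
]
bindings = [
    {"source": "events", "routing_key": "#"},
    {"source": "events", "routing_key": "#"},
    {"source": "logs", "routing_key": "app.*.error"},
    {"source": "logs", "routing_key": "#"},
]
flagged = topic_exchanges_used_as_fanout(exchanges, bindings)
```

Here only the "events" exchange is flagged, because "logs" has at least one binding that actually filters on the routing key.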
Ensure that all published messages are routed
If you are not routing all messages, it might indicate that you are missing bindings.
Ensure that you have an HA policy on all vhosts
Not having an HA policy on all vhosts will cause message loss on netsplits. You should add an HA policy on custom vhosts even if you only have one node, so that you can upgrade without message loss.
RabbitMQ performs best if queues are short. If it suits your use case, try keeping queues as short as possible. We recommend fewer than 10 000 messages per queue.
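A minimal sketch of a queue length check, assuming queue data shaped like the management HTTP API's queue listing (a name and a messages count per queue). The threshold comes from the recommendation above; the function name and sample queues are hypothetical.

```python
MAX_RECOMMENDED_MESSAGES = 10_000  # threshold recommended in the text above


def long_queues(queues, limit=MAX_RECOMMENDED_MESSAGES):
    """Return (name, messages) pairs for queues at or above the limit, longest first."""
    over = [(q["name"], q["messages"]) for q in queues if q["messages"] >= limit]
    return sorted(over, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample data for illustration.
queues = [
    {"name": "orders", "messages": 250},
    {"name": "emails", "messages": 10_000},
    {"name": "audit", "messages": 48_213},
]
too_long = long_queues(queues)
```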
Even unused queues take up some resources, such as the queue index and management statistics. Make sure that you don't leave unused queues behind.
Temporary queues should be auto deleted
Leftover temporary queues can eventually cause RabbitMQ to run out of memory. Set temporary queues to auto delete, exclusive or auto expire.
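The three cleanup options can be checked per queue. The sketch below lists queues that have none of them, assuming queue data with auto_delete, exclusive and arguments fields as in the management HTTP API, and using the x-expires queue argument for auto expiry. It cannot tell which queues are meant to be temporary, so the result is only a list of candidates to review. An expiry applied through a policy would not show up in arguments and is not covered here. The function name and sample data are hypothetical.

```python
def queues_without_cleanup(queues):
    """Return names of queues that are neither auto-delete, exclusive, nor expiring."""
    candidates = []
    for q in queues:
        # Only an explicit x-expires queue argument is detected here.
        expires = "x-expires" in q.get("arguments", {})
        if not (q.get("auto_delete") or q.get("exclusive") or expires):
            candidates.append(q["name"])
    return candidates


# Hypothetical sample data for illustration.
queues = [
    {"name": "reply.7f3a", "auto_delete": True, "exclusive": False, "arguments": {}},
    {"name": "session.42", "auto_delete": False, "exclusive": False,
     "arguments": {"x-expires": 1800000}},
    {"name": "tmp.report", "auto_delete": False, "exclusive": False, "arguments": {}},
]
leftover_candidates = queues_without_cleanup(queues)
```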
Persistent messages in durable queues
Having durable queues with transient messages might be a client error. Remember to set the persistent flag when publishing, otherwise messages will be lost on restart.
No mirrored auto delete queues
Mirroring auto delete queues is probably unnecessary unless you have a very specific use case for it.
No transient messages in mirrored queues
Publishing transient messages in mirrored queues might be a client error. Remember to set the persistent flag when publishing.
Limited use of priority queues
Each priority level uses an internal queue on the Erlang VM, which takes up some resources. In most use cases it is sufficient to have no more than 5 priority levels.
Long lived connections
RabbitMQ is optimized for long lived connections. Establishing a connection is fairly heavy and uses many TCP packets. Keep connections open if you are able to. If you have a client that is unable to keep connections long lived you can use
Connection leaks can cause RabbitMQ to run out of memory. Make sure your clients are not leaking connections. If you have more than 10 connections from the same host, you may have a connection leak.
Channel leaks can eventually cause RabbitMQ to run out of memory. Make sure your clients are not leaking channels. If you have connections with more than 1000 channels, you may have a channel leak.
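The two leak thresholds above can be sketched as a simple check over a connection listing. The fields name, peer_host and channels are assumed to match the management HTTP API's connection objects. The function name and sample data are hypothetical, and a host or connection that exceeds a threshold is only a hint, not proof of a leak.

```python
from collections import Counter

MAX_CONNECTIONS_PER_HOST = 10       # threshold from the text above
MAX_CHANNELS_PER_CONNECTION = 1000  # threshold from the text above


def suspected_leaks(connections):
    """Return hosts with too many connections and connections with too many channels."""
    per_host = Counter(c["peer_host"] for c in connections)
    hosts = sorted(h for h, n in per_host.items() if n > MAX_CONNECTIONS_PER_HOST)
    conns = sorted(
        c["name"] for c in connections
        if c["channels"] > MAX_CHANNELS_PER_CONNECTION
    )
    return {"hosts": hosts, "connections": conns}


# Hypothetical sample data: 12 connections from one host, one channel-heavy connection.
connections = [
    {"name": f"10.0.0.5:{50000 + i}", "peer_host": "10.0.0.5", "channels": 1}
    for i in range(12)
]
connections.append({"name": "10.0.0.9:41000", "peer_host": "10.0.0.9", "channels": 1500})
report = suspected_leaks(connections)
```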
Channels on all connections
Connections without channels may be an indication that your clients are not working properly.
Use TLS for all connections made over the public internet. For most AMQP clients, it is as simple as replacing amqp:// with amqps://.
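If the connection URL is built in code or read from configuration, the scheme swap can be sketched as below. The helper name and the example hostname are hypothetical. The one detail beyond the scheme is an assumption worth checking against your own setup: when the URL pins the default plain-text port 5672, the sketch rewrites it to 5671, the default port for AMQP over TLS.

```python
from urllib.parse import urlsplit, urlunsplit

AMQP_PORT = 5672   # default plain-text AMQP port
AMQPS_PORT = 5671  # default AMQP-over-TLS port


def to_amqps(url):
    """Return the same connection URL with the amqps:// scheme."""
    parts = urlsplit(url)
    if parts.scheme == "amqps":
        return url
    if parts.scheme != "amqp":
        raise ValueError(f"not an AMQP URL: {parts.scheme}://")

    netloc = parts.netloc
    # Only rewrite the port when it is the explicit plain-text default.
    if parts.port == AMQP_PORT:
        netloc = netloc[: -len(str(AMQP_PORT))] + str(AMQPS_PORT)
    return urlunsplit(parts._replace(scheme="amqps", netloc=netloc))


# Hypothetical URLs for illustration.
secure_url = to_amqps("amqp://user:pass@host.example.com/vhost")
secure_url_with_port = to_amqps("amqp://user:pass@host.example.com:5672/vhost")
```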
Separate connections for publishers and consumers
If you are using the same connection for publishers and consumers, you won't be able to consume if the connection is in flow control, which will worsen the flow problem.
Not all RabbitMQ client libraries/versions are well behaved. We will check your client library.
All CloudAMQP servers implement sensible TCP keepalive, so AMQP heartbeats are not necessary.
Not setting a prefetch can lead to clients running out of memory and makes it impossible to scale out with more consumers. Always configure channel prefetch. More information about prefetch value can be found in the RabbitMQ Best Practice guide.
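With the Python client pika, for example, prefetch is set by calling basic_qos(prefetch_count=...) on the channel before consuming. The sketch below wraps that call. The helper name, the default of 10 and the stand-in channel class (used so the example runs without a broker) are assumptions for illustration; the right prefetch value depends on your workload.

```python
def start_consumer(channel, queue, on_message, prefetch_count=10):
    """Limit unacknowledged deliveries on the channel, then start consuming."""
    if prefetch_count < 1:
        raise ValueError("prefetch_count must be at least 1; 0 means unlimited")
    # At most `prefetch_count` unacknowledged messages are delivered at a time.
    channel.basic_qos(prefetch_count=prefetch_count)
    # Prefetch only applies with manual acknowledgements (auto_ack=False).
    return channel.basic_consume(
        queue=queue, on_message_callback=on_message, auto_ack=False
    )


class RecordingChannel:
    """Hypothetical stand-in for a pika channel; it only records the calls made."""

    def __init__(self):
        self.calls = []

    def basic_qos(self, prefetch_count=0):
        self.calls.append(("basic_qos", prefetch_count))

    def basic_consume(self, queue, on_message_callback, auto_ack=False):
        self.calls.append(("basic_consume", queue, auto_ack))
        return "ctag-1"


channel = RecordingChannel()
tag = start_consumer(
    channel, "work", lambda ch, method, props, body: None, prefetch_count=20
)
```

With a real pika channel the same helper would be called in place of the stand-in, and the callback would acknowledge each message once it has been processed.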