Missed heartbeat closures in RabbitMQ

Missed heartbeats can cause major issues in applications that utilize RabbitMQ. Potential consequences include long delays in message processing and duplicate message processing. In this blog, we examine reasons clients miss heartbeats and discuss strategies to avoid common causes.

RabbitMQ sends a heartbeat to a client connection when no traffic has been received from the client within the negotiated heartbeat interval. The client is expected to TCP-ack the heartbeat frame and, crucially, to send its own heartbeats back to the RabbitMQ server. If several consecutive heartbeats from the RabbitMQ server go by with no heartbeat (or other traffic) from the client, RabbitMQ closes the connection.

A common misconception is that missed heartbeats only occur when clients are idle. That can certainly happen, especially if the client application becomes unresponsive. But missed-heartbeat closures can also occur while a consumer is actively processing a recently received message. In my role as a Technical Support Engineer at CloudAMQP, I have investigated several situations where this happens.

Note: For all connections shown in this post, we use AMQP without TLS, because capturing and interpreting the TCP packets is much simpler that way. In production environments, we strongly recommend using TLS to encrypt traffic and protect credentials and messages from interception.

Heartbeats when the connection is idle

What do heartbeats look like? Below is a network capture of packets between RabbitMQ and an AMQP client that is neither sending nor receiving messages; the only network traffic is heartbeats. Note that each time RabbitMQ sends a heartbeat ping, the client returns a TCP ack. But that alone is not sufficient evidence of a healthy client connection: the client must also send its own heartbeat pings to RabbitMQ.

Time (sec) Source Destination Protocol Info
0.0 RMQ client AMQP Heartbeat
0.01 client RMQ AMQP Heartbeat
0.01 RMQ client TCP [ACK]
0.02 client RMQ TCP [ACK]
10.0 RMQ client AMQP Heartbeat
10.0 client RMQ AMQP Heartbeat
10.0 RMQ client TCP [ACK]
10.02 client RMQ TCP [ACK]
20.0 RMQ client AMQP Heartbeat
20.01 client RMQ AMQP Heartbeat
20.01 RMQ client TCP [ACK]
20.02 client RMQ TCP [ACK]

The heartbeat interval for this AMQP client connection is set to 20 seconds. Heartbeat pings between the client and RabbitMQ are sent every half of the negotiated heartbeat interval, i.e., every 10 seconds here.
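This half-interval schedule is easy to model in a few lines of Python (a simplified sketch, not any client library's actual implementation; heartbeat_times is a hypothetical helper):

```python
def heartbeat_times(interval, duration):
    """Seconds at which a peer emits heartbeat frames on an otherwise
    idle connection: one frame every half of the negotiated interval."""
    period = interval / 2
    times = []
    t = period
    while t <= duration:
        times.append(t)
        t += period
    return times

# With the 20-second interval negotiated in the capture above:
print(heartbeat_times(20, 60))  # [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
```

Both peers follow the same schedule independently, which is why the capture shows heartbeats in both directions roughly every 10 seconds.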

Missed heartbeats: long message processing time

Python Pika example

Now, let’s consider a case where the client is consuming messages from a queue. Here, we use the Python Pika client with a BlockingConnection, and simulate message processing that takes some time. If the processing time exceeds the missed-heartbeat timeout, the connection will be closed.

Using the receive.py file from the tutorial at https://www.rabbitmq.com/tutorials/tutorial-one-python, with minor modifications, we can simulate a long message-processing time. The message callback function is replaced with the following:

import time

def callback(ch, method, properties, body):
    print(f" [x] Received {body}")
    t1 = time.time()
    dt = time.time() - t1
    while dt < 75:
        time.sleep(2)
        dt = time.time() - t1
        print(f"elapsed time: {dt}", end="\r")
    try:
        print(f"elapsed time: {dt}")
        ch.basic_ack(delivery_tag=method.delivery_tag)
    except Exception:
        print("ack failed!")

The loop simulates a message-processing operation that takes 75 seconds. I also changed the channel.basic_consume call to set auto_ack to False.

Now, I run the receive code on my client machine:

➜ python3 receive.py
[*] Waiting for messages. To exit press CTRL+C
[x] Received b'test message for long processing times...'
elapsed time: 76.169312000274666
ack failed!

In the RabbitMQ logs, I can see that RabbitMQ closed the connection due to missed heartbeats.

2025-10-15 21:11:21.698766+00:00 [error] <0.11987696.0> closing AMQP connection <0.11987696.0> (66.253.203.19:61021 -> 10.32.128.6:5672, duration: '1M, 0s'):
2025-10-15 21:11:21.698766+00:00 [error] <0.11987696.0> missed heartbeats from client, timeout: 20s

I used tcpdump to capture the traffic for this session. Below are the relevant network packets between the client and RabbitMQ, starting with the client’s request to begin consuming from the “hello” queue. After RabbitMQ sends the frame to confirm the basic.consume, it immediately sends the message that I had already published to the queue.

Time (sec) Source Destination Protocol Info
8.07 client RMQ AMQP Basic.Consume q=hello
8.07 RabbitMQ client AMQP Basic.Consume-Ok Basic.Deliver x= rk=hello Content-Header Content-Body
8.09 client RMQ TCP [ACK]
18.02 RMQ client AMQP Heartbeat
18.04 client RMQ TCP [ACK]
28.02 RMQ client AMQP Heartbeat
28.04 client RMQ TCP [ACK]
38.02 RMQ client AMQP Heartbeat
38.04 client RMQ TCP [ACK]
48.02 RMQ client AMQP Heartbeat
48.04 client RMQ TCP [ACK]
58.02 RMQ client AMQP Heartbeat
58.04 client RMQ TCP [ACK]
68.02 RMQ client TCP [RST, ACK]

After RabbitMQ delivers the message at time 8.07, it waits 10 seconds (again, half the heartbeat interval) before sending a heartbeat to the client. It then sends a heartbeat every 10 seconds. Finally, after not receiving a heartbeat response from the client, it closes the connection at 68.02 with an RST response (2.5 times the heartbeat interval).
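The timing lines up with simple arithmetic (approximate, since RabbitMQ's internal timer granularity shifts things slightly):

```python
interval = 20.0             # negotiated heartbeat interval (seconds)
last_client_traffic = 8.07  # client's TCP ack of the delivered message

# First server heartbeat: half an interval after traffic stops
first_heartbeat = last_client_traffic + interval / 2

# Connection reset: about 2.5 intervals after that first heartbeat
reset_at = first_heartbeat + 2.5 * interval

# Close to the 18.02 and 68.02 second marks seen in the capture
print(first_heartbeat, reset_at)
```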

Again, note that the client TCP-acked each heartbeat RabbitMQ sent. That is not sufficient for RabbitMQ; the client must send a heartbeat to RabbitMQ.

Ruby Bunny example

Note that many client libraries interrupt message processing in order to respond to RabbitMQ’s heartbeats. I repeated the same experiment as with the Python Pika library, using the Ruby Bunny and amqp-client client libraries. In both cases, the client sends heartbeats to RabbitMQ during message processing. These libraries run an I/O loop that keeps servicing heartbeats while a message is being processed.

Missed heartbeats: effect of prefetch

Setting a prefetch on an AMQP consumer limits how many unacknowledged messages RabbitMQ will deliver to that consumer at once. If the value is set to 1, the consumer receives the next message only after it acknowledges the previous one. In most cases, this results in a slower message delivery rate than the maximum possible.
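The bookkeeping behind prefetch can be sketched as follows (a simplified model of the broker's per-consumer limit, not RabbitMQ internals; deliverable is a hypothetical helper):

```python
def deliverable(ready, unacked, prefetch):
    """How many more messages the broker may deliver to a consumer
    with `ready` messages queued and `unacked` deliveries outstanding."""
    if prefetch == 0:                     # 0 means no limit
        return ready
    return min(ready, max(0, prefetch - unacked))

print(deliverable(100, 0, 12))   # 12: a full window arrives at once
print(deliverable(100, 12, 12))  # 0: nothing more until an ack comes back
print(deliverable(100, 11, 12))  # 1: each ack frees one delivery slot
```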

However, complications with heartbeats can arise when the prefetch value exceeds 1. In the previous example, we saw that some AMQP clients do not interrupt message processing, which can result in missed-heartbeat connection closures. For this example, we will monitor a client that is delivered a full prefetch window of messages at once. Surprisingly, some client libraries fail to send heartbeats in this case.

Node.js amqplib example

Here, I am using the library amqplib, https://www.npmjs.com/package/amqplib. The heartbeat interval is set to 26 seconds, and prefetch is set to 12. Each message takes 5 seconds to process, simulated with a busy-wait loop. Here is the relevant code snippet:

ch2.prefetch(12);
ch2.consume(queue, (msg) => {
  if (msg !== null) {
    console.log(msg.content.toString());
    const startWorking = Date.now();
    let kk = 0;
    // busy-wait for ~5 seconds to simulate CPU-bound processing
    while (Date.now() - startWorking < 5000) {
      kk = kk + 1;
    }
    ch2.ack(msg);
  } else {
    console.log('Consumer cancelled by server');
  }
});

I expect that after each message is processed (about 5 seconds each), the client will send a Basic.Ack to RabbitMQ. However, monitoring the traffic between the client and RabbitMQ, I observe the following:

Time (sec) Source Destination Protocol Info
1.24 client RMQ AMQP Basic.Consume q=tasks
1.24 RMQ client AMQP Basic.Consume-Ok Basic.Deliver x= rk=tasks Content-Header Content-Body Basic.Deliver … [12 msgs delivered]
1.26 client RMQ TCP [ACK]
14.11 RMQ client AMQP Heartbeat
14.2 client RMQ TCP [ACK]
27.11 RMQ client AMQP Heartbeat
27.13 client RMQ TCP [ACK]
40.11 RMQ client AMQP Heartbeat
40.21 client RMQ TCP [ACK]
53.11 RMQ client AMQP Heartbeat
53.21 client RMQ TCP [ACK]
61.32 client RMQ AMQP Basic.Ack
61.32 RMQ client AMQP Basic.Deliver x= rk=tasks Content-Header Content-Body
61.34 client RMQ AMQP Basic.Ack Basic.Ack … [12 total Acks]
61.34 RMQ client AMQP Basic.Deliver x= rk=tasks Content-Header Content-Body … [12 msgs delivered]

Surprisingly, the client waits until all 12 prefetched messages have been processed before sending any Basic.Ack responses to RabbitMQ. In this case, the acks arrived just in time: RabbitMQ sent heartbeats at 14, 27, 40, and 53 seconds, and the client sent its first Basic.Ack at 61, shortly before the connection would have been closed for missed heartbeats.

This example also demonstrates that the client does not actually need to send a heartbeat to RabbitMQ; it just needs to send some traffic (here, a Basic.Ack) to avoid being closed.

Had the messages taken slightly longer to process, RabbitMQ would have closed the connection at 66 seconds. The client would still continue processing all 12 received messages to completion, but none of the Basic.Acks would reach RabbitMQ. RabbitMQ would then return all 12 messages to the ready state and redeliver them to the next consumer that subscribes to the queue. In this way, messages can be processed multiple times.

The same delays in sending data back to RabbitMQ with prefetch values greater than 1 are observed with the Python Pika client when using SelectConnection (see https://github.com/pika/pika/blob/main/examples/asynchronous_consumer_example.py).

The basic example in that repository can be modified slightly (e.g., by increasing the message-processing time and setting prefetch > 1) to demonstrate this behavior. Other commonly used client libraries may behave the same way; it is worth verifying whether the library your application uses exhibits this behavior and, if so, how to mitigate it.


TCP keep-alives as an alternative to AMQP heartbeats

We can set the heartbeat interval to 0 for most AMQP clients, which disables AMQP heartbeats entirely. In that case, how does RabbitMQ know that a connection has died? CloudAMQP configures clusters with standard TCP keep-alives. The nice thing about TCP keep-alives is that the operating system on each machine handles them, not RabbitMQ or the client library.

Here is a Wireshark capture of a connection that does not use heartbeats. Like AMQP heartbeat pings, TCP keep-alives are sent by both the client and the RabbitMQ server. The CloudAMQP RabbitMQ server sets the keep-alive idle interval to 60 seconds. If the client fails to respond to 3 sequential keep-alive probes, the client connection will be closed.

Time (sec) Source Destination Protocol Info
0.11 client RMQ AMQP Basic.Consume q=CloudAMQP_testing
0.11 RabbitMQ client AMQP Basic.Consume-Ok
0.12 client RMQ TCP [ACK]
0.19 client RMQ TCP [TCP Keep-Alive] 56214 > 5672
0.19 RMQ client TCP [TCP Keep-Alive ACK] 5672 > 56214
60.2 client RMQ TCP [TCP Keep-Alive] 56214 > 5672
60.2 RMQ client TCP [TCP Keep-Alive ACK] 5672 > 56214
60.9 RMQ client TCP [TCP Keep-Alive] 5672 > 56214
60.92 client RMQ TCP [TCP Keep-Alive ACK] 56214 > 5672
120.92 client RMQ TCP [TCP Keep-Alive] 56214 > 5672
120.92 RMQ client TCP [TCP Keep-Alive ACK] 5672 > 56214
122.34 RMQ client TCP [TCP Keep-Alive] 5672 > 56214
122.45 client RMQ TCP [TCP Keep-Alive ACK] 56214 > 5672
182.46 client RMQ TCP [TCP Keep-Alive] 56214 > 5672
182.46 RMQ client TCP [TCP Keep-Alive ACK] 5672 > 56214
183.78 RMQ client TCP [TCP Keep-Alive] 5672 > 56214
183.89 client RMQ TCP [TCP Keep-Alive ACK] 56214 > 5672
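On the client side, the same OS-level mechanism can be enabled per socket. Here is a sketch in Python using only the standard library (the option names are Linux's; the 60-second idle time and 3-probe limit mirror the server settings described above, and your AMQP client library may expose its own supported way to pass socket options):

```python
import socket

def enable_keepalive(sock, idle=60, interval=60, probes=3):
    """Turn on OS-level TCP keep-alives for a socket (Linux option names)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if hasattr(socket, "TCP_KEEPIDLE"):  # Linux-specific tuning knobs
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)       # idle time before first probe
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)  # time between probes
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)      # unanswered probes before close
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
flag = enable_keepalive(s)
s.close()
print(flag != 0)  # True: keep-alives are now enabled on this socket
```

With keep-alives handled by the kernel, a dead peer is detected even if the application thread is busy, which is exactly the failure mode that trips up AMQP heartbeats.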

Conclusions and Recommendations

  • Make sure you know how your AMQP client library handles heartbeats
  • Set a value for the heartbeat interval that works for your clients
  • If prefetch is not 1 for your consumers, ensure that heartbeat intervals are long enough, or consider using a client library that gives high priority to heartbeat handling
  • Consider relying on TCP keep-alives alone (heartbeat interval set to 0) for connection health checks

We hope you enjoyed this article. If you want to discuss your specific challenges and explore how CloudAMQP can help, please reach out to us at support@lavinmq.com.


This blog is written by developers at CloudAMQP to showcase CloudAMQP improvements. We have over a decade of experience helping companies expand their messaging knowledge and usage. Support is included in all CloudAMQP plans, available around the clock.
