Monitoring Tools

CloudAMQP offers monitoring tools that help you detect and address performance issues before they impact your business. Monitoring includes diagrams of CPU and memory usage. You can activate alarms that trigger when part of the system is heavily used, and you can view the RabbitMQ log stream directly in CloudAMQP.

Alarms

Receive alerts about performance anomalies in your application by activating Queue Alarms, Consumers Alarms, CPU Alarms, and Memory Alarms.

Alarm notifications can be sent to email addresses, pushed to webhooks, or sent to Slack, PagerDuty, Microsoft Teams, VictorOps, or OpsGenie. When an alarm is triggered, an alert is sent to every recipient in the notifications list. You need to set up a notification before any of the alarms will work. The payload sent to a webhook is described under "Notifications payload - webhooks".

NOTE: The admin user must have access to all vhosts on the server for the queue and consumer alarms to work correctly.

Queue Alarms

Queue alarms can be activated for all instances.

Queue alarms can be triggered to send notifications when the number of messages in a queue exceeds a certain threshold for a given amount of time.

The alarm triggers when a queue matches the given regexp and has held more than the threshold number of messages (more than x messages in the queue) for longer than the given number of seconds.

Example: The regexp .* will match all your queues. A regexp like ^myqueue$ would match exactly the queue named "myqueue". Use Rubular to test your regex.
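The trigger rule above can be sketched in Python. This is only an illustration of the described logic, not CloudAMQP's implementation: the `QueueAlarm` class, its field names, and the use of `re.search` for matching are assumptions made for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class QueueAlarm:
    """Hypothetical model of a queue alarm; names are illustrative only."""
    queue_regexp: str       # regexp the queue name must match
    threshold: int          # alarm when the queue holds MORE than this many messages
    time_until_fire: float  # seconds the condition must hold before firing

    def __post_init__(self):
        self._pattern = re.compile(self.queue_regexp)
        # queue name -> timestamp at which the threshold was first exceeded
        self._over_since = {}

    def observe(self, queue: str, messages: int, now: float) -> bool:
        """Record one sample and return True if the alarm should fire."""
        if not self._pattern.search(queue):
            return False
        if messages <= self.threshold:
            # Condition no longer holds, so the timer starts over.
            self._over_since.pop(queue, None)
            return False
        started = self._over_since.setdefault(queue, now)
        return now - started > self.time_until_fire
```

With `QueueAlarm("^myqueue$", threshold=100, time_until_fire=30)`, a sample of 150 messages at time 0 does not fire, while a second sample of 150 messages at time 31 does. A queue named "other" never fires because it does not match the regexp.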

Consumers Alarms

Consumers alarms can be activated for all instances.

Consumers Alarms can be triggered to send notifications when the number of consumers for a queue is less than or equal to a given number of consumers, for a given amount of time.

Connection Alarms

Connection Alarms can be triggered to send notifications when the number of connections is greater than or equal to a given number of connections, for a given amount of time.

Connection Flow Alarms

Connection Flow Alarms can be activated for all instances.

The Connection Flow Alarm checks whether a connection's state has changed from running to flow. Connection Flow Alarms can be triggered to send notifications when a connection stays in the flow state for a given amount of time.

CPU Alarms

CPU Alarms are only available for dedicated instances.

When the CPU Alarm is enabled, you will receive a notification when you use more than 80% of the available CPU for more than 15 minutes. This level can be modified. We recommend enabling CPU Alarms, which can be done through the Control Panel.

Memory Alarms

Memory Alarms are only available for dedicated instances.

When the Memory Alarm is enabled, you will receive a notification when you use more than 90% of the virtual machine's memory for more than 15 minutes. We recommend enabling Memory Alarms, which can be done through the Control Panel.

Server Metrics

Server Metrics measure performance metrics from your server. CloudAMQP shows CPU Usage Monitoring and Memory Usage Monitoring.

CPU Usage

CPU Usage refers to how much work your processor is doing.

  • I/O Wait: Shows the percentage of time the CPU spends waiting for an I/O (input/output) operation to complete, in other words, the percentage of time the CPU has to wait on the disk. If this is high, consider whether more messages can be published as transient instead of persistent, or keep your queues short so that messages don't have to be written to disk. You can also contact our support to discuss other solutions.

  • User time: Shows the percentage of time your program spends executing instructions in the CPU. In this case, the time the CPU spent running RabbitMQ.

    If this is high it probably means you are near the limit of what your server can handle. You should consider upgrading before the lack of CPU power becomes a serious issue.

  • System time: Describes the percentage of time the CPU spent running OS tasks.

  • Steal time: The percentage of CPU time "stolen" by the virtualization system, or time spent when the virtual CPU waits for a real CPU. If this is high, it may mean that you are using too much CPU power, which can seriously impact the performance of your server. You should probably upgrade to a larger instance.
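To make the four categories above concrete, here is a minimal sketch of how such percentages can be derived on a Linux host from two samples of the aggregate `cpu` line in `/proc/stat`. It does not describe how CloudAMQP collects its metrics; the function names are made up for the example, and the counter values in the usage note are invented.

```python
# Order of the first eight counters on the "cpu" line of /proc/stat (Linux).
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line: str) -> dict:
    """Parse the aggregate 'cpu' line of /proc/stat into named counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))


def cpu_percentages(before: dict, after: dict) -> dict:
    """Return each counter's share of elapsed CPU time between two samples."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total == 0:
        return {name: 0.0 for name in FIELDS}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}
```

For example, the samples `cpu 100 0 50 800 40 0 0 10` and `cpu 300 0 150 1400 90 0 0 60` span 1000 ticks in total, giving 20% user time, 10% system time, 5% I/O wait, and 5% steal time.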

Memory Usage

  • Used: Percentage of used memory.
  • Free: Percentage of free memory.

RabbitMQ Log Stream

RabbitMQ Log Stream shows a live log from RabbitMQ.

Integrated Monitoring Services

Integrated monitoring services are only available for dedicated instances.

CloudAMQP has integrated log and metric services including CloudWatch, Papertrail, Logentries, Stackdriver, Loggly, Splunk, DataDog, Librato, and New Relic. Learn more about our available CloudAMQP monitoring services.

Notifications payload - webhooks

Alarm notifications can be received via webhooks. This section describes the payload sent in each POST.

  • type: Type of alarm, such as queue, consumer, cpu, memory, disk, connection, netsplit_join, or netsplit_split.
  • appname: Name of the instance that triggered the alarm.
  • hostname: Hostname of the instance that triggered the alarm.
  • threshold: Value threshold specified for the alarm.
  • vhost_regexp: Regexp for the vhost.
  • regexp: Regexp of your specified alarm (e.g. Queue regexp).
  • time_until_fire: Time threshold specified for the alarm.
  • options: May include extra information about the alarm, such as message_type (messages_unacknowledge, messages_ready, messages).
  • account_id: Account id of the instance that triggered the alarm.
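A receiving service might read these fields as in the sketch below. It assumes the POST body is JSON, which is not stated above; the `summarize_alarm` helper is hypothetical, and the sample values in the usage note are invented for illustration.

```python
import json


def summarize_alarm(body: str) -> str:
    """Build a one-line summary from a webhook POST body (assumed to be JSON)."""
    payload = json.loads(body)
    # Fields needed for the summary; the others listed above are optional here.
    missing = [f for f in ("type", "appname", "hostname") if f not in payload]
    if missing:
        raise ValueError("payload is missing fields: " + ", ".join(missing))
    summary = f"{payload['type']} alarm on {payload['appname']} ({payload['hostname']})"
    if "threshold" in payload:
        summary += f", threshold {payload['threshold']}"
    if "time_until_fire" in payload:
        summary += f", time threshold {payload['time_until_fire']}"
    return summary
```

Given a body with `type` set to `queue`, `appname` set to `my-instance`, `hostname` set to `my-instance.example.com`, `threshold` set to 100, and `time_until_fire` set to 30, the function returns `queue alarm on my-instance (my-instance.example.com), threshold 100, time threshold 30`.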