Decentralized observability with OpenTelemetry, Part 2
In Part 1 we explained why we rebuilt our metrics system using Grafana Mimir and the OpenTelemetry Collector. This post covers how we extended that approach to our logging infrastructure.
Why we needed a new log system
While implementing our new metrics system, our log provider Papertrail announced they were shutting down. This forced us to migrate our logging infrastructure, but it also presented an opportunity to unify our observability stack.
We evaluated several alternatives and established requirements similar to our metrics system:
- Cost-effective storage for high log volumes
- No explicit source registration (our fleet is dynamic)
- Support for alerting based on log patterns
Choosing Grafana Loki
We chose Grafana Loki as our new log storage backend. Unlike traditional log systems that index full log lines, Loki indexes only labels. This keeps the index small and makes Loki significantly cheaper to operate, while queries stay fast as long as you can narrow the search with labels first.
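To make the label-only idea concrete, here is a toy model in Python. It is not how Loki is implemented; the class name, label names, and log lines are all made up for illustration. It only shows the general shape: labels pick out a small set of streams, and the text filter is a plain scan over those streams.

```python
from collections import defaultdict


class LabelIndexedStore:
    """Toy log store: only label sets are indexed, never the log text."""

    def __init__(self):
        # Maps a frozen label set (one "stream") to its raw log lines.
        self.streams = defaultdict(list)

    def push(self, labels, line):
        self.streams[frozenset(labels.items())].append(line)

    def query(self, selector, contains=""):
        """Select streams by label match, then scan only their lines."""
        wanted = set(selector.items())
        results = []
        for stream_labels, lines in self.streams.items():
            if wanted <= stream_labels:  # label lookup narrows the search
                # Unindexed text: a plain substring scan over the stream.
                results.extend(line for line in lines if contains in line)
        return results


store = LabelIndexedStore()
store.push({"service": "rabbitmq", "host": "node-1"}, "connection accepted")
store.push({"service": "rabbitmq", "host": "node-1"}, "error: channel closed")
store.push({"service": "lavinmq", "host": "node-2"}, "error: disk almost full")

matches = store.query({"service": "rabbitmq"}, contains="error")
print(matches)
```

The trade-off is visible even at this scale: a query without a useful label selector has to scan every stream, which is why label design matters more than it would in a full-text index.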
Loki's label-based approach aligned well with our Prometheus-native metrics system. We could use the same labels across metrics and logs, making it easier to correlate data during troubleshooting. The native integration with Grafana meant we could query logs using LogQL directly in the same dashboards where we visualize metrics.
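As a small sketch of what shared labels buy you, the helper below renders one label set into a selector and reuses it in a PromQL query and two LogQL queries. The label names, the label values, and the metric name `broker_messages_published_total` are hypothetical; only the `{key="value"}` matcher syntax, which PromQL and LogQL have in common, is taken as given.

```python
def selector(labels):
    """Render a label dict as a {key="value", ...} selector.

    Values are inserted as-is; this sketch does not escape quotes.
    """
    pairs = ", ".join(f'{key}="{value}"' for key, value in sorted(labels.items()))
    return "{" + pairs + "}"


# Hypothetical labels attached to both metrics and logs of one broker.
labels = {"service_name": "rabbitmq", "host": "node-1"}
sel = selector(labels)

# PromQL: publish rate for that broker (metric name is made up).
promql = f"rate(broker_messages_published_total{sel}[5m])"

# LogQL: error lines from the same broker.
logql = f'{sel} |= "error"'

# LogQL metric query: number of error lines per 5 minutes, usable in a panel or alert rule.
logql_rate = f'sum(count_over_time({sel} |= "error" [5m]))'

print(promql)
print(logql)
print(logql_rate)
```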
Extending the OpenTelemetry Collector
Rather than introducing a separate agent for log collection, we extended our existing OpenTelemetry Collector distribution. We added the journald receiver for log collection and configured the OTLP exporter to send logs to Loki.
This unified approach reduced complexity. We now have a single agent handling both metrics and logs, with one configuration system and one deployment process.
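For readers who have not wired up a logs pipeline before, the following sketch shows roughly what such a setup could look like. It is not our actual configuration. The collector reads YAML; the structure is written here as a Python dict and printed as JSON, which YAML parsers generally accept. The unit names, the Loki address, the `batch` processor, and the choice of the `otlphttp` exporter are assumptions for illustration.

```python
import json

# Assumed values: unit names, Loki address, and processor choice.
config = {
    "receivers": {
        "journald": {
            "units": ["rabbitmq-server", "lavinmq"],
            "priority": "info",
        }
    },
    "processors": {"batch": {}},
    "exporters": {
        # Assumes Loki's OTLP ingestion is reachable over HTTP at this address.
        "otlphttp": {"endpoint": "http://loki.example.internal:3100/otlp"}
    },
    "service": {
        "pipelines": {
            "logs": {
                "receivers": ["journald"],
                "processors": ["batch"],
                "exporters": ["otlphttp"],
            }
        }
    },
}


def undefined_components(cfg):
    """Return pipeline references that have no matching component definition."""
    missing = []
    for name, pipeline in cfg["service"]["pipelines"].items():
        for kind in ("receivers", "processors", "exporters"):
            for component in pipeline.get(kind, []):
                if component not in cfg.get(kind, {}):
                    missing.append((name, kind, component))
    return missing


missing = undefined_components(config)
print(json.dumps(config, indent=2))
print("undefined components:", missing)
```

A metrics pipeline would sit next to `logs` under `service.pipelines` in the same file, which is what makes the single-agent setup possible.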
The journald receiver captures system logs from the systemd journal, including RabbitMQ and LavinMQ broker logs. We use processors to filter and enrich log entries with relevant labels before exporting them.
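The filter-and-enrich step can be pictured with a few lines of Python. In the collector this is done by processors configured declaratively, so the code below is only a model of the behaviour, not something we run. `_SYSTEMD_UNIT`, `_HOSTNAME`, and `MESSAGE` are standard journal fields; the unit names, label names, and the `cluster` label are assumptions.

```python
BROKER_UNITS = {"rabbitmq-server.service", "lavinmq.service"}  # assumed unit names


def process(entries, extra_labels):
    """Keep broker entries only and attach labels for the log backend."""
    for entry in entries:
        unit = entry.get("_SYSTEMD_UNIT", "")
        if unit not in BROKER_UNITS:
            continue  # filter: drop logs from unrelated units
        yield {
            "body": entry.get("MESSAGE", ""),
            "labels": {
                # enrich: derive a service label from the unit name
                "service_name": unit.removesuffix(".service"),
                "host": entry.get("_HOSTNAME", "unknown"),
                **extra_labels,
            },
        }


journal = [
    {"_SYSTEMD_UNIT": "lavinmq.service", "_HOSTNAME": "node-2", "MESSAGE": "Listening on 5672"},
    {"_SYSTEMD_UNIT": "cron.service", "_HOSTNAME": "node-2", "MESSAGE": "job started"},
]

records = list(process(journal, {"cluster": "example-cluster"}))
print(records)
```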
Log data flow
Logs flow from the collector to Grafana Loki using OTLP (OpenTelemetry Protocol). We query logs using LogQL in Grafana's log explorer and use them for alerting rules that notify our team via Slack and PagerDuty.
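To give a feel for what travels over the wire, here is a minimal sketch of a log batch in OTLP's JSON encoding. The collector's exporter builds and sends this for you (typically in the protobuf encoding), so the function below is purely illustrative; the function name, service name, and log line are made up.

```python
import json
import time


def otlp_logs_payload(service_name, lines, now_ns=None):
    """Build an OTLP/JSON logs payload for one resource."""
    now_ns = time.time_ns() if now_ns is None else now_ns
    return {
        "resourceLogs": [{
            "resource": {
                # Resource attributes describe the source of the logs.
                "attributes": [
                    {"key": "service.name", "value": {"stringValue": service_name}}
                ]
            },
            "scopeLogs": [{
                "logRecords": [
                    {
                        # 64-bit integers are encoded as strings in OTLP/JSON.
                        "timeUnixNano": str(now_ns),
                        "severityText": "INFO",
                        "body": {"stringValue": line},
                    }
                    for line in lines
                ]
            }],
        }]
    }


payload = otlp_logs_payload("lavinmq", ["Listening on 5672"], now_ns=1700000000000000000)
print(json.dumps(payload, indent=2))
```

The resource attributes are the part that matters most for querying later, since they are where label-like information such as the service name is carried.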
Summary
By extending our OpenTelemetry Collector to handle logs, we unified our observability stack under a single agent. Grafana Loki provides cost-effective log storage that integrates seamlessly with our Prometheus-based metrics system.
In Part 3 we cover the full system architecture: how we build our custom collector distribution, how we manage dynamic configuration, and what results we achieved.