This series is a general purpose getting started guide for those of us wanting to learn about the Cloud Native Computing Foundation (CNCF) project Fluent Bit.
Each article in this series addresses a single topic by providing insights into what the topic is, why we are interested in exploring that topic, where to get started with the topic, and how to get hands-on with learning about the topic as it relates to the Fluent Bit project.
The idea is that each article can stand on its own, but that they also lead down a path that slowly increases our abilities to implement solutions with Fluent Bit telemetry pipelines.
Let's take a look at the topic of this article, integrating Fluent Bit with Prometheus. In case you missed the previous article, check out the developer guide to routing metrics to Prometheus, which explores how to collect and route metrics telemetry data to your Prometheus instance.
This article will continue our hands-on exploration of Prometheus integration, helping developers leverage Fluent Bit's powerful metrics capabilities. We'll look at the second of three essential patterns for integrating Fluent Bit with Prometheus in your observability infrastructure.
All examples in this article were done on OSX and assume the reader is able to adapt the actions shown here to their own local machine.
Integrating with Prometheus?
Before diving into the hands-on examples, let's understand why Prometheus integration matters for Fluent Bit users. Prometheus is the de facto standard for metrics collection and monitoring in cloud native environments. It's another CNCF graduated project that provides a time-series database optimized for operational monitoring. The combination of Fluent Bit's lightweight, high-throughput telemetry pipeline with Prometheus's battle-tested metrics storage creates a powerful observability solution.
Fluent Bit provides several ways to integrate with Prometheus, the first of which we covered in the previous article. In this article we'll explore Fluent Bit monitoring itself and exposing internal pipeline metrics, giving you visibility into the health and performance of your telemetry infrastructure. Understanding how your telemetry pipeline is performing is critical for maintaining reliable observability.
The third and final way to integrate with Prometheus is to use Fluent Bit as a metrics proxy, scraping metrics from various sources and forwarding them to Prometheus. This is particularly useful when you need to aggregate metrics from multiple sources or transform them before they reach Prometheus. This will be explored in a future article.
Let's dive into the second pattern, exposing internal telemetry pipeline metrics to Prometheus.
Where to get started
You should have explored the previous articles in this series to install and get started with Fluent Bit on your developer local machine, either using the source code or container images. Links at the end of this article will point you to a free hands-on workshop that lets you explore more of Fluent Bit in detail.
You can verify that you have a functioning installation by testing your Fluent Bit, either using a source installation or a container installation as shown below:
# For source installation.
$ fluent-bit -i dummy -o stdout

# For container installation.
$ podman run -ti ghcr.io/fluent/fluent-bit:4.2.2 -i dummy -o stdout
...
[0] dummy.0: [[1753105021.031338000, {}], {"message"=>"dummy"}]
[0] dummy.0: [[1753105022.033205000, {}], {"message"=>"dummy"}]
[0] dummy.0: [[1753105023.032600000, {}], {"message"=>"dummy"}]
[0] dummy.0: [[1753105024.033517000, {}], {"message"=>"dummy"}]
...

Let's explore the Prometheus integration pattern for monitoring Fluent Bit metrics that will help you with your observability infrastructure.
How to integrate with Prometheus
See this article for details about the service section of the configurations used in the rest of this article; for now we'll focus on our Fluent Bit pipeline and, specifically, the Prometheus integration capabilities that help you manage metrics in your observability stack.
The figure below shows the phases of a telemetry pipeline. Metrics collected by input plugins flow through the pipeline and can be routed to Prometheus-compatible outputs.
Understanding how metrics flow through Fluent Bit's pipeline is essential for effective Prometheus integration. Input plugins collect metrics, which then pass through filters for transformation, before being routed to output plugins that deliver metrics to Prometheus.
Monitoring telemetry pipeline health
The second integration pattern focuses on monitoring the health and performance of your Fluent Bit instance. Understanding how your telemetry pipeline is performing is critical for maintaining reliable observability infrastructure. Fluent Bit provides the fluentbit_metrics input plugin that exposes internal pipeline metrics.
The Fluent Bit Metrics plugin collects valuable information about your pipeline including uptime, input plugin throughput, output plugin performance, buffer usage, and error counts. These metrics help you answer questions like: Is my pipeline processing data fast enough? Are there any backpressure issues? How much memory is my pipeline consuming?
To demonstrate this pattern, let's create a configuration file called fluent-bit.yaml that exposes internal Fluent Bit metrics:
service:
  flush: 1
  log_level: info
  http_server: on
  http_listen: 0.0.0.0
  http_port: 2020
  hot_reload: on

pipeline:
  inputs:
    # Simulate application logs for the pipeline to process
    - name: dummy
      tag: app.logs
      dummy: '{"message":"Application log entry","level":"info"}'

    # Collect Fluent Bit internal metrics
    - name: fluentbit_metrics
      tag: internal_metrics
      scrape_interval: 2

  outputs:
    # Send application logs to stdout
    - name: stdout
      match: app.logs
      format: json_lines

    # Expose internal metrics for Prometheus
    - name: prometheus_exporter
      match: internal_metrics
      host: 0.0.0.0
      port: 2022
      add_label:
        - instance fluent-bit-dev
        - component telemetry-pipeline

This configuration demonstrates a realistic scenario where Fluent Bit is processing application logs while simultaneously exposing its own internal metrics. The fluentbit_metrics input plugin collects metrics about the pipeline itself, and these are exposed on a separate port (2022) from any host metrics we might be collecting.
Let's run this configuration as follows:
# For source installation.
$ fluent-bit --config fluent-bit.yaml

# For container installation after building a new image with your
# configuration using a Buildfile as follows:
#
# FROM ghcr.io/fluent/fluent-bit:4.2.2
# COPY ./fluent-bit.yaml /fluent-bit/etc/fluent-bit.yaml
# CMD [ "fluent-bit", "-c", "/fluent-bit/etc/fluent-bit.yaml" ]
#
$ podman build -t fb -f Buildfile

# Note: For container deployments collecting linux host metrics, you need
# to mount the host's /proc and /sys filesystems:
# $ podman run --rm -v /proc:/host/proc:ro -v /sys:/host/sys:ro -p 2021:2021 fb

# Publish ports 2020 (Fluent Bit HTTP server) and 2022 (Prometheus exporter)
# so the metrics endpoints are reachable from the host.
$ podman run --rm -p 2020:2020 -p 2022:2022 fb
...
{"date":1768901032.490993,"message":"Application log entry","level":"info"}
{"date":1768901033.490052,"message":"Application log entry","level":"info"}
{"date":1768901034.488256,"message":"Application log entry","level":"info"}
...
Our console output shows the generated log messages, while in the background our Fluent Bit instance metrics are being staged on the endpoint for a Prometheus instance to scrape.
Now verify that the metrics are being tagged with our custom labels by opening a browser window to http://localhost:2022/metrics, where we should see the following:
# HELP fluentbit_uptime Number of seconds that Fluent Bit has been running.
# TYPE fluentbit_uptime counter
fluentbit_uptime{instance="fluent-bit-dev",component="telemetry-pipeline",hostname="Erics-MacBook-Pro.local"} 126

# HELP fluentbit_logger_logs_total Total number of logs
# TYPE fluentbit_logger_logs_total counter
fluentbit_logger_logs_total{instance="fluent-bit-dev",component="telemetry-pipeline",message_type="error"} 0
fluentbit_logger_logs_total{instance="fluent-bit-dev",component="telemetry-pipeline",message_type="warn"} 0
fluentbit_logger_logs_total{instance="fluent-bit-dev",component="telemetry-pipeline",message_type="info"} 14
fluentbit_logger_logs_total{instance="fluent-bit-dev",component="telemetry-pipeline",message_type="debug"} 0
fluentbit_logger_logs_total{instance="fluent-bit-dev",component="telemetry-pipeline",message_type="trace"} 0

# HELP fluentbit_routing_logs_records_total Total log records routed from input to output
# TYPE fluentbit_routing_logs_records_total counter
fluentbit_routing_logs_records_total{instance="fluent-bit-dev",component="telemetry-pipeline",input="dummy.0",output="stdout.0"} 125

# HELP fluentbit_routing_logs_bytes_total Total bytes routed from input to output (logs)
# TYPE fluentbit_routing_logs_bytes_total counter
...
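If you prefer the command line, the same check can be done with curl instead of a browser; this simply fetches the exposition output shown above:

# Fetch the metrics endpoint and show the first few lines of output.
$ curl -s http://localhost:2022/metrics | head -n 20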
These metrics provide valuable insights into your pipeline's health. The fluentbit_uptime counter tells you how long your instance has been running. The fluentbit_input_records_total and fluentbit_input_bytes_total counters show throughput for each input plugin. The fluentbit_output_records_total, fluentbit_output_retries_total, and fluentbit_output_errors_total counters help you monitor output plugin performance and detect delivery issues.
To integrate this with Prometheus, add a scrape configuration to your Prometheus configuration file prometheus.yml as follows:
scrape_configs:
  - job_name: 'fluent-bit-health-metrics'
    static_configs:
      - targets: ['localhost:2022']
    scrape_interval: 10s
This configuration tells Prometheus to scrape the Fluent Bit metrics endpoint every 10 seconds. The metrics will then be available for querying in Prometheus and can be visualized in the Prometheus console or using the Perses project for dashboards.
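As a sketch of what those queries might look like, the PromQL expressions below use rate() over the counters described earlier; the 5m window is an assumption, so adjust it and the label selectors to match your own setup:

# Per-second record throughput for each input plugin over the last 5 minutes.
rate(fluentbit_input_records_total[5m])

# Per-second output error rate, broken down by output plugin.
rate(fluentbit_output_errors_total[5m])

# Retries per second, an early signal of backpressure or destination issues.
rate(fluentbit_output_retries_total[5m])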
With these metrics in Prometheus, you can create alerts for scenarios like high retry rates (indicating backpressure or destination issues), zero input records (indicating data collection problems), or high memory usage.
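As one possible starting point, here is a sketch of Prometheus alerting rules covering two of those scenarios; the rule names, thresholds, and durations are assumptions you should adapt to your environment:

groups:
  - name: fluent-bit-health
    rules:
      # Assumed threshold: alert if any output is retrying more than once per second.
      - alert: FluentBitHighRetryRate
        expr: rate(fluentbit_output_retries_total[5m]) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Fluent Bit output retries are elevated (possible backpressure or destination issues)"

      # Alert if an input has stopped producing records entirely.
      - alert: FluentBitNoInputRecords
        expr: rate(fluentbit_input_records_total[5m]) == 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Fluent Bit input has produced no records in the last 10 minutes"

A rules file like this is referenced from the rule_files section of your prometheus.yml.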
More in the series
In this article you explored the second of three powerful patterns for integrating Fluent Bit with Prometheus: monitoring Fluent Bit health metrics. In the following article we will continue onwards to look at scraping and forwarding Prometheus metrics with remote write. This article is based on this online free workshop.
There will be more in this series as you continue to learn how to configure, run, manage, and master the use of Fluent Bit in the wild. Next up, forwarding Prometheus metrics.
