Thursday, January 8, 2026

Mastering Fluent Bit: Developers Guide to Routing Metrics to Prometheus

This series is a general purpose getting started guide for those of us wanting to learn about the Cloud Native Computing Foundation (CNCF) project Fluent Bit.

Each article in this series addresses a single topic by providing insights into what the topic is, why we are interested in exploring that topic, where to get started with the topic, and how to get hands-on with learning about the topic as it relates to the Fluent Bit project.

The idea is that each article can stand on its own, but that they also lead down a path that slowly increases our abilities to implement solutions with Fluent Bit telemetry pipelines.

Let's take a look at the topic of this article, integrating Fluent Bit with Prometheus. In case you missed the previous article, check out the developer guide to telemetry pipeline routing where you explore how to direct telemetry data to different destinations based on tags, patterns, and conditions.

This article will be a hands-on exploration of Prometheus integration that helps you as a developer leverage Fluent Bit's powerful metrics capabilities. We'll look at the first of three essential patterns for integrating Fluent Bit with Prometheus in your observability infrastructure.

All examples in this article were done on OSX and assume the reader is able to adapt the actions shown here to their own local machine.

Integrating with Prometheus?

Before diving into the hands-on examples, let's understand why Prometheus integration matters for Fluent Bit users. Prometheus is the de facto standard for metrics collection and monitoring in cloud native environments. It's another CNCF graduated project that provides a time-series database optimized for operational monitoring. The combination of Fluent Bit's lightweight, high-throughput telemetry pipeline with Prometheus's battle-tested metrics storage creates a powerful observability solution.

Fluent Bit provides several ways to integrate with Prometheus. You can expose metrics endpoints that Prometheus can scrape (pull model), push metrics directly to Prometheus using the remote write protocol, or even scrape existing Prometheus endpoints and route those metrics through your telemetry pipeline. This flexibility allows Fluent Bit to act as a metrics aggregator, forwarder, or even a replacement for dedicated metrics agents in resource-constrained environments.

What is Prometheus Integration?

There are several compelling reasons to integrate Fluent Bit with Prometheus in your infrastructure. First, Fluent Bit can collect system-level metrics using its built-in Node Exporter Metrics plugin, eliminating the need to deploy a separate Prometheus Node Exporter. This reduces resource usage and simplifies your deployment.

Second, Fluent Bit can monitor itself and expose internal pipeline metrics, giving you visibility into the health and performance of your telemetry infrastructure. Understanding how your telemetry pipeline is performing is critical for maintaining reliable observability. This will be covered in a future article.

Third, Fluent Bit can act as a metrics proxy, scraping metrics from various sources and forwarding them to Prometheus. This is particularly useful when you need to aggregate metrics from multiple sources or transform them before they reach Prometheus. This will be explored in a future article.

Let's dive into the first pattern: collecting system-level metrics using Fluent Bit's built-in Node Exporter Metrics plugin.

Where to get started

You should have explored the previous articles in this series to install and get started with Fluent Bit on your local developer machine, either using the source code or container images. Links at the end of this article will point you to a free hands-on workshop that lets you explore more of Fluent Bit in detail.

You can verify that you have a functioning installation by testing your Fluent Bit, either using a source installation or a container installation as shown below:

# For source installation.
$ fluent-bit -i dummy -o stdout

# For container installation.
$ podman run -ti ghcr.io/fluent/fluent-bit:4.2.2 -i dummy -o stdout

...
[0] dummy.0: [[1753105021.031338000, {}], {"message"=>"dummy"}]
[0] dummy.0: [[1753105022.033205000, {}], {"message"=>"dummy"}]
[0] dummy.0: [[1753105023.032600000, {}], {"message"=>"dummy"}]
[0] dummy.0: [[1753105024.033517000, {}], {"message"=>"dummy"}]
...

Let's explore the three Prometheus integration patterns that will help you with your observability infrastructure.

How to integrate with Prometheus

See this article for details about the service section of the configurations used in the rest of this article; for now we'll focus on our Fluent Bit pipeline and specifically the Prometheus integration capabilities that can be of great help in managing metrics in your observability stack.

The figure below shows the phases of a telemetry pipeline. Metrics collected by input plugins flow through the pipeline and can be routed to Prometheus-compatible outputs.

Understanding how metrics flow through Fluent Bit's pipeline is essential for effective Prometheus integration. Input plugins collect metrics, which then pass through filters for transformation, before being routed to output plugins that deliver metrics to Prometheus.

Now let's look at the first of the three most useful Prometheus integration patterns that developers will want to know about.

Routing metrics through Fluent Bit to Prometheus

The first integration pattern involves collecting host-level metrics using Fluent Bit's built-in Node Exporter Metrics plugin and exposing them for Prometheus to scrape. This pattern is incredibly valuable because it allows you to collect system metrics without deploying a separate Prometheus Node Exporter agent.

The Node Exporter Metrics input plugin implements a subset of the collectors available in the original Prometheus Node Exporter. It collects CPU statistics, memory usage, disk I/O, network interface statistics, filesystem information, and more. The beauty of this approach is that all these metrics flow through Fluent Bit's pipeline, where you can transform, filter, and route them as needed.

To demonstrate this pattern, let's create a configuration file called fluent-bit.yaml that collects host metrics and exposes them through a Prometheus endpoint:

service:
  flush: 1
  log_level: info
  http_server: on
  http_listen: 0.0.0.0
  http_port: 2020
  hot_reload: on

pipeline:
  inputs:
    - name: node_exporter_metrics
      tag: node_metrics
      scrape_interval: 2

  outputs:
    - name: prometheus_exporter
      match: node_metrics
      host: 0.0.0.0
      port: 2021
      add_label:
        - app fluent-bit
        - environment development

    # testing output to console
    - name: stdout
      match: node_metrics
      format: json_lines

Our configuration uses the node_exporter_metrics input plugin to collect system metrics every two seconds. The prometheus_exporter output plugin then exposes these metrics on port 2021 in a format that Prometheus can scrape. We've also added the custom labels app and environment that will be attached to all metrics, making it easier to filter and query them in Prometheus.

Let's run this configuration as follows:

# For source installation.
$ fluent-bit --config fluent-bit.yaml

# For container installation, after building a new image with your
# configuration using a Buildfile as follows:
#
# FROM ghcr.io/fluent/fluent-bit:4.2.2
# COPY ./fluent-bit.yaml /fluent-bit/etc/fluent-bit.yaml
# CMD [ "fluent-bit", "-c", "/fluent-bit/etc/fluent-bit.yaml" ]
#
$ podman build -t fb -f Buildfile

# Note: For container deployments collecting linux host metrics, you need
# to mount the host's /proc and /sys filesystems:
# $ podman run --rm -v /proc:/host/proc:ro -v /sys:/host/sys:ro -p 2021:2021 fb

$ podman run --rm fb 

...
[2026/01/19 15:25:47.115361000] [ warn] [input:node_exporter_metrics:node_exporter_metrics.0] calling IORegistryEntryGetChildEntry is failed
2026-01-19T14:25:47.114513143Z node_cpu_seconds_total{cpu="0",mode="user"} = 25039.200000000001
2026-01-19T14:25:47.114513143Z node_cpu_seconds_total{cpu="0",mode="system"} = 9067.2999999999993
2026-01-19T14:25:47.114513143Z node_cpu_seconds_total{cpu="0",mode="nice"} = 0
2026-01-19T14:25:47.114513143Z node_cpu_seconds_total{cpu="0",mode="idle"} = 48662.790000000001
2026-01-19T14:25:47.114513143Z node_cpu_seconds_total{cpu="1",mode="user"} = 23096.860000000001
2026-01-19T14:25:47.114513143Z node_cpu_seconds_total{cpu="1",mode="system"} = 7764.7299999999996
2026-01-19T14:25:47.114513143Z node_cpu_seconds_total{cpu="1",mode="nice"} = 0
2026-01-19T14:25:47.114513143Z node_cpu_seconds_total{cpu="1",mode="idle"} = 52016.459999999999
2026-01-19T14:25:47.114513143Z node_cpu_seconds_total{cpu="2",mode="user"} = 20056.130000000001
2026-01-19T14:25:47.114513143Z node_cpu_seconds_total{cpu="2",mode="system"} = 6364.9700000000003
2026-01-19T14:25:47.114513143Z node_cpu_seconds_total{cpu="2",mode="nice"} = 0
2026-01-19T14:25:47.114513143Z node_cpu_seconds_total{cpu="2",mode="idle"} = 56597.839999999997
2026-01-19T14:25:47.114513143Z node_cpu_seconds_total{cpu="3",mode="user"} = 17696.98
2026-01-19T14:25:47.114513143Z node_cpu_seconds_total{cpu="3",mode="system"} = 5385.8999999999996
2026-01-19T14:25:47.114513143Z node_cpu_seconds_total{cpu="3",mode="nice"} = 0
2026-01-19T14:25:47.114513143Z node_cpu_seconds_total{cpu="3",mode="idle"} = 60055.519999999997
2026-01-19T14:25:47.114513143Z node_cpu_seconds_total{cpu="4",mode="user"} = 412.75
2026-01-19T14:25:47.114513143Z node_cpu_seconds_total{cpu="4",mode="system"} = 116.18000000000001
...

Note that the warning entry is an OSX-specific issue with the node_exporter_metrics input plugin. The plugin collects system metrics similar to the Prometheus Node Exporter, and on OSX it uses Apple's IOKit framework to access hardware information through the IORegistry (a hierarchical database of hardware devices).

Our console output for testing shows all the available metrics about this machine being collected every two seconds, matching our configured scrape_interval. This gives us something to work with and query once it's sent to a Prometheus backend.

Now verify that the metrics are being tagged with our custom labels by opening a browser window to http://localhost:2021/metrics, where you should see the following:

# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{app="fluent-bit",environment="development",cpu="0",mode="user"} 25095.049999999999
node_cpu_seconds_total{app="fluent-bit",environment="development",cpu="0",mode="system"} 9092.3299999999999
node_cpu_seconds_total{app="fluent-bit",environment="development",cpu="0",mode="nice"} 0
node_cpu_seconds_total{app="fluent-bit",environment="development",cpu="0",mode="idle"} 48874.290000000001
node_cpu_seconds_total{app="fluent-bit",environment="development",cpu="1",mode="user"} 23145.150000000001
node_cpu_seconds_total{app="fluent-bit",environment="development",cpu="1",mode="system"} 7784.9700000000003
node_cpu_seconds_total{app="fluent-bit",environment="development",cpu="1",mode="nice"} 0
node_cpu_seconds_total{app="fluent-bit",environment="development",cpu="1",mode="idle"} 52240.839999999997
node_cpu_seconds_total{app="fluent-bit",environment="development",cpu="2",mode="user"} 20091.23
node_cpu_seconds_total{app="fluent-bit",environment="development",cpu="2",mode="system"} 6379.5500000000002
node_cpu_seconds_total{app="fluent-bit",environment="development",cpu="2",mode="nice"} 0
node_cpu_seconds_total{app="fluent-bit",environment="development",cpu="2",mode="idle"} 56841.639999999999
node_cpu_seconds_total{app="fluent-bit",environment="development",cpu="3",mode="user"} 17723.43
node_cpu_seconds_total{app="fluent-bit",environment="development",cpu="3",mode="system"} 5396.7299999999996
node_cpu_seconds_total{app="fluent-bit",environment="development",cpu="3",mode="nice"} 0
node_cpu_seconds_total{app="fluent-bit",environment="development",cpu="3",mode="idle"} 60312.18
node_cpu_seconds_total{app="fluent-bit",environment="development",cpu="4",mode="user"} 412.89999999999998
node_cpu_seconds_total{app="fluent-bit",environment="development",cpu="4",mode="system"} 116.25
...

Notice how the metrics include our custom labels app="fluent-bit" and environment="development". These labels are automatically added to every metric by the Prometheus exporter output plugin, making it easy to identify and filter metrics in your Prometheus queries.
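Once these metrics are flowing into Prometheus (the scrape configuration is shown below), you can lean on those labels directly in your queries. As a simple example, a query along these lines shows the per-CPU rate of non-idle CPU time for just the metrics coming from this pipeline:

# Per-CPU rate of non-idle CPU time over the last five minutes,
# restricted to metrics carrying our custom labels.
rate(node_cpu_seconds_total{app="fluent-bit", environment="development", mode!="idle"}[5m])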

To integrate this with Prometheus, add a scrape configuration to your Prometheus configuration file prometheus.yml as follows:

scrape_configs:
  - job_name: 'fluent-bit-node-metrics'
    static_configs:
      - targets: ['localhost:2021']
    scrape_interval: 10s

This configuration tells Prometheus to scrape the Fluent Bit metrics endpoint every 10 seconds. The metrics will then be available for querying in Prometheus and can be visualized in the Prometheus console or using the Perses project for dashboards.

The Node Exporter Metrics plugin supports numerous collectors including CPU, disk I/O, filesystem, load average, memory, network interface, and more. You can selectively enable or disable collectors based on your monitoring needs, and set individual scrape intervals for each collector type.
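As a sketch of what that tuning could look like, the input section below limits collection to a handful of collectors and gives the CPU collector its own interval. The property names metrics and collector.cpu.scrape_interval are taken from the Node Exporter Metrics plugin options, but are worth double-checking against the documentation for your Fluent Bit version:

pipeline:
  inputs:
    - name: node_exporter_metrics
      tag: node_metrics
      scrape_interval: 5
      # Only enable the collectors we care about.
      metrics: cpu,meminfo,loadavg,netdev
      # Override the global interval for the CPU collector only.
      collector.cpu.scrape_interval: 2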

It's left to the reader to run their own Prometheus instance with this configuration and to explore the collected metrics telemetry data. If you need help, a primer can be found in this free online hands-on Prometheus workshop.
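If you just want a quick local setup to poke around with, a minimal sketch using podman and the official prom/prometheus image could look like the following. Note that when Prometheus runs in a container, the localhost:2021 target in the scrape configuration will likely need to point at your host instead (for example host.containers.internal with podman), so treat this as a starting point rather than a drop-in command:

# Run Prometheus with the scrape configuration above mounted in place.
$ podman run --rm -p 9090:9090 \
    -v $(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml \
    prom/prometheus

# Then browse to http://localhost:9090 to query the node_* metrics.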

More in the series

In this article you explored the first of three powerful patterns for integrating Fluent Bit with Prometheus: collecting and exposing host metrics. In the following articles we will continue on to look at monitoring Fluent Bit's internal pipeline health and using Fluent Bit as a metrics proxy with remote write capabilities. This article is based on this online free workshop.

There will be more in this series as you continue to learn how to configure, run, manage, and master the use of Fluent Bit in the wild. Next up, we'll explore monitoring Fluent Bit's internal pipeline health.
