Thursday, November 13, 2025

Mastering Fluent Bit: Top 3 Telemetry Pipeline Processors for Developers

This series is a general-purpose getting-started guide for those of us wanting to learn about the Cloud Native Computing Foundation (CNCF) project Fluent Bit.

Each article in this series addresses a single topic by providing insights into what the topic is, why we are interested in exploring that topic, where to get started with the topic, and how to get hands-on with learning about the topic as it relates to the Fluent Bit project.

The idea is that each article can stand on its own, but that they also lead down a path that slowly increases our abilities to implement solutions with Fluent Bit telemetry pipelines.

Let's take a look at the topic of this article: using Fluent Bit processors for developers. In case you missed the previous article, check out the top tips on using telemetry pipeline parsers for developers, which covers cleaning up your telemetry data for a better developer experience.

This article is a hands-on tour of the processors that help you as a developer when testing your Fluent Bit pipelines. We'll look at the top three processors you'll want to know about when building your telemetry pipeline configurations in Fluent Bit.

All examples in this article were run on macOS, and it's assumed the reader can adapt the commands shown here to their own local machine.

Where to get started

You should have explored the previous articles in this series to install and get started with Fluent Bit on your local developer machine, using either the source code or container images. Links at the end of this article will point you to a free hands-on workshop that lets you explore more of Fluent Bit in detail.

You can verify that you have a functioning installation by testing your Fluent Bit, either using a source installation or a container installation as shown below:

# For source installation.
$ fluent-bit -i dummy -o stdout

# For container installation.
$ podman run -ti ghcr.io/fluent/fluent-bit:4.0.8 -i dummy -o stdout

...
[0] dummy.0: [[1753105021.031338000, {}], {"message"=>"dummy"}]
[0] dummy.0: [[1753105022.033205000, {}], {"message"=>"dummy"}]
[0] dummy.0: [[1753105023.032600000, {}], {"message"=>"dummy"}]
[0] dummy.0: [[1753105024.033517000, {}], {"message"=>"dummy"}]
...

Let's look at the top three processors that will help you with your local development testing of Fluent Bit pipelines.

Processing in a telemetry pipeline

See this article for details about the service section of the configurations used throughout this article; here, we focus on the Fluent Bit pipeline and specifically the processors that can be of great help in managing our telemetry data during testing in our inner developer loop.

Processors in Fluent Bit are powerful components that sit between the input and output phases of your telemetry pipeline. They let you manipulate, transform, and enrich your telemetry data before it reaches its destination. Unlike filters, which are selected by tag matching and run in the filter phase, processors are attached directly to an input or output and operate on specific telemetry signals (logs, metrics, or traces), giving you fine-grained control over how your data flows through the pipeline.

The processor phase happens after data is ingested but before it's formatted for output. This makes processors ideal for operations that need to happen at scale across your entire data stream, such as content modification, metrics extraction, and data aggregation.
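
To make the distinction concrete, here is a minimal sketch showing where each is declared: a filter is selected by its match rule in the filters section, while a processor is attached directly to the input (or output) it should act on. The field names and patterns here are purely illustrative:

pipeline:
  inputs:
    - name: dummy
      tag: app.logs
      dummy: '{"message":"Application started"}'
      processors:
        logs:
          # Processor: attached to this input, runs as events are ingested.
          - name: content_modifier
            action: insert
            key: stage
            value: dev

  filters:
    # Filter: selected by its match rule, runs in the filter phase.
    - name: grep
      match: app.logs
      regex: message started

  outputs:
    - name: stdout
      match: '*'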

Keeping all of this in mind, let's look at the most interesting processors that developers will want to know more about.

1. Content Modifier processor

One of the most common use cases for telemetry pipelines that developers will encounter is the need to add, modify, or remove fields from your telemetry data. The Content Modifier processor gives you the ability to manipulate the structure and content of your events as they flow through the pipeline.

To provide an example, we start with a simple Fluent Bit configuration file, fluent-bit.yaml, that uses the dummy plugin to generate events we'll then modify:

service:
  flush: 1
  log_level: info
  http_server: on
  http_listen: 0.0.0.0
  http_port: 2020
  hot_reload: on

pipeline:
  inputs:
    - name: dummy
      tag: app.logs
      dummy: '{"environment":"dev","message":"Application started"}'
      processors:
        logs:
          - name: content_modifier
            action: insert
            key: pipeline_version
            value: "1.0.0"

          - name: content_modifier
            action: insert
            key: processed_timestamp
            value: "${HOSTNAME}"

          - name: content_modifier
            action: rename
            key: environment
            renamed_key: env

  outputs:
    - name: stdout
      match: '*'
      format: json_lines
      json_date_format: java_sql_timestamp

Our configuration attaches the content_modifier processor to the dummy input three times to demonstrate different actions. First, we insert a new field called pipeline_version with a static value. Second, we insert a processed_timestamp field whose value comes from an environment variable (HOSTNAME in this case). Third, we rename the environment field to env for consistency.

Let's run this to confirm our working test environment:

# For source installation.
$ fluent-bit --config fluent-bit.yaml

# For container installation after building new image with your 
# configuration using a Buildfile as follows:
#
# FROM ghcr.io/fluent/fluent-bit:4.1.0
# COPY ./fluent-bit.yaml /fluent-bit/etc/fluent-bit.yaml
# CMD [ "fluent-bit", "-c", "/fluent-bit/etc/fluent-bit.yaml" ]
#
$ podman build -t fb -f Buildfile

$ podman run --rm fb 

...
{"date":"2025-10-26 20:45:12.123456","env":"dev","message":"Application started","pipeline_version":"1.0.0","processed_timestamp":"localhost"}
{"date":"2025-10-26 20:45:13.234567","env":"dev","message":"Application started","pipeline_version":"1.0.0","processed_timestamp":"localhost"}
...

Note how each event now contains the additional fields we configured, and the original environment field has been renamed to env. This processor is invaluable for standardizing your telemetry data before it reaches your backend systems.
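
Beyond insert and rename, the content_modifier processor supports additional actions such as delete and hash, which are handy for stripping or masking sensitive fields before they leave your pipeline. A minimal sketch, reusing the dummy input pattern from above (the user_id field is purely illustrative):

pipeline:
  inputs:
    - name: dummy
      tag: app.logs
      dummy: '{"environment":"dev","user_id":"12345","message":"Application started"}'
      processors:
        logs:
          # Drop the environment field entirely.
          - name: content_modifier
            action: delete
            key: environment

          # Replace the user_id value with a hash of the original value.
          - name: content_modifier
            action: hash
            key: user_id

  outputs:
    - name: stdout
      match: '*'
      format: json_lines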

2. Metrics Selector processor

Another critical use case for developers working with telemetry data is the ability to extract and select specific metrics from your event streams. The Metrics Selector processor allows you to filter and route metrics based on their labels and values, giving you precise control over which metrics flow to which destinations.

To demonstrate this we'll create a configuration that generates different types of metrics and uses the metrics selector to route them appropriately:

service:
  flush: 1
  log_level: info
  http_server: on
  http_listen: 0.0.0.0
  http_port: 2020
  hot_reload: on

pipeline:
  inputs:
    - name: dummy
      tag: metrics.cpu
      dummy: '{"metric":"cpu_usage","value":75.5,"host":"server01","env":"production"}'
      processors:
        logs:
          - name: metrics_selector
            metric_name: cpu_usage
            action: include
            label: env
            operation_type: prefix_match
            match: prod

    - name: dummy
      tag: metrics.memory
      dummy: '{"metric":"memory_usage","value":82.3,"host":"server01","env":"production"}'

    - name: dummy
      tag: metrics.disk
      dummy: '{"metric":"disk_usage","value":45.2,"host":"server02","env":"staging"}'

  outputs:
    - name: stdout
      match: 'metrics.cpu'
      format: json_lines
      json_date_format: java_sql_timestamp

    - name: stdout
      match: 'metrics.*'
      format: json_lines
      json_date_format: java_sql_timestamp

Our configuration generates three different metric types and attaches the metrics_selector processor to the CPU input to filter for metrics matching production environments. This allows you to create sophisticated routing rules based on your metric characteristics.

Let's run this configuration:

# For source installation.
$ fluent-bit --config fluent-bit.yaml

# For container installation after building new image with your 
# configuration using a Buildfile as follows:
#
# FROM ghcr.io/fluent/fluent-bit:4.1.0
# COPY ./fluent-bit.yaml /fluent-bit/etc/fluent-bit.yaml
# CMD [ "fluent-bit", "-c", "/fluent-bit/etc/fluent-bit.yaml" ]
#
$ podman build -t fb -f Buildfile

$ podman run --rm fb 

...
{"date":"2025-10-26 21:10:33.456789","metric":"cpu_usage","value":75.5,"host":"server01","env":"production"}
{"date":"2025-10-26 21:10:33.567890","metric":"memory_usage","value":82.3,"host":"server01","env":"production"}
{"date":"2025-10-26 21:10:33.678901","metric":"disk_usage","value":45.2,"host":"server02","env":"staging"}
...

The metrics selector processor helps you focus on the metrics that matter most during development and testing, reducing noise and improving the signal-to-noise ratio in your telemetry data.
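
Note that the metrics_selector processor is designed to operate on metrics signals, so in day-to-day use you will most often see it attached to a metrics-generating input under the metrics signal type. A minimal sketch, assuming the fluentbit_metrics input plugin (which exposes Fluent Bit's own internal metrics) is available in your build:

service:
  flush: 1
  log_level: info

pipeline:
  inputs:
    - name: fluentbit_metrics
      tag: internal_metrics
      scrape_interval: 2
      processors:
        metrics:
          # Keep only metrics whose names start with fluentbit_input.
          - name: metrics_selector
            metric_name: fluentbit_input
            action: include
            operation_type: prefix

  outputs:
    - name: stdout
      match: 'internal_metrics'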

3. OpenTelemetry Envelope processor

The third essential processor that developers need to understand is the OpenTelemetry Envelope processor. This processor transforms your Fluent Bit telemetry data into the OpenTelemetry protocol format, enabling seamless integration with the broader OpenTelemetry ecosystem. As organizations increasingly adopt OpenTelemetry as their standard for observability data, this processor becomes critical for ensuring your Fluent Bit pipelines can communicate effectively with OpenTelemetry collectors and backends.

The OpenTelemetry Envelope processor wraps your telemetry data in the standard OpenTelemetry format, preserving all the semantic conventions and structures that make OpenTelemetry powerful. This includes proper handling of resource attributes, instrumentation scope, and the telemetry signal types that are core to OpenTelemetry.

For comprehensive coverage of integrating Fluent Bit with OpenTelemetry, I highly recommend exploring these detailed articles:

Telemetry Pipelines: Integrating Fluent Bit with OpenTelemetry, Part 1 - This article covers the fundamentals of integrating Fluent Bit with OpenTelemetry, including configuration patterns and best practices for getting started.

Integrating Fluent Bit with OpenTelemetry, Part 2 - This follow-up article dives deeper into advanced integration scenarios, troubleshooting tips, and real-world use cases for production deployments.

To demonstrate how the OpenTelemetry Envelope processor works, let's create a configuration that wraps application logs in OpenTelemetry format:

service:
  flush: 1
  log_level: info
  http_server: on
  http_listen: 0.0.0.0
  http_port: 2020
  hot_reload: on

pipeline:
  inputs:
    - name: dummy
      tag: app.logs
      dummy: '{"level":"info","service":"user-api","message":"User login successful","user_id":"12345"}'

    - name: dummy
      tag: app.logs
      dummy: '{"level":"error","service":"payment-api","message":"Payment processing failed","transaction_id":"tx-9876"}'

  outputs:
    - name: stdout
      match: '*'
      format: json_lines
      json_date_format: java_sql_timestamp
      processors:
        logs:
          - name: opentelemetry_envelope
            resource:
              service_name: my-application
              service_version: 1.2.3
              deployment_environment: production
            instrumentation_scope:
              name: fluent-bit
              version: 4.1.0

Our configuration attaches the opentelemetry_envelope processor to the stdout output so that every log entry matched by that output is wrapped with OpenTelemetry metadata. The resource section adds attributes that describe the source of the telemetry data, such as the service name and deployment environment. The instrumentation_scope section identifies the tool that collected the data, which is essential for proper attribution in OpenTelemetry systems.

Let's run this configuration to see the OpenTelemetry envelope in action:

# For source installation.
$ fluent-bit --config fluent-bit.yaml

# For container installation after building new image with your 
# configuration using a Buildfile as follows:
#
# FROM ghcr.io/fluent/fluent-bit:4.1.0
# COPY ./fluent-bit.yaml /fluent-bit/etc/fluent-bit.yaml
# CMD [ "fluent-bit", "-c", "/fluent-bit/etc/fluent-bit.yaml" ]
#
$ podman build -t fb -f Buildfile

$ podman run --rm fb 

...
{"date":"2025-10-26 22:15:30.123456","resource":{"service_name":"my-application","service_version":"1.2.3","deployment_environment":"production"},"instrumentation_scope":{"name":"fluent-bit","version":"4.1.0"},"level":"info","service":"user-api","message":"User login successful","user_id":"12345"}
{"date":"2025-10-26 22:15:31.234567","resource":{"service_name":"my-application","service_version":"1.2.3","deployment_environment":"production"},"instrumentation_scope":{"name":"fluent-bit","version":"4.1.0"},"level":"error","service":"payment-api","message":"Payment processing failed","transaction_id":"tx-9876"}
...

Notice how each log entry now includes the OpenTelemetry resource attributes and instrumentation scope information. This standardized format ensures that when your telemetry data reaches an OpenTelemetry collector or backend, it will be properly categorized and can be correlated with other telemetry signals like traces and metrics from your distributed system.
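
In practice, the envelope processor is typically paired with the opentelemetry output plugin so the wrapped data can be shipped straight to an OpenTelemetry Collector. As a minimal sketch, assuming a collector with an OTLP/HTTP endpoint listening on localhost:4318, you could swap the stdout output in the configuration above for something like:

  outputs:
    - name: opentelemetry
      match: '*'
      host: localhost
      port: 4318
      logs_uri: /v1/logs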

This covers the top three processors for developers getting started with Fluent Bit, helping you transform and enrich your telemetry data quickly and speed up your inner development loop.

More in the series

In this article you learned about three powerful Fluent Bit processors that improve the inner developer loop experience. This article is based on this free online workshop.

There will be more in this series as you continue to learn how to configure, run, manage, and master the use of Fluent Bit in the wild. Next up, exploring some of the more interesting Fluent Bit multiline parsers for developers.
