Wednesday, November 26, 2025

Mastering Fluent Bit: Top 3 Telemetry Pipeline Filters for Developers

Mastering Fluent Bit Blog Series

This series is a general-purpose getting-started guide for those of us wanting to learn about Fluent Bit, a Cloud Native Computing Foundation (CNCF) project.

Each article in this series addresses a single topic by providing insights into what the topic is, why we are interested in exploring that topic, where to get started with the topic, and how to get hands-on with learning about the topic as it relates to the Fluent Bit project.

The idea is that each article can stand on its own, but that they also lead down a path that slowly increases our abilities to implement solutions with Fluent Bit telemetry pipelines.

Let's take a look at the topic of this article, using Fluent Bit filters for developers. In case you missed the previous article, check out the three tips for using telemetry pipeline multiline parsers, where you explored how to handle complex multiline log messages.

This article will be a hands-on exploration of filters that help you as a developer testing out your Fluent Bit pipelines. We'll take a look at the top three filters you'll want to know about when building your telemetry pipeline configurations in Fluent Bit.

All examples in this article were done on OSX and assume the reader is able to adapt the actions shown here to their own local machine.

Where to get started

You should have explored the previous articles in this series to install and get started with Fluent Bit on your local developer machine, either using the source code or container images. Links at the end of this article will point you to a free hands-on workshop that lets you explore more of Fluent Bit in detail.

You can verify that you have a functioning installation by testing Fluent Bit, either from a source installation or a container installation, as shown below:

# For source installation.
$ fluent-bit -i dummy -o stdout

# For container installation.
$ podman run -ti ghcr.io/fluent/fluent-bit:4.0.8 -i dummy -o stdout

...
[0] dummy.0: [[1753105021.031338000, {}], {"message"=>"dummy"}]
[0] dummy.0: [[1753105022.033205000, {}], {"message"=>"dummy"}]
[0] dummy.0: [[1753105023.032600000, {}], {"message"=>"dummy"}]
[0] dummy.0: [[1753105024.033517000, {}], {"message"=>"dummy"}]
...

Let's look at the top three filters that will help you with your local development testing of Fluent Bit pipelines.

Filtering in a telemetry pipeline

See this article for details about the service section of the configurations used in the rest of this article. Here we focus on our Fluent Bit pipeline, and specifically on the filters that can be of great help in managing our telemetry data during testing in our inner developer loop.

The figure below shows the phases of a telemetry pipeline. The third phase is the filter phase, where we can modify, enrich, or drop records based on specific criteria.

Filters in Fluent Bit are powerful tools that operate on records after they've been parsed but before they reach their destination. Unlike processors that work on raw data streams, filters work on structured records, giving you the ability to manipulate individual fields, add metadata, remove sensitive information, or exclude records entirely based on conditions.

In production environments you need full control of the data you're collecting. Filtering lets you alter the collected data before delivering it to a destination. Each available filter can be used to match, exclude, or enrich your logs with specific metadata. Fluent Bit supports many filters, and understanding the most useful ones will dramatically improve your development experience.

Now let's look at the most interesting filters that developers will want to know more about.

1. Modify filter

One of the most versatile filters for telemetry pipelines that developers will encounter is the Modify filter. The Modify filter allows you to change records using rules and conditions, giving you the power to add new fields, rename existing ones, remove unwanted data, and conditionally manipulate your telemetry based on specific criteria.

To provide an example we start by creating a test configuration file called fluent-bit.yaml that demonstrates the Modify filter's capabilities:

service:
  flush: 1
  log_level: info
  http_server: on
  http_listen: 0.0.0.0
  http_port: 2020
  hot_reload: on

pipeline:
  inputs:
    - name: dummy
      tag: app.logs
      dummy: '{"environment":"dev","level":"info","message":"Application started","memory_mb":512}'

  filters:
    - name: modify
      match: '*'
      add:
        - service_name my-application
        - version 1.2.3
        - processed true
      rename:
        - environment env
        - memory_mb mem_usage
      remove:
        - level

  outputs:
    - name: stdout
      match: '*'
      format: json_lines

Our configuration uses the modify filter with several different operations. The add operation inserts new fields into the record. This is extremely useful for adding metadata that your observability backend expects, such as service names, versions, or deployment information. The rename operation changes field names to match your preferred naming conventions or to comply with backend requirements. The remove operation strips out fields you don't want to send to your destination, which can reduce storage costs and improve query performance.

Let's run this configuration to see the Modify filter in action:

# For source installation.
$ fluent-bit --config fluent-bit.yaml

# For container installation after building new image with your 
# configuration using a Buildfile as follows:
#
# FROM ghcr.io/fluent/fluent-bit:4.1.0
# COPY ./fluent-bit.yaml /fluent-bit/etc/fluent-bit.yaml
# CMD [ "fluent-bit", "-c", "/fluent-bit/etc/fluent-bit.yaml" ]
#
$ podman build -t fb -f Buildfile

$ podman run --rm fb 

...
{"date":"2025-12-05 14:23:45.678901","env":"dev","message":"Application started","mem_usage":512,"service_name":"my-application","version":"1.2.3","processed":"true"}
{"date":"2025-12-05 14:23:46.789012","env":"dev","message":"Application started","mem_usage":512,"service_name":"my-application","version":"1.2.3","processed":"true"}
...

Notice how the output has been transformed? 

The original environment field is now env, memory_mb is now mem_usage, the level field has been removed entirely, and we've added three new fields: service_name, version, and processed. This kind of transformation is essential when you're working with multiple services that produce logs in different formats but need to be standardized before sending to your observability backend.

The Modify filter also supports conditional operations using the Condition parameter. This allows you to apply modifications only when specific criteria are met. Let's extend our example to demonstrate conditional modifications:

service:
  flush: 1
  log_level: info
  http_server: on
  http_listen: 0.0.0.0
  http_port: 2020
  hot_reload: on

pipeline:
  inputs:
    - name: dummy
      tag: app.logs
      dummy: '{"environment":"production","level":"error","message":"Database connection failed","response_time":5000}'

    - name: dummy
      tag: app.logs
      dummy: '{"environment":"dev","level":"info","message":"Request processed","response_time":150}'

  filters:
    - name: modify
      match: '*'
      condition:
        - key_value_equals environment production
      add:
        - priority high
        - alert true

    - name: modify
      match: '*'
      condition:
        - key_value_equals level error
      add:
        - severity critical

  outputs:
    - name: stdout
      match: '*'
      format: json_lines

Let's run this configuration:

# For source installation.
$ fluent-bit --config fluent-bit.yaml

# For container installation after building new image with your 
# configuration using a Buildfile as follows:
#
# FROM ghcr.io/fluent/fluent-bit:4.1.0
# COPY ./fluent-bit.yaml /fluent-bit/etc/fluent-bit.yaml
# CMD [ "fluent-bit", "-c", "/fluent-bit/etc/fluent-bit.yaml" ]
#
$ podman build -t fb -f Buildfile

$ podman run --rm fb 

...
{"date":"2025-12-05 14:30:12.345678","environment":"production","level":"error","message":"Database connection failed","response_time":5000,"priority":"high","alert":"true","severity":"critical"}
{"date":"2025-12-05 14:30:13.456789","environment":"dev","level":"info","message":"Request processed","response_time":150}
...

The first record matches both conditions (production environment AND error level), so it gets the priority, alert, and severity fields added. The second record doesn't match any conditions, so it passes through unchanged.

This conditional logic is incredibly powerful for implementing routing rules, prioritizing certain types of logs, or adding context based on the content of your telemetry data.
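
The condition list isn't limited to value-equality checks; the Modify filter also supports conditions such as key_exists. Below is a minimal sketch, assuming a hypothetical trace_id field in your records, of a filter fragment (dropped into the pipeline section of the earlier configuration) that tags only records already carrying a trace identifier:

  filters:
    - name: modify
      match: '*'
      condition:
        - key_exists trace_id
      add:
        - traced true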

2. Grep filter

Another essential filter that developers need in their telemetry toolkit is the Grep filter. The Grep filter allows you to match or exclude specific records based on regular expression patterns, giving you fine-grained control over which events flow through your pipeline. This is particularly useful during development when you want to focus on specific types of logs or exclude noisy events that aren't relevant to your current debugging session.

To demonstrate the power of the Grep filter, let's create a configuration that filters application logs:

service:
  flush: 1
  log_level: info
  http_server: on
  http_listen: 0.0.0.0
  http_port: 2020
  hot_reload: on

pipeline:
  inputs:
    - name: dummy
      tag: app.logs
      dummy: '{"level":"DEBUG","message":"Processing request 12345","service":"api"}'

    - name: dummy
      tag: app.logs
      dummy: '{"level":"ERROR","message":"Failed to connect to database","service":"api"}'

    - name: dummy
      tag: app.logs
      dummy: '{"level":"INFO","message":"Request completed successfully","service":"api"}'

    - name: dummy
      tag: app.logs
      dummy: '{"level":"WARN","message":"High memory usage detected","service":"api"}'

  filters:
    - name: grep
      match: '*'
      regex:
        - level ERROR|WARN

  outputs:
    - name: stdout
      match: '*'
      format: json_lines

Our configuration uses the grep filter with a regex parameter to keep only records where the level field matches either ERROR or WARN. This kind of filtering is invaluable when you're troubleshooting production issues and need to focus on problematic events while ignoring routine informational logs.

Let's run this configuration:

# For source installation.
$ fluent-bit --config fluent-bit.yaml

# For container installation after building new image with your 
# configuration using a Buildfile as follows:
#
# FROM ghcr.io/fluent/fluent-bit:4.1.0
# COPY ./fluent-bit.yaml /fluent-bit/etc/fluent-bit.yaml
# CMD [ "fluent-bit", "-c", "/fluent-bit/etc/fluent-bit.yaml" ]
#
$ podman build -t fb -f Buildfile

$ podman run --rm fb 

...
{"date":"2025-12-05 15:10:23.456789","level":"ERROR","message":"Failed to connect to database","service":"api"}
{"date":"2025-12-05 15:10:24.567890","level":"WARN","message":"High memory usage detected","service":"api"}
...

Notice that only the ERROR and WARN level logs appear in the output. The DEBUG and INFO logs have been filtered out completely. This dramatically reduces the volume of logs you need to process during development and testing.

The Grep filter also supports excluding records using the exclude parameter. Let's modify our configuration to demonstrate this:

service:
  flush: 1
  log_level: info
  http_server: on
  http_listen: 0.0.0.0
  http_port: 2020
  hot_reload: on

pipeline:
  inputs:
    - name: dummy
      tag: app.logs
      dummy: '{"message":"User login successful","user":"alice@example.com"}'

    - name: dummy
      tag: app.logs
      dummy: '{"message":"Health check passed","endpoint":"/health"}'

    - name: dummy
      tag: app.logs
      dummy: '{"message":"Database query executed","query":"SELECT * FROM users"}'

    - name: dummy
      tag: app.logs
      dummy: '{"message":"Metrics endpoint called","endpoint":"/metrics"}'

  filters:
    - name: grep
      match: '*'
      exclude:
        - endpoint /health|/metrics

  outputs:
    - name: stdout
      match: '*'
      format: json_lines

Let's run this updated configuration:

# For source installation.
$ fluent-bit --config fluent-bit.yaml

# For container installation after building new image with your 
# configuration using a Buildfile as follows:
#
# FROM ghcr.io/fluent/fluent-bit:4.1.0
# COPY ./fluent-bit.yaml /fluent-bit/etc/fluent-bit.yaml
# CMD [ "fluent-bit", "-c", "/fluent-bit/etc/fluent-bit.yaml" ]
#
$ podman build -t fb -f Buildfile

$ podman run --rm fb 

...
{"date":"2025-12-05 15:20:34.567890","message":"User login successful","user":"alice@example.com"}
{"date":"2025-12-05 15:20:35.678901","message":"Database query executed","query":"SELECT * FROM users"}
...

The health check and metrics endpoint logs have been excluded from the output. This is extremely useful for filtering out routine monitoring traffic that generates high volumes of logs but provides little value during debugging. By combining regex to include specific patterns and exclude to filter out unwanted patterns, you can create sophisticated filtering rules that give you exactly the logs you need.
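
If your records carry both a level field and an endpoint field, one way to combine the two approaches is to chain two grep filter instances. The sketch below reuses the field names from the examples above: the first filter keeps only ERROR and WARN records, and the second then drops anything hitting the health or metrics endpoints:

  filters:
    - name: grep
      match: '*'
      regex:
        - level ERROR|WARN

    - name: grep
      match: '*'
      exclude:
        - endpoint /health|/metrics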

An important note about the Grep filter is that it supports matching nested fields using the record accessor format. For example, if you have JSON logs with nested structures like {"kubernetes":{"pod_name":"my-app-123"}}, you can use $kubernetes['pod_name'] as the key to match against nested values.
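
As a minimal sketch, assuming your records already carry such a nested kubernetes object with a pod_name key, a grep rule keeping only pods whose names start with my-app could look like this:

  filters:
    - name: grep
      match: '*'
      regex:
        - "$kubernetes['pod_name'] ^my-app"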

3. Record Modifier filter

The third essential filter for developers is the Record Modifier filter. While the Modify filter focuses on adding, renaming, and removing fields using static values, the Record Modifier filter excels at appending fields with dynamic values such as environment variables and removing or allowing specific keys using pattern matching. This makes it ideal for injecting runtime context into your logs.

Let's create a configuration that demonstrates the Record Modifier filter:

service:
  flush: 1
  log_level: info
  http_server: on
  http_listen: 0.0.0.0
  http_port: 2020
  hot_reload: on

pipeline:
  inputs:
    - name: dummy
      tag: app.logs
      dummy: '{"message":"Application event","request_id":"req-12345","response_time":250,"internal_debug":"sensitive data","trace_id":"trace-abc"}'

  filters:
    - name: record_modifier
      match: '*'
      record:
        - hostname ${HOSTNAME}
        - pod_name ${POD_NAME}
        - namespace ${NAMESPACE}
      remove_key:
        - internal_debug

  outputs:
    - name: stdout
      match: '*'
      format: json_lines

Our configuration uses the record_modifier filter with several powerful features. The record parameter adds new fields with values from environment variables. This is incredibly useful in containerized environments where hostname, pod names, and namespace information are available as environment variables but need to be injected into your logs for proper correlation and filtering in your observability backend. The remove_key parameter strips out sensitive fields that shouldn't be sent to your logging destination.

Let's run this configuration with some environment variables set:

# For source installation.
$ fluent-bit --config fluent-bit.yaml

# For container installation after building new image with your 
# configuration using a Buildfile as follows:
#
# FROM ghcr.io/fluent/fluent-bit:4.1.0
# COPY ./fluent-bit.yaml /fluent-bit/etc/fluent-bit.yaml
# CMD [ "fluent-bit", "-c", "/fluent-bit/etc/fluent-bit.yaml" ]
#
$ podman build -t fb -f Buildfile

$ podman run --rm \
  -e HOSTNAME=dev-server-01 \
  -e POD_NAME=my-app-pod-abc123 \
  -e NAMESPACE=production \
  fb

...
{"date":"2025-12-05 16:15:45.678901","message":"Application event","request_id":"req-12345","response_time":250,"trace_id":"trace-abc","hostname":"dev-server-01","pod_name":"my-app-pod-abc123","namespace":"production"}
...

Notice how the environment variables have been injected into the log record, and the internal_debug field has been removed. This pattern is essential for enriching your logs with contextual information that helps you understand where the logs originated in your distributed system.

The Record Modifier filter also supports the allowlist_key parameter (and its legacy alias whitelist_key), which works inversely to remove_key. Instead of specifying which fields to remove, you specify which fields to keep, and all others are removed:

service:
  flush: 1
  log_level: info
  http_server: on
  http_listen: 0.0.0.0
  http_port: 2020
  hot_reload: on

pipeline:
  inputs:
    - name: dummy
      tag: app.logs
      dummy: '{"message":"User action","user_id":"12345","email":"user@example.com","password_hash":"abc123","session_token":"xyz789","action":"login","timestamp":"2025-12-05T16:20:00Z"}'

  filters:
    - name: record_modifier
      match: '*'
      allowlist_key:
        - message
        - user_id
        - action
        - timestamp

  outputs:
    - name: stdout
      match: '*'
      format: json_lines

Let's run this configuration:

# For source installation.
$ fluent-bit --config fluent-bit.yaml

# For container installation after building new image with your 
# configuration using a Buildfile as follows:
#
# FROM ghcr.io/fluent/fluent-bit:4.1.0
# COPY ./fluent-bit.yaml /fluent-bit/etc/fluent-bit.yaml
# CMD [ "fluent-bit", "-c", "/fluent-bit/etc/fluent-bit.yaml" ]
#
$ podman build -t fb -f Buildfile

$ podman run --rm fb 

...
{"date":"2025-12-05 16:20:01.234567","message":"User action","user_id":"12345","action":"login","timestamp":"2025-12-05T16:20:00Z"}
...

The sensitive fields (email, password_hash, session_token) have been completely stripped out, leaving only the allowlisted fields. This approach is particularly useful when you're dealing with logs that might contain sensitive information and you want to take a cautious approach by explicitly defining what's safe to send to your logging backend.

Another powerful feature of the Record Modifier filter is the ability to generate UUIDs for each record. This is invaluable for tracking and correlating individual log entries across your distributed system:

service:
  flush: 1
  log_level: info
  http_server: on
  http_listen: 0.0.0.0
  http_port: 2020
  hot_reload: on

pipeline:
  inputs:
    - name: dummy
      tag: app.logs
      dummy: '{"message":"Processing request","service":"api"}'

  filters:
    - name: record_modifier
      match: '*'
      uuid_key: event_id

  outputs:
    - name: stdout
      match: '*'
      format: json_lines

When you run this configuration, each record will have a unique event_id field added automatically, making it easy to reference specific log entries in your observability tools.
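
Running it follows the same pattern as the earlier examples. The output below is only illustrative, as the generated UUID values will differ on every run:

# For source installation.
$ fluent-bit --config fluent-bit.yaml

...
{"date":"2025-12-05 16:25:12.345678","message":"Processing request","service":"api","event_id":"b7f3c2a0-4d2e-4f4b-9a1a-0c8e5d6f7a89"}
...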

This covers the top three filters for developers getting started with Fluent Bit who want to transform and filter their telemetry data effectively and speed up their inner development loop.

More in the series

In this article you learned about three powerful Fluent Bit filters that improve the inner developer loop experience. This article is based on this free online workshop.

There will be more in this series as you continue to learn how to configure, run, manage, and master the use of Fluent Bit in the wild. Next up, we'll explore Fluent Bit routing, as there are new ways for developers to leverage this feature.
