This series is a general purpose getting started guide for those of us wanting to learn about the Cloud Native Computing Foundation (CNCF) project Fluent Bit.
Each article in this series addresses a single topic by providing insights into what the topic is, why we are interested in exploring that topic, where to get started with the topic, and how to get hands-on with learning about the topic as it relates to the Fluent Bit project.
The idea is that each article can stand on its own, but they also lead down a path that slowly increases our ability to implement solutions with Fluent Bit telemetry pipelines.
Let's take a look at the topic of this article: Fluent Bit tips for developers. In case you missed the previous article, check out the top 3 telemetry pipeline output plugins for developers, where you'll find tips on getting the best out of Fluent Bit for your developer experience.
This article will be a hands-on tour of the things that help you as a developer when testing out your Fluent Bit pipelines. We'll take a look at the top tip for using a parser in your telemetry pipeline configuration in Fluent Bit.
All examples in this article were done on macOS and assume the reader can translate the actions shown here to their own local machine.
Where to get started
You should have explored the previous articles in this series to install and get started with Fluent Bit on your local developer machine, either using the source code or container images. Links at the end of this article will point you to a free hands-on workshop that lets you explore more of Fluent Bit in detail.
You can verify that you have a functioning installation by testing Fluent Bit, either from a source installation or a container installation, as shown below:
# For source installation.
$ fluent-bit -i dummy -o stdout

# For container installation.
$ podman run -ti ghcr.io/fluent/fluent-bit:4.0.8 -i dummy -o stdout
...
[0] dummy.0: [[1753105021.031338000, {}], {"message"=>"dummy"}]
[0] dummy.0: [[1753105022.033205000, {}], {"message"=>"dummy"}]
[0] dummy.0: [[1753105023.032600000, {}], {"message"=>"dummy"}]
[0] dummy.0: [[1753105024.033517000, {}], {"message"=>"dummy"}]
...
Let's look at a few tips and tricks to help you with your local development testing of Fluent Bit parsers.
Parsing in a telemetry pipeline
See this article for details about the service section of the configurations used in the rest of this article. For now, we'll focus on our Fluent Bit pipeline, and specifically on the parsers that can be a great help in managing our telemetry data during testing in our inner developer loop.
A Fluent Bit telemetry pipeline moves events through a series of phases: input, parser, filter, buffer, router, and output. The second phase is the parser phase, where unstructured input data is turned into structured data. Fluent Bit does this using parsers that we can configure to manipulate the unstructured data, producing structured data for the next phases of our pipeline.
An example of this can be found in the online workshop, where we see an example of unstructured log data:
192.168.2.20 - - [28/Jul/2006:10:27:10 -0300] "GET /cgi-bin/try/ HTTP/1.0" 200 3395
When unstructured log data is parsed by Fluent Bit, the results become structured data such as the following:
{ "host": "192.168.2.20",
"user": "-",
"method": "GET",
"path": "/cgi-bin/try/",
"code": "200",
"size": "3395"
}The Fluent Bit parser engine is configurable and can process log entries based in two formats:
- JSON maps
- Regular expressions
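To make the JSON-map option concrete, here is a minimal sketch of a parser definition in YAML. The parser name simple_json and the time fields are illustrative assumptions for a log whose JSON payload carries its own timestamp; they are not from this article:
# Illustrative JSON-map parser: each record is decoded as a JSON map
# directly, so no regular expression is needed. The 'time_key' and
# 'time_format' values are assumptions for this sketch.
parsers:
  - name: simple_json
    format: json
    time_key: time
    time_format: '%d/%b/%Y:%H:%M:%S %z'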
By default, Fluent Bit provides a set of pre-configured parsers that can be used for different use cases, such as logs in these formats (a usage sketch follows this list):
- Apache
- NGINX
- Docker
- Syslog rfc5424
- Syslog rfc3164
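As a quick illustration, a pre-configured parser can be attached directly to an input. This is a minimal sketch, not from the workshop, and the log path is an assumption:
# Illustrative only: attach the pre-configured 'nginx' parser to a
# tail input reading an access log (the path is an assumption).
pipeline:
  inputs:
    - name: tail
      path: /var/log/nginx/access.log
      parser: nginx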
Parsers tend to be defined in their own configuration files that are referenced from, and loaded at start time by, the main Fluent Bit configuration file. We can also load parsers from the command line, but we won't be covering that here. Keeping all of this in mind, let's look at the most interesting parser that developers will want to know more about.
1. Regular expression parser
One of the more common use cases developers encounter in telemetry pipelines is multiple event streams producing data whose keys are not unique unless some cleanup happens during parsing. To illustrate, we'll show how Fluent Bit can easily parse and filter events from multiple input sources, cleaning up duplicate keys before sending them onwards to a destination.
To provide an example we start with a simple Fluent Bit configuration file fluent-bit.yaml containing a configuration using the dummy plugin to generate two types of events, both using the same key to cause confusion if we try querying without cleaning them up first:
service:
  flush: 1
  log_level: info
  http_server: on
  http_listen: 0.0.0.0
  http_port: 2020
  hot_reload: on

pipeline:
  inputs:
    # This entry generates a successful message.
    - name: dummy
      tag: event.success
      dummy: '{"message":"true 200 success"}'

    # This entry generates a failure message.
    - name: dummy
      tag: event.error
      dummy: '{"message":"false 500 error"}'

  outputs:
    - name: stdout
      match: '*'
      format: json_lines
      json_date_format: java_sql_timestamp
Our configuration is tagging each successful event with event.success and failure events with event.error. The confusion will be caused by configuring the dummy message with the same key, message, for both event definitions. This will cause our incoming events to be confusing to deal with.
Let's run this to confirm our working test environment:
# For source installation.
$ fluent-bit --config fluent-bit.yaml

# For container installation after building a new image with your
# configuration using a Buildfile as follows:
#
# FROM ghcr.io/fluent/fluent-bit:4.1.0
# COPY ./fluent-bit.yaml /fluent-bit/etc/fluent-bit.yaml
# CMD [ "fluent-bit", "-c", "/fluent-bit/etc/fluent-bit.yaml" ]
#
$ podman build -t fb -f Buildfile
$ podman run --rm fb
...
{"date":"2025-10-26 19:59:34.508732","message":"true 200 success"}
{"date":"2025-10-26 19:59:34.508837","message":"false 500 error"}
{"date":"2025-10-26 19:59:35.509396","message":"true 200 success"}
{"date":"2025-10-26 19:59:35.509456","message":"false 500 error"}
{"date":"2025-10-26 19:59:36.508828","message":"true 200 success"}
...
Now we have dirty data being ingested into our pipeline, showing that we have multiple messages using the same key. To clean this up before passing it on to the backend (output), we need to make use of both the Parser and Filter phases.
First comes the Parser phase, where unstructured data is converted into structured data. Here we'll make use of the built-in regular expression parser plugin to structure the duplicate messages into something more usable. To set up the parser configuration, create a new file called parsers.yaml in your favorite editor and add the following configuration. It defines a parser named message_cleaning_parser, selects the built-in regex format, and applies the regular expression shown here to convert each message into a structured format (note that this is actually applied to incoming messages in the next phase of the telemetry pipeline):
# This parser uses the built-in regex parser plugin and applies the
# regex to all incoming events.
#
parsers:
  - name: message_cleaning_parser
    format: regex
    regex: '^(?<valid_message>[^ ]+) (?<code>[^ ]+) (?<type>[^ ]+)$'
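For reference, here is how the named capture groups break down one of our sample dummy messages; this is a hand-worked illustration, not Fluent Bit output:
# Value of the 'message' key:        "true 200 success"
# (?<valid_message>[^ ]+)  captures  "true"
# (?<code>[^ ]+)           captures  "200"
# (?<type>[^ ]+)           captures  "success"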
The Filter phase is where the previously defined parser is put to the test. To set up the filter configuration, we add a new filters section as shown below. It defines a parser filter, matches all incoming messages so the filter is applied to each one, looks for the key message to select the value fed into the parser, and applies our message_cleaning_parser to it:
service:
  flush: 1
  log_level: info
  http_server: on
  http_listen: 0.0.0.0
  http_port: 2020
  hot_reload: on
  parsers_file: parsers.yaml

pipeline:
  inputs:
    # This entry generates a successful message.
    - name: dummy
      tag: event.success
      dummy: '{"message":"true 200 success"}'

    # This entry generates a failure message.
    - name: dummy
      tag: event.error
      dummy: '{"message":"false 500 error"}'

  filters:
    - name: parser
      match: '*'
      key_name: message
      parser: message_cleaning_parser

  outputs:
    - name: stdout
      match: '*'
      format: json_lines
      json_date_format: java_sql_timestamp
Also note that we have to include the parsers_file by name to ensure our filters can find the parser we defined. Now when we run the configuration we see the following:
# For source installation.
$ fluent-bit --config fluent-bit.yaml

# For container installation after building a new image with your
# configuration using a Buildfile as follows:
#
# FROM ghcr.io/fluent/fluent-bit:4.1.0
# COPY ./fluent-bit.yaml /fluent-bit/etc/fluent-bit.yaml
# COPY ./parsers.yaml /fluent-bit/etc/parsers.yaml
# CMD [ "fluent-bit", "-c", "/fluent-bit/etc/fluent-bit.yaml" ]
#
$ podman build -t fb -f Buildfile
$ podman run --rm fb
...
{"date":"2025-10-26 20:15:54.233766","valid_message":"true","code":"200","type":"success"}
{"date":"2025-10-26 20:15:54.234199","valid_message":"false","code":"500","type":"error"}
{"date":"2025-10-26 20:15:55.234238","valid_message":"true","code":"200","type":"success"}
{"date":"2025-10-26 20:15:55.234323","valid_message":"false","code":"500","type":"error"}
{"date":"2025-10-26 20:15:56.233915","valid_message":"true","code":"200","type":"success"}
{"date":"2025-10-26 20:15:56.234009","valid_message":"false","code":"500","type":"error"}
...
Note the alternating generated event lines, whose parsed messages now contain a key for each field to simplify later querying. The message key has been parsed into valid_message, code, and type, solving our confusing use case.
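As a bonus for the inner loop: because our service section enables both http_server and hot_reload, we can tweak parsers.yaml and reload the configuration without restarting Fluent Bit. This is a sketch assuming the v2 reload endpoint available in recent Fluent Bit releases; when running in a container, publish the port first (e.g. podman run -p 2020:2020 ...):
# Trigger a configuration reload over the built-in HTTP server.
$ curl -X POST http://localhost:2020/api/v2/reload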
This covers the top tip for developers getting started with Fluent Bit while trying to leverage a parser to clean up their telemetry data quickly and speed up their inner development loop.
More in the series
In this article you learned a few handy tricks for using Fluent Bit parsers and filters to improve the inner developer loop experience. This article is based on this online free workshop.
There will be more in this series as you continue to learn how to configure, run, manage, and master the use of Fluent Bit in the wild. Next up, exploring some of the more interesting Fluent Bit processors for developers.
