Each article in this series addresses a single topic by providing insights into what the topic is, why we are interested in exploring that topic, where to get started with the topic, and how to get hands-on with learning about the topic as it relates to the Fluent Bit project.
The idea is that each article can stand on its own, but that they also lead down a path that slowly increases our abilities to implement solutions with Fluent Bit telemetry pipelines.
Let's take a look at the topic of this article: using Fluent Bit multiline parsers for developers. In case you missed the previous article, check out using telemetry pipeline processors, where we explored the top three telemetry data processors for developers.
This article dives into the parsers that help developers test Fluent Bit pipelines when dealing with long and difficult multiline log messages. We'll look at how to use multiline parsers in your Fluent Bit telemetry pipeline configuration.
All examples in this article were run on macOS and assume the reader can adapt the steps shown here to their own local machine.
Where to get started
You should have explored the previous articles in this series to install and get started with Fluent Bit on your developer local machine, either using the source code or container images. Links at the end of this article will point you to a free hands-on workshop that lets you explore more of Fluent Bit in detail.
You can verify that you have a functioning installation by testing your Fluent Bit, either using a source installation or a container installation as shown below:
```
# For source installation.
$ fluent-bit -i dummy -o stdout

# For container installation.
$ podman run -ti ghcr.io/fluent/fluent-bit:4.0.8 -i dummy -o stdout

...
[0] dummy.0: [[1753105021.031338000, {}], {"message"=>"dummy"}]
[0] dummy.0: [[1753105022.033205000, {}], {"message"=>"dummy"}]
[0] dummy.0: [[1753105023.032600000, {}], {"message"=>"dummy"}]
[0] dummy.0: [[1753105024.033517000, {}], {"message"=>"dummy"}]
...
```

Let's look at the three tips for multiline parsers and how they help you manage complex log entries during your local development testing.
Multiline parsing in a telemetry pipeline
See this article for details about the service section of the configurations used in the rest of this article. For now, we focus on the Fluent Bit pipeline, and specifically on the multiline parsers that can be a great help in managing our telemetry data during testing in our inner developer loop.
The figure below shows the phases of a telemetry pipeline. The second phase is the parser phase, where unstructured input data is turned into structured data.
Note that in this article we explore multiline parsers configured on the input of our telemetry pipeline, even though parsing is shown in the figure as a separate phase.
The challenge developers often face is that real-world applications don't always log messages on a single line. Stack traces, error messages, and debug output frequently span multiple lines. These multiline messages need to be concatenated before they can be properly parsed and processed.
Fluent Bit provides multiline parsers to solve this exact problem. A multiline parser can recognize when multiple lines of log data belong together and concatenate them into a single event before further processing.
An example of multiline log data that developers encounter daily would be a Java stack trace:
```
Dec 14 06:41:08 Exception in thread "main" java.lang.RuntimeException: Something has gone wrong, aborting!
    at com.myproject.module.MyProject.badMethod(MyProject.java:22)
    at com.myproject.module.MyProject.oneMoreMethod(MyProject.java:18)
    at com.myproject.module.MyProject.anotherMethod(MyProject.java:14)
    at com.myproject.module.MyProject.someMethod(MyProject.java:10)
    at com.myproject.module.MyProject.main(MyProject.java:6)
```

Without multiline parsing, each line would be treated as a separate log entry. With multiline parsing, all of these lines are correctly concatenated into a single structured event that maintains the complete context of the error.
The Fluent Bit multiline parser engine exposes two ways to configure the feature:
- Built-in multiline parsers
- Configurable multiline parsers
Fluent Bit provides pre-configured built-in parsers for common use cases such as:
- docker - Process log entries generated by Docker container engine
- cri - Process log entries generated by CRI-O container engine
- go - Process log entries from Go applications
- python - Process log entries from Python applications
- ruby - Process log entries from Ruby applications
- java - Process log entries from Java applications
For cases where the built-in parsers don't fit your needs, you can define custom multiline parsers. These custom parsers use regular expressions and state machines to identify the start and continuation of multiline messages.
Now let's look at the most interesting tips for multiline parsers that developers will want to know more about.
1. Configurable multiline parser
One of the more common use cases for telemetry pipelines that developers will encounter is dealing with stack traces and error messages that span multiple lines. These multiline messages need special handling to ensure they are concatenated properly before being sent to their destination.
To provide an example we start by creating a test log file called test.log with multiline Java stack trace data:
```
single line...
Dec 14 06:41:08 Exception in thread "main" java.lang.RuntimeException: Something has gone wrong, aborting!
    at com.myproject.module.MyProject.badMethod(MyProject.java:22)
    at com.myproject.module.MyProject.oneMoreMethod(MyProject.java:18)
    at com.myproject.module.MyProject.anotherMethod(MyProject.java:14)
    at com.myproject.module.MyProject.someMethod(MyProject.java:10)
    at com.myproject.module.MyProject.main(MyProject.java:6)
another line...
```

Next, let's create a multiline parser configuration. Create a new file called parsers_multiline.yaml in your favorite editor and add the following configuration:
```yaml
multiline_parsers:
  - name: multiline-regex-test
    type: regex
    flush_timeout: 1000
    rules:
      - state: start_state
        regex: '/([a-zA-Z]+ \d+ \d+\:\d+\:\d+)(.*)/'
        next_state: cont
      - state: cont
        regex: '/^\s+at.*/'
        next_state: cont
```

Note that custom multiline parsers are defined under the multiline_parsers section in YAML configuration files. Let's break down what this multiline parser does:
- name - We give our parser a unique name multiline-regex-test
- type - We specify the type as regex for regular expression based parsing
- flush_timeout - After 1000ms of no new matching lines, the buffer is flushed
- rules - We define the state machine rules that control multiline detection
The rules section is where the magic happens. A multiline parser uses states to determine which lines belong together:
- The start_state rule matches lines that begin a new multiline message. In our case, the pattern matches a timestamp followed by any text, which identifies the first line of our Java exception.
- The cont (continuation) rule matches lines that are part of the multiline message. Our pattern matches lines starting with whitespace followed by "at", which identifies the stack trace lines.
- Each rule specifies a next_state which tells Fluent Bit what state to transition to after matching. This creates a state machine that can handle complex multiline patterns.
When the parser sees a line matching start_state, it begins a new multiline buffer. Any subsequent lines matching the cont pattern are appended to that buffer. When a line doesn't match either pattern, or when the flush timeout expires, the complete multiline message is emitted as a single event.
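To build intuition for how these rules group lines, here is a minimal Python sketch (not Fluent Bit code) of the same state machine idea: lines matching a start pattern open a new buffer, continuation lines are appended to it, and anything else flushes the buffer as a complete event. The patterns loosely mirror the start_state and cont rules above.

```python
import re

# Patterns mirroring the start_state and cont rules from the parser above.
START = re.compile(r'[a-zA-Z]+ \d+ \d+:\d+:\d+.*')
CONT = re.compile(r'^\s+at.*')

def concat_multiline(lines):
    """Group raw lines into events the way a multiline parser would."""
    events, buffer = [], []
    for line in lines:
        if START.match(line):
            if buffer:                      # flush any event in progress
                events.append("\n".join(buffer))
            buffer = [line]                 # start a new multiline buffer
        elif buffer and CONT.match(line):
            buffer.append(line)             # continuation: append to buffer
        else:
            if buffer:                      # non-matching line ends the event
                events.append("\n".join(buffer))
                buffer = []
            events.append(line)             # standalone single-line event
    if buffer:
        events.append("\n".join(buffer))
    return events

lines = [
    "single line...",
    'Dec 14 06:41:08 Exception in thread "main" java.lang.RuntimeException: aborting!',
    "    at com.myproject.module.MyProject.badMethod(MyProject.java:22)",
    "    at com.myproject.module.MyProject.main(MyProject.java:6)",
    "another line...",
]
events = concat_multiline(lines)
print(len(events))  # 3 events: single line, full stack trace, another line
```

The real Fluent Bit engine additionally handles the flush timeout and streaming input, but the grouping logic follows this same shape.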
Now let's create our main Fluent Bit configuration file fluent-bit.yaml that uses this multiline parser:
```yaml
service:
  flush: 1
  log_level: info
  http_server: on
  http_listen: 0.0.0.0
  http_port: 2020
  hot_reload: on
  parsers_file: parsers_multiline.yaml

pipeline:
  inputs:
    - name: tail
      path: test.log
      read_from_head: true
      multiline.parser: multiline-regex-test

  outputs:
    - name: stdout
      match: '*'
```

Note several important configuration points here:
- We include the parsers_file in the service section to load our multiline parser definitions
- We use the tail input plugin to read from our test log file
- We set read_from_head: true to read the entire file from the beginning
- Most importantly, we specify multiline.parser: multiline-regex-test to apply our multiline parser
The multiline parser is applied at the input stage, which is the recommended approach. This ensures that lines are concatenated before any other processing occurs.
Let's run this configuration to see the multiline parser in action:
```
# For source installation.
$ fluent-bit --config fluent-bit.yaml

# For container installation after building a new image with your
# configuration using a Buildfile as follows:
#
# FROM ghcr.io/fluent/fluent-bit:4.1.0
# COPY ./fluent-bit.yaml /fluent-bit/etc/fluent-bit.yaml
# CMD [ "fluent-bit", "-c", "/fluent-bit/etc/fluent-bit.yaml" ]
#
$ podman build -t fb -f Buildfile
$ podman run --rm fb

...
[0] tail.0: [[1750332967.679671000, {}], {"log"=>"single line... "}]
[1] tail.0: [[1750332967.679677000, {}], {"log"=>"Dec 14 06:41:08 Exception in thread "main" java.lang.RuntimeException: Something has gone wrong, aborting! at com.myproject.module.MyProject.badMethod(MyProject.java:22) at com.myproject.module.MyProject.oneMoreMethod(MyProject.java:18) at com.myproject.module.MyProject.anotherMethod(MyProject.java:14) at com.myproject.module.MyProject.someMethod(MyProject.java:10) at com.myproject.module.MyProject.main(MyProject.java:6) "}]
[2] tail.0: [[1750332967.679677000, {}], {"log"=>"another line... "}]
...
```
Notice how the output shows three distinct events:
- The single line message passes through unchanged
- The entire stack trace is concatenated into one event, preserving the complete error context
- The final single line message passes through unchanged
This is exactly what we want. The multiline parser successfully identified the start of the Java exception and concatenated all the stack trace lines into a single structured event.
2. Extracting structured data from multiline messages
Once you have your multiline messages properly concatenated, you'll often want to extract specific fields from them. Fluent Bit supports this through the parser filter, which can be applied after multiline parsing.
Let's extend our example to extract the date and message components from the concatenated stack trace. First, we'll add a regular expression parser to our parsers_multiline.yaml file:
```yaml
multiline_parsers:
  - name: multiline-regex-test
    type: regex
    flush_timeout: 1000
    rules:
      - state: start_state
        regex: '/([a-zA-Z]+ \d+ \d+\:\d+\:\d+)(.*)/'
        next_state: cont
      - state: cont
        regex: '/^\s+at.*/'
        next_state: cont

parsers:
  - name: named-capture-test
    format: regex
    regex: '/^(?<date>[a-zA-Z]+ \d+ \d+\:\d+\:\d+)\s+(?<message>(.|\n)*)$/m'
```

Note that the regular parser is defined under the parsers section, while the multiline parser lives under multiline_parsers. The new named-capture-test parser uses named capture groups to extract:
- date - The timestamp at the start of the message
- message - The remaining content including all newlines
Note the /m modifier at the end of the regex, which enables multiline mode where . (dot) can match newline characters.
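Fluent Bit's regex engine uses Ruby-style modifiers, where /m makes the dot match newlines; in Python the equivalent flag is re.DOTALL. As a rough sketch (not the actual parser engine), the same named-capture extraction can be tested like this:

```python
import re

# Python analogue of the named-capture-test pattern above; Ruby-style /m
# (dot matches newline) corresponds to re.DOTALL in Python.
pattern = re.compile(
    r'^(?P<date>[a-zA-Z]+ \d+ \d+:\d+:\d+)\s+(?P<message>.*)$',
    re.DOTALL,
)

# A concatenated two-line event, as emitted by the multiline parser.
log = ('Dec 14 06:41:08 Exception in thread "main" java.lang.RuntimeException\n'
       '    at com.myproject.module.MyProject.badMethod(MyProject.java:22)')

match = pattern.match(log)
print(match.group("date"))     # Dec 14 06:41:08
print(match.group("message"))  # the exception line plus the stack trace line
```

Without the dot-matches-newline flag, the message group would stop at the first newline and lose the stack trace.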
Now we update our main configuration to apply this parser using the parser filter:
```yaml
service:
  flush: 1
  log_level: info
  http_server: on
  http_listen: 0.0.0.0
  http_port: 2020
  hot_reload: on
  parsers_file: parsers_multiline.yaml

pipeline:
  inputs:
    - name: tail
      path: test.log
      read_from_head: true
      multiline.parser: multiline-regex-test

  filters:
    - name: parser
      match: '*'
      key_name: log
      parser: named-capture-test

  outputs:
    - name: stdout
      match: '*'
```

We've added a parser filter that:
- Matches all events with match: '*'
- Looks at the log field with key_name: log
- Applies the named-capture-test parser to extract structured fields
Running this enhanced configuration produces:
```
# For source installation.
$ fluent-bit --config fluent-bit.yaml

# For container installation after building a new image with your
# configuration using a Buildfile as follows:
#
# FROM ghcr.io/fluent/fluent-bit:4.1.0
# COPY ./fluent-bit.yaml /fluent-bit/etc/fluent-bit.yaml
# CMD [ "fluent-bit", "-c", "/fluent-bit/etc/fluent-bit.yaml" ]
#
$ podman build -t fb -f Buildfile
$ podman run --rm fb

...
[0] tail.0: [[1750333602.460984000, {}], {"log"=>"single line... "}]
[1] tail.0: [[1750333602.460998000, {}], {"date"=>"Dec 14 06:41:08", "message"=>"Exception in thread "main" java.lang.RuntimeException: Something has gone wrong, aborting! at com.myproject.module.MyProject.badMethod(MyProject.java:22) at com.myproject.module.MyProject.oneMoreMethod(MyProject.java:18) at com.myproject.module.MyProject.anotherMethod(MyProject.java:14) at com.myproject.module.MyProject.someMethod(MyProject.java:10) at com.myproject.module.MyProject.main(MyProject.java:6) "}]
[2] tail.0: [[1750333602.460998000, {}], {"log"=>"another line... "}]
...
```
Now the multiline Java exception event contains structured fields:
- date contains the timestamp
- message contains the complete exception and stack trace
This structured format makes it much easier to query, analyze, and alert on these error events in your observability backend.
3. Important considerations for multiline parsers
When working with multiline parsers, keep these important points in mind:
Apply multiline parsing at the input stage - While you can apply multiline parsing using the multiline filter, the recommended approach is to configure it directly on the input plugin using multiline.parser. This ensures lines are concatenated before any other processing.
Understand flush timeout behavior - The flush_timeout parameter determines how long Fluent Bit waits for additional matching lines before emitting the multiline buffer. Set this value based on your application's logging patterns. Too short and you might break up valid multiline messages. Too long and you'll introduce unnecessary latency.
Use specific state patterns - Make your regular expressions as specific as possible to avoid false matches. The start_state pattern should uniquely identify the beginning of a multiline message, and continuation patterns should only match valid continuation lines.
Be aware of resource implications - Multiline parsers buffer lines in memory until the complete message is ready. For applications with very large multiline messages (like huge stack traces), this can consume significant memory. The multiline parser bypasses the buffer_max_size limit to ensure complete messages are captured.
Test with real data - Always test your multiline parser configurations with actual log data from your applications. Edge cases in log formatting can cause unexpected parsing behavior.
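One lightweight way to apply both of the last two tips is to sanity-check your state patterns against a handful of representative lines before wiring them into Fluent Bit. The following is a small Python sketch (the sample lines and expectations are illustrative; substitute real lines from your own logs):

```python
import re

# Candidate start_state pattern from the parser configuration above.
START = re.compile(r'([a-zA-Z]+ \d+ \d+:\d+:\d+)(.*)')

# Representative sample lines mapped to whether they should start a new event.
samples = {
    'Dec 14 06:41:08 Exception in thread "main" ...': True,
    "    at com.myproject.module.MyProject.badMethod(MyProject.java:22)": False,
    "single line...": False,
}

for line, should_start in samples.items():
    # match() anchors at the start of the line, like the parser's rules.
    assert bool(START.match(line)) == should_start, line
print("start_state pattern behaves as expected")
```

Catching a false match here is far cheaper than debugging mangled events downstream in your observability backend.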
This covers the three tips for developers getting started with Fluent Bit multiline parsers while trying to handle complex multiline log messages and speed up their inner development loop.
More in the series
In this article you learned how to use Fluent Bit multiline parsers to properly handle log messages that span multiple lines. This article is based on this online free workshop.
There will be more in this series as you continue to learn how to configure, run, manage, and master the use of Fluent Bit in the wild. Next up, exploring some of the more interesting Fluent Bit filters for developers.
