Each article in this series addresses a single topic by providing insights into what the topic is, why we are interested in exploring that topic, where to get started with the topic, and how to get hands-on with learning about the topic as it relates to the Fluent Bit project.
The idea is that each article can stand on its own, but that they also lead down a path that slowly increases our abilities to implement solutions with Fluent Bit telemetry pipelines.
Let's take a look at the topic of this article: using Fluent Bit multiline parsers for developers. In case you missed the previous article, check out using telemetry pipeline processors, where we explored the top three telemetry data processors for developers.
This article dives into the parsers that help developers test Fluent Bit pipelines when dealing with long and difficult multiline log messages. We'll look at how to use multiline parsers in your Fluent Bit telemetry pipeline configuration.
All examples in this article were run on macOS and assume the reader can adapt the steps shown here to their own local machine.
Where to get started
You should have explored the previous articles in this series to install and get started with Fluent Bit on your developer local machine, either using the source code or container images. Links at the end of this article will point you to a free hands-on workshop that lets you explore more of Fluent Bit in detail.
You can verify that you have a functioning installation by testing your Fluent Bit, either using a source installation or a container installation as shown below:
```
# For source installation.
$ fluent-bit -i dummy -o stdout

# For container installation.
$ podman run -ti ghcr.io/fluent/fluent-bit:4.0.8 -i dummy -o stdout

...
[0] dummy.0: [[1753105021.031338000, {}], {"message"=>"dummy"}]
[0] dummy.0: [[1753105022.033205000, {}], {"message"=>"dummy"}]
[0] dummy.0: [[1753105023.032600000, {}], {"message"=>"dummy"}]
[0] dummy.0: [[1753105024.033517000, {}], {"message"=>"dummy"}]
...
```

Let's look at the three tips for multiline parsers and how they help you manage complex log entries during your local development testing.
Multiline parsing in a telemetry pipeline
See this article for details about the service section of the configurations used in the rest of this article. For now, we focus on the Fluent Bit pipeline, and specifically on the multiline parsers that can be a great help in managing our telemetry data during testing in our inner developer loop.
The figure below shows the phases of a telemetry pipeline. The second phase is the parser phase, where unstructured input data is turned into structured data.
Note that in this article we explore multiline parsers configured on the input of our telemetry pipeline, even though parsing is shown in the figure as a separate phase.
The challenge developers often face is that real-world applications don't always log messages on a single line. Stack traces, error messages, and debug output frequently span multiple lines. These multiline messages need to be concatenated before they can be properly parsed and processed.
Fluent Bit provides multiline parsers to solve this exact problem. A multiline parser can recognize when multiple lines of log data belong together and concatenate them into a single event before further processing.
An example of multiline log data that developers encounter daily would be a Java stack trace:
```
Dec 14 06:41:08 Exception in thread "main" java.lang.RuntimeException: Something has gone wrong, aborting!
    at com.myproject.module.MyProject.badMethod(MyProject.java:22)
    at com.myproject.module.MyProject.oneMoreMethod(MyProject.java:18)
    at com.myproject.module.MyProject.anotherMethod(MyProject.java:14)
    at com.myproject.module.MyProject.someMethod(MyProject.java:10)
    at com.myproject.module.MyProject.main(MyProject.java:6)
```

Without multiline parsing, each line would be treated as a separate log entry. With multiline parsing, all of these lines are correctly concatenated into a single structured event that maintains the complete context of the error.
The Fluent Bit multiline parser engine exposes two ways to configure the feature:
- Built-in multiline parsers
- Configurable multiline parsers
Fluent Bit provides pre-configured built-in parsers for common use cases such as:
- docker - Process log entries generated by Docker container engine
- cri - Process log entries generated by CRI-O container engine
- go - Process log entries from Go applications
- python - Process log entries from Python applications
- ruby - Process log entries from Ruby applications
- java - Process log entries from Java applications
For cases where the built-in parsers don't fit your needs, you can define custom multiline parsers. These custom parsers use regular expressions and state machines to identify the start and continuation of multiline messages.
Now let's look at the most interesting tips for multiline parsers that developers will want to know more about.
1. Configurable multiline parser
One of the more common use cases for telemetry pipelines that developers will encounter is dealing with stack traces and error messages that span multiple lines. These multiline messages need special handling to ensure they are concatenated properly before being sent to their destination.
To provide an example we start by creating a test log file called test.log with multiline Java stack trace data:
```
single line...
Dec 14 06:41:08 Exception in thread "main" java.lang.RuntimeException: Something has gone wrong, aborting!
    at com.myproject.module.MyProject.badMethod(MyProject.java:22)
    at com.myproject.module.MyProject.oneMoreMethod(MyProject.java:18)
    at com.myproject.module.MyProject.anotherMethod(MyProject.java:14)
    at com.myproject.module.MyProject.someMethod(MyProject.java:10)
    at com.myproject.module.MyProject.main(MyProject.java:6)
another line...
```

Next, let's create a multiline parser configuration. Create a new file called parsers_multiline.yaml in your favorite editor and add the following configuration:
```yaml
multiline_parsers:
  - name: multiline-regex-test
    type: regex
    flush_timeout: 1000
    rules:
      - state: start_state
        regex: '/([a-zA-Z]+ \d+ \d+\:\d+\:\d+)(.*)/'
        next_state: cont
      - state: cont
        regex: '/^\s+at.*/'
        next_state: cont
```

Note that custom multiline parsers are defined under the multiline_parsers section in YAML configuration files. Let's break down what this multiline parser does:
- name - We give our parser a unique name multiline-regex-test
- type - We specify the type as regex for regular expression based parsing
- flush_timeout - After 1000ms of no new matching lines, the buffer is flushed
- rules - We define the state machine rules that control multiline detection
The rules section is where the magic happens. A multiline parser uses states to determine which lines belong together:
- The start_state rule matches lines that begin a new multiline message. In our case, the pattern matches a timestamp followed by any text, which identifies the first line of our Java exception.
- The cont (continuation) rule matches lines that are part of the multiline message. Our pattern matches lines starting with whitespace followed by "at", which identifies the stack trace lines.
- Each rule specifies a next_state which tells Fluent Bit what state to transition to after matching. This creates a state machine that can handle complex multiline patterns.
When the parser sees a line matching start_state, it begins a new multiline buffer. Any subsequent lines matching the cont pattern are appended to that buffer. When a line doesn't match either pattern, or when the flush timeout expires, the complete multiline message is emitted as a single event.
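To build intuition for how these rules group lines, here is a minimal Python sketch (not Fluent Bit code) of the same state machine idea: lines matching a start pattern open a new buffer, continuation lines are appended to it, and anything else flushes the buffer as a complete event. The patterns loosely mirror the start_state and cont rules above.

```python
import re

# Patterns mirroring the start_state and cont rules from the parser above.
START = re.compile(r'[a-zA-Z]+ \d+ \d+:\d+:\d+.*')
CONT = re.compile(r'^\s+at.*')

def concat_multiline(lines):
    """Group raw lines into events the way a multiline parser would."""
    events, buffer = [], []
    for line in lines:
        if START.match(line):
            if buffer:                      # flush any event in progress
                events.append("\n".join(buffer))
            buffer = [line]                 # start a new multiline buffer
        elif buffer and CONT.match(line):
            buffer.append(line)             # continuation: append to buffer
        else:
            if buffer:                      # non-matching line ends the event
                events.append("\n".join(buffer))
                buffer = []
            events.append(line)             # standalone single-line event
    if buffer:
        events.append("\n".join(buffer))
    return events

lines = [
    "single line...",
    'Dec 14 06:41:08 Exception in thread "main" java.lang.RuntimeException: aborting!',
    "    at com.myproject.module.MyProject.badMethod(MyProject.java:22)",
    "    at com.myproject.module.MyProject.main(MyProject.java:6)",
    "another line...",
]
events = concat_multiline(lines)
print(len(events))  # 3 events: single line, full stack trace, another line
```

The real Fluent Bit engine additionally handles the flush timeout and streaming input, but the grouping logic follows this same shape.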
Now let's create our main Fluent Bit configuration file fluent-bit.yaml that uses this multiline parser:
```yaml
service:
  flush: 1
  log_level: info
  http_server: on
  http_listen: 0.0.0.0
  http_port: 2020
  hot_reload: on
  parsers_file: parsers_multiline.yaml

pipeline:
  inputs:
    - name: tail
      path: test.log
      read_from_head: true
      multiline.parser: multiline-regex-test

  outputs:
    - name: stdout
      match: '*'
```

Note several important configuration points here:
- We include the parsers_file in the service section to load our multiline parser definitions
- We use the tail input plugin to read from our test log file
- We set read_from_head: true to read the entire file from the beginning
- Most importantly, we specify multiline.parser: multiline-regex-test to apply our multiline parser
The multiline parser is applied at the input stage, which is the recommended approach. This ensures that lines are concatenated before any other processing occurs.
Let's run this configuration to see the multiline parser in action:
```
# For source installation.
$ fluent-bit --config fluent-bit.yaml

# For container installation after building a new image with your
# configuration using a Buildfile as follows:
#
# FROM ghcr.io/fluent/fluent-bit:4.1.0
# COPY ./fluent-bit.yaml /fluent-bit/etc/fluent-bit.yaml
# CMD [ "fluent-bit", "-c", "/fluent-bit/etc/fluent-bit.yaml" ]
#
$ podman build -t fb -f Buildfile
$ podman run --rm fb

...
[0] tail.0: [[1750332967.679671000, {}], {"log"=>"single line... "}]
[1] tail.0: [[1750332967.679677000, {}], {"log"=>"Dec 14 06:41:08 Exception in thread "main" java.lang.RuntimeException: Something has gone wrong, aborting! at com.myproject.module.MyProject.badMethod(MyProject.java:22) at com.myproject.module.MyProject.oneMoreMethod(MyProject.java:18) at com.myproject.module.MyProject.anotherMethod(MyProject.java:14) at com.myproject.module.MyProject.someMethod(MyProject.java:10) at com.myproject.module.MyProject.main(MyProject.java:6) "}]
[2] tail.0: [[1750332967.679677000, {}], {"log"=>"another line... "}]
...
```
Notice how the output shows three distinct events:
- The single line message passes through unchanged
- The entire stack trace is concatenated into one event, preserving the complete error context
- The final single line message passes through unchanged
This is exactly what we want. The multiline parser successfully identified the start of the Java exception and concatenated all the stack trace lines into a single structured event.
2. Extracting structured data from multiline messages
Once you have your multiline messages properly concatenated, you'll often want to extract specific fields from them. Fluent Bit supports this through the parser filter, which can be applied after multiline parsing.
Let's extend our example to extract the date and message components from the concatenated stack trace. First, we'll add a regular expression parser to our parsers_multiline.yaml file:
```yaml
multiline_parsers:
  - name: multiline-regex-test
    type: regex
    flush_timeout: 1000
    rules:
      - state: start_state
        regex: '/([a-zA-Z]+ \d+ \d+\:\d+\:\d+)(.*)/'
        next_state: cont
      - state: cont
        regex: '/^\s+at.*/'
        next_state: cont

parsers:
  - name: named-capture-test
    format: regex
    regex: '/^(?<date>[a-zA-Z]+ \d+ \d+\:\d+\:\d+)\s+(?<message>(.|\n)*)$/m'
```

Note that the regular parser is defined under the parsers section, while the multiline parser lives under multiline_parsers. The new named-capture-test parser uses named capture groups to extract:
- date - The timestamp at the start of the message
- message - The remaining content including all newlines
Note the /m modifier at the end of the regex, which enables multiline mode where . (dot) can match newline characters.
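Fluent Bit's regex engine uses Ruby-style modifiers, where /m makes the dot match newlines; in Python the equivalent flag is re.DOTALL. As a rough sketch (not the actual parser engine), the same named-capture extraction can be tested like this:

```python
import re

# Python analogue of the named-capture-test pattern above; Ruby-style /m
# (dot matches newline) corresponds to re.DOTALL in Python.
pattern = re.compile(
    r'^(?P<date>[a-zA-Z]+ \d+ \d+:\d+:\d+)\s+(?P<message>.*)$',
    re.DOTALL,
)

# A concatenated two-line event, as emitted by the multiline parser.
log = ('Dec 14 06:41:08 Exception in thread "main" java.lang.RuntimeException\n'
       '    at com.myproject.module.MyProject.badMethod(MyProject.java:22)')

match = pattern.match(log)
print(match.group("date"))     # Dec 14 06:41:08
print(match.group("message"))  # the exception line plus the stack trace line
```

Without the dot-matches-newline flag, the message group would stop at the first newline and lose the stack trace.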
Now we update our main configuration to apply this parser using the parser filter:
```yaml
service:
  flush: 1
  log_level: info
  http_server: on
  http_listen: 0.0.0.0
  http_port: 2020
  hot_reload: on
  parsers_file: parsers_multiline.yaml

pipeline:
  inputs:
    - name: tail
      path: test.log
      read_from_head: true
      multiline.parser: multiline-regex-test

  filters:
    - name: parser
      match: '*'
      key_name: log
      parser: named-capture-test

  outputs:
    - name: stdout
      match: '*'
```

We've added a parser filter that:
- Matches all events with match: '*'
- Looks at the log field with key_name: log
- Applies the named-capture-test parser to extract structured fields
Running this enhanced configuration produces:
```
# For source installation.
$ fluent-bit --config fluent-bit.yaml

# For container installation after building a new image with your
# configuration using a Buildfile as follows:
#
# FROM ghcr.io/fluent/fluent-bit:4.1.0
# COPY ./fluent-bit.yaml /fluent-bit/etc/fluent-bit.yaml
# CMD [ "fluent-bit", "-c", "/fluent-bit/etc/fluent-bit.yaml" ]
#
$ podman build -t fb -f Buildfile
$ podman run --rm fb

...
[0] tail.0: [[1750333602.460984000, {}], {"log"=>"single line... "}]
[1] tail.0: [[1750333602.460998000, {}], {"date"=>"Dec 14 06:41:08", "message"=>"Exception in thread "main" java.lang.RuntimeException: Something has gone wrong, aborting! at com.myproject.module.MyProject.badMethod(MyProject.java:22) at com.myproject.module.MyProject.oneMoreMethod(MyProject.java:18) at com.myproject.module.MyProject.anotherMethod(MyProject.java:14) at com.myproject.module.MyProject.someMethod(MyProject.java:10) at com.myproject.module.MyProject.main(MyProject.java:6) "}]
[2] tail.0: [[1750333602.460998000, {}], {"log"=>"another line... "}]
...
```
Now the multiline Java exception event contains structured fields:
- date contains the timestamp
- message contains the complete exception and stack trace
This structured format makes it much easier to query, analyze, and alert on these error events in your observability backend.
3. Important considerations for multiline parsers
When working with multiline parsers, keep these important points in mind:
Apply multiline parsing at the input stage - While you can apply multiline parsing using the multiline filter, the recommended approach is to configure it directly on the input plugin using multiline.parser. This ensures lines are concatenated before any other processing.
Understand flush timeout behavior - The flush_timeout parameter determines how long Fluent Bit waits for additional matching lines before emitting the multiline buffer. Set this value based on your application's logging patterns. Too short and you might break up valid multiline messages. Too long and you'll introduce unnecessary latency.
Use specific state patterns - Make your regular expressions as specific as possible to avoid false matches. The start_state pattern should uniquely identify the beginning of a multiline message, and continuation patterns should only match valid continuation lines.
Be aware of resource implications - Multiline parsers buffer lines in memory until the complete message is ready. For applications with very large multiline messages (like huge stack traces), this can consume significant memory. The multiline parser bypasses the buffer_max_size limit to ensure complete messages are captured.
Test with real data - Always test your multiline parser configurations with actual log data from your applications. Edge cases in log formatting can cause unexpected parsing behavior.
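One lightweight way to apply both of the last two tips is to sanity-check your state patterns against a handful of representative lines before wiring them into Fluent Bit. The following is a small Python sketch (the sample lines and expectations are illustrative; substitute real lines from your own logs):

```python
import re

# Candidate start_state pattern from the parser configuration above.
START = re.compile(r'([a-zA-Z]+ \d+ \d+:\d+:\d+)(.*)')

# Representative sample lines mapped to whether they should start a new event.
samples = {
    'Dec 14 06:41:08 Exception in thread "main" ...': True,
    "    at com.myproject.module.MyProject.badMethod(MyProject.java:22)": False,
    "single line...": False,
}

for line, should_start in samples.items():
    # match() anchors at the start of the line, like the parser's rules.
    assert bool(START.match(line)) == should_start, line
print("start_state pattern behaves as expected")
```

Catching a false match here is far cheaper than debugging mangled events downstream in your observability backend.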
This covers the three tips for developers getting started with Fluent Bit multiline parsers while trying to handle complex multiline log messages and speed up their inner development loop.
More in the series
In this article you learned how to use Fluent Bit multiline parsers to properly handle log messages that span multiple lines. This article is based on this online free workshop.
There will be more in this series as you continue to learn how to configure, run, manage, and master the use of Fluent Bit in the wild. Next up, exploring some of the more interesting Fluent Bit filters for developers.
