
Monday, June 2, 2025

Mastering Fluent Bit: Controlling Logs with Fluent Bit on Kubernetes

This series is a general purpose getting started guide for those of us wanting to learn about the Cloud Native Computing Foundation (CNCF) project Fluent Bit. 

Each article in this series addresses a single topic by providing insights into what the topic is, why we are interested in exploring that topic, where to get started with the topic, and how to get hands-on with learning about the topic as it relates to the Fluent Bit project.

The idea is that each article can stand on its own, but that they also lead down a path that slowly increases our abilities to implement solutions with Fluent Bit telemetry pipelines.

Let's take a look at the topic of this article, using Fluent Bit to get control of logs on a Kubernetes cluster.

In case you missed the previous article, I'm providing a short introduction to Fluent Bit before sharing how to use a Fluent Bit telemetry pipeline on a Kubernetes cluster to take control of all the logs being generated.

What is Fluent Bit?

Before diving into Fluent Bit, let's step back and look at the position of this project within the Fluent organization. If we look at the Fluent organization on GitHub, we find the Fluentd and Fluent Bit projects hosted there. The back story is that it all started with the log processing project Fluentd, which joined the CNCF in 2016 and reached Graduated status in 2019.

Once it became apparent that the world was heading into cloud native Kubernetes environments, it was clear that Fluentd was not designed for the flexible and lightweight requirements these environments demanded. Fluent Bit was born from the need for a low-resource, high-throughput, and highly scalable log management solution for cloud native Kubernetes environments. The project was started within the Fluent organization as a sub-project in 2015, and the rest is now 10 years of history with the release of v4 last week!

Fluent Bit has become much more than a flexible and lightweight log pipeline solution. It can now process metrics and traces as well, and has become the telemetry pipeline collection tool of choice for those looking to take control of their telemetry data right at the source where it's being collected.

Let's get started with Fluent Bit and see what we can do for ourselves!

Why control logs on a Kubernetes cluster?

When you dive into the cloud native world, it usually means you are deploying containers on Kubernetes. Complexity increases dramatically as your applications and microservices interact in this dynamic infrastructure landscape.

Deployments can auto-scale, pods spin up and are taken down as the need arises, and underlying all of this are the various Kubernetes controlling components. All of these things generate telemetry data, and Fluent Bit is a wonderfully simple way to take control of them across a Kubernetes cluster. It collects everything through a central telemetry pipeline while giving you the ability to parse, filter, and route all of your telemetry data.
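To make that concrete before we dive in, here is a minimal sketch of what a Fluent Bit pipeline looks like in its YAML configuration, with an input, an optional filter, and an output (the actual configurations used in this article follow below):

pipeline:
  inputs:
    - name: tail                      # collect log lines from files
      tag: kube.*
      path: /var/log/containers/*.log
  filters:
    - name: grep                      # optional: keep only what matters
      match: '*'
      regex: stream stderr
  outputs:
    - name: stdout                    # route records to a destination
      match: '*'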

For developers, this article will demonstrate using Fluent Bit as a single point of log collection on a development Kubernetes cluster with a deployed workload. Finally, all examples in this article were done on macOS and assume the reader can translate the actions shown here to their own local machine.

Where to get started

To ensure you are ready to start controlling your Kubernetes cluster logs, the rest of this article assumes you have completed the previous article. That leaves you with a two node Kubernetes cluster running a workload in the form of Ghost CMS, with Fluent Bit installed to collect all container logs.

If you did not work through the previous article, I've provided a Logs Control Easy Install project repository that you can download, unzip, and run with one command to spin up the Kubernetes cluster with the above setup on your local machine.

Using either path, once set up you are able to see the logs from Fluent Bit containing everything generated on this running cluster. This would be the logs across three namespaces: kube-system, ghost, and logging.
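If you want to quickly confirm that those namespaces exist, a simple check using the same kubeconfig as the rest of this article looks like this:

$ kubectl --kubeconfig target/2nodeconfig.yaml get namespaces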

You can verify that they are up and running by browsing those namespaces, shown here on my local machine:

$ kubectl --kubeconfig target/2nodeconfig.yaml get pods --namespace kube-system

NAME                                          READY   STATUS    RESTARTS   AGE
coredns-668d6bf9bc-jrvrx                      1/1     Running   0          69m
coredns-668d6bf9bc-wbqjk                      1/1     Running   0          69m
etcd-2node-control-plane                      1/1     Running   0          69m
kindnet-fmf8l                                 1/1     Running   0          69m
kindnet-rhlp6                                 1/1     Running   0          69m
kube-apiserver-2node-control-plane            1/1     Running   0          69m
kube-controller-manager-2node-control-plane   1/1     Running   0          69m
kube-proxy-b5vjr                              1/1     Running   0          69m
kube-proxy-jxpqc                              1/1     Running   0          69m
kube-scheduler-2node-control-plane            1/1     Running   0          69m

$ kubectl --kubeconfig target/2nodeconfig.yaml get pods --namespace ghost

NAME                        READY   STATUS    RESTARTS      AGE
ghost-dep-8d59966f4-87jsf   1/1     Running   0             77m
ghost-dep-mysql-0           1/1     Running   0             77m

$ kubectl --kubeconfig target/2nodeconfig.yaml get pods --namespace logging

NAME               READY   STATUS    RESTARTS   AGE
fluent-bit-7qjmx   1/1     Running   0          41m

The initial configuration for the Fluent Bit instance is to collect all container logs, from all namespaces, as set by the path in the fluent-bit-helm.yaml configuration file used in our setup:

args:
  - --workdir=/fluent-bit/etc
  - --config=/fluent-bit/etc/conf/fluent-bit.yaml

config:
  extraFiles:
    fluent-bit.yaml: |
      service:
       flush: 1
       log_level: info
       http_server: true
       http_listen: 0.0.0.0
       http_port: 2020
      pipeline:
        inputs:
          - name: tail
            tag: kube.*
            read_from_head: true
            path: /var/log/containers/*.log
            multiline.parser: docker, cri
        outputs:
          - name: stdout
            match: '*'
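As an aside, because the service section enables Fluent Bit's built-in HTTP server on port 2020, you can optionally sanity-check the running instance before digging through any logs. A quick sketch, assuming the pod name found above and the same kubeconfig:

$ kubectl --kubeconfig target/2nodeconfig.yaml port-forward fluent-bit-7qjmx 2020:2020 --namespace logging

$ curl http://127.0.0.1:2020/api/v1/uptime

The first command forwards the pod's monitoring port to your local machine, and the second asks the instance how long it has been running.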

To see all the logs collected, we can dump the Fluent Bit pod logs as follows, using the pod name we found above:

$ kubectl --kubeconfig target/2nodeconfig.yaml logs fluent-bit-7qjmx --namespace logging

[OUTPUT-CUT-DUE-TO-LOG-VOLUME]
...

You will notice, if you browse the output, that you have error messages, info messages, and, if you look hard enough, some logs from the Ghost MySQL workload, the Ghost CMS workload, and even your Fluent Bit instance. As a developer working on your cluster, how can you find anything useful in this flood of logging? The good thing is that you at least have a single place to look for it!

Another point to mention is that by using the Fluent Bit tail input plugin and setting it to read from the beginning of each log file, we have ensured that our telemetry pipeline captures everything in those logs. If we didn't read from the head of each log file, our pipeline would miss everything generated before the Fluent Bit instance started. This way we keep the workload startup messages and can test against the same standard log telemetry events each time we modify our pipeline configuration.
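For reference, the part of the configuration doing that work is just the tail input stanza. Here is a commented version of it (the comments are mine, added for illustration, and not part of the setup):

inputs:
  - name: tail                        # follow log files as they grow
    tag: kube.*                       # tag each record with its source file path
    read_from_head: true              # replay existing lines, not only new ones
    path: /var/log/containers/*.log   # every container log file on the node
    multiline.parser: docker, cri     # reassemble docker/CRI split log lines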

Let's start taking control of our logs and see how we as developers can make use of the log data we want to see during our local development testing.

Taking back control

The first thing we can do is focus our log collection efforts on just the workload we are interested in; in this example we are looking to find problems with our Ghost CMS deployment. As we are not interested in the logs from anything happening in the kube-system namespace, we can narrow the focus of the Fluent Bit input plugin to only examine Ghost log files.

This can be done by making a new configuration file called myfluent-bit-helm.yaml and changing the default path as follows:

args:
  - --workdir=/fluent-bit/etc
  - --config=/fluent-bit/etc/conf/fluent-bit.yaml

config:
  extraFiles:
    fluent-bit.yaml: |
      service:
       flush: 1
       log_level: info
       http_server: true
       http_listen: 0.0.0.0
       http_port: 2020
      pipeline:
        inputs:
          - name: tail
            tag: kube.*
            read_from_head: true
            path: /var/log/containers/*ghost*
            multiline.parser: docker, cri
        outputs:
          - name: stdout
            match: '*'

The next step is to update the Fluent Bit instance with a helm upgrade command, then list the pods in the logging namespace to find the new pod name:

$ helm upgrade --kubeconfig target/2nodeconfig.yaml --install fluent-bit fluent/fluent-bit --set image.tag=4.0.0 --namespace=logging --create-namespace --values=myfluent-bit-helm.yaml

$ kubectl --kubeconfig target/2nodeconfig.yaml get pods --namespace logging

NAME               READY   STATUS    RESTARTS   AGE
fluent-bit-mzktk   1/1     Running   0          28s

Now explore the logs being collected by Fluent Bit and notice that all the kube-system namespace logs are no longer there, so we can focus on our deployed workload:

$ kubectl --kubeconfig target/2nodeconfig.yaml logs fluent-bit-mzktk --namespace logging

...
[11] kube.var.log.containers.ghost-dep-8d59966f4-87jsf_ghost_ghost-dep-c8ee31893743a1ce781f6f43ea3d0bfb93412623a721a2248e842936dc567089.log: [[1747583486.278137067, {}], {"time"=>"2025-05-18T15:51:26.278137067Z", "stream"=>"stderr", "_p"=>"F", "log"=>"ghost 15:51:26.27 INFO  ==> Configuring database"}]
[12] kube.var.log.containers.ghost-dep-8d59966f4-87jsf_ghost_ghost-dep-c8ee31893743a1ce781f6f43ea3d0bfb93412623a721a2248e842936dc567089.log: [[1747583486.318427288, {}], {"time"=>"2025-05-18T15:51:26.318427288Z", "stream"=>"stderr", "_p"=>"F", "log"=>"ghost 15:51:26.31 INFO  ==> Setting up Ghost"}]
[13] kube.var.log.containers.ghost-dep-8d59966f4-87jsf_ghost_ghost-dep-c8ee31893743a1ce781f6f43ea3d0bfb93412623a721a2248e842936dc567089.log: [[1747583491.211337893, {}], {"time"=>"2025-05-18T15:51:31.211337893Z", "stream"=>"stderr", "_p"=>"F", "log"=>"ghost 15:51:31.21 INFO  ==> Configuring Ghost URL to http://127.0.0.1:2368"}]
[14] kube.var.log.containers.ghost-dep-8d59966f4-87jsf_ghost_ghost-dep-c8ee31893743a1ce781f6f43ea3d0bfb93412623a721a2248e842936dc567089.log: [[1747583491.234609188, {}], {"time"=>"2025-05-18T15:51:31.234609188Z", "stream"=>"stderr", "_p"=>"F", "log"=>"ghost 15:51:31.23 INFO  ==> Passing admin user creation wizard"}]
[15] kube.var.log.containers.ghost-dep-8d59966f4-87jsf_ghost_ghost-dep-c8ee31893743a1ce781f6f43ea3d0bfb93412623a721a2248e842936dc567089.log: [[1747583491.243222300, {}], {"time"=>"2025-05-18T15:51:31.2432223Z", "stream"=>"stderr", "_p"=>"F", "log"=>"ghost 15:51:31.24 INFO  ==> Starting Ghost in background"}]
[16] kube.var.log.containers.ghost-dep-8d59966f4-87jsf_ghost_ghost-dep-c8ee31893743a1ce781f6f43ea3d0bfb93412623a721a2248e842936dc567089.log: [[1747583519.424206501, {}], {"time"=>"2025-05-18T15:51:59.424206501Z", "stream"=>"stderr", "_p"=>"F", "log"=>"ghost 15:51:59.42 INFO  ==> Stopping Ghost"}]
[17] kube.var.log.containers.ghost-dep-8d59966f4-87jsf_ghost_ghost-dep-c8ee31893743a1ce781f6f43ea3d0bfb93412623a721a2248e842936dc567089.log: [[1747583520.921096963, {}], {"time"=>"2025-05-18T15:52:00.921096963Z", "stream"=>"stderr", "_p"=>"F", "log"=>"ghost 15:52:00.92 INFO  ==> Persisting Ghost installation"}]
[18] kube.var.log.containers.ghost-dep-8d59966f4-87jsf_ghost_ghost-dep-c8ee31893743a1ce781f6f43ea3d0bfb93412623a721a2248e842936dc567089.log: [[1747583521.008567054, {}], {"time"=>"2025-05-18T15:52:01.008567054Z", "stream"=>"stderr", "_p"=>"F", "log"=>"ghost 15:52:01.00 INFO  ==> ** Ghost setup finished! **"}]
...

This is just a selection of log lines from the total output. If you look closer, you see these logs have their own sort of format, so let's standardize on JSON as the output format and make the various timestamps a bit more readable by changing the Fluent Bit output plugin configuration as follows:

args:
  - --workdir=/fluent-bit/etc
  - --config=/fluent-bit/etc/conf/fluent-bit.yaml

config:
  extraFiles:
    fluent-bit.yaml: |
      service:
       flush: 1
       log_level: info
       http_server: true
       http_listen: 0.0.0.0
       http_port: 2020
      pipeline:
        inputs:
          - name: tail
            tag: kube.*
            read_from_head: true
            path: /var/log/containers/*ghost*
            multiline.parser: docker, cri
        outputs:
          - name: stdout
            match: '*'
            format: json_lines
            json_date_format: java_sql_timestamp
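As an aside, java_sql_timestamp is only one of the accepted values for json_date_format; the stdout output also understands values such as iso8601, double, and epoch. For example, an ISO 8601 variation of the same output section would be:

        outputs:
          - name: stdout
            match: '*'
            format: json_lines
            json_date_format: iso8601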

Update the Fluent Bit instance using a helm upgrade command, then find the new pod name:

$ helm upgrade --kubeconfig target/2nodeconfig.yaml --install fluent-bit fluent/fluent-bit --set image.tag=4.0.0 --namespace=logging --create-namespace --values=myfluent-bit-helm.yaml

$ kubectl --kubeconfig target/2nodeconfig.yaml get pods --namespace logging

NAME               READY   STATUS    RESTARTS   AGE
fluent-bit-gqsc8   1/1     Running   0          42s

Now explore the logs being collected by Fluent Bit and notice the output changes:

$ kubectl --kubeconfig target/2nodeconfig.yaml logs fluent-bit-gqsc8 --namespace logging

...
{"date":"2025-06-05 13:49:58.001603","time":"2025-06-05T13:49:58.001603337Z","stream":"stderr","_p":"F","log":"\u001b[38;5;6mghost \u001b[38;5;5m13:49:58.00 \u001b[0m\u001b[38;5;2mINFO \u001b[0m ==> Stopping Ghost"}
{"date":"2025-06-05 13:49:59.291618","time":"2025-06-05T13:49:59.291618721Z","stream":"stderr","_p":"F","log":"\u001b[38;5;6mghost \u001b[38;5;5m13:49:59.29 \u001b[0m\u001b[38;5;2mINFO \u001b[0m ==> Persisting Ghost installation"}
{"date":"2025-06-05 13:49:59.387701","time":"2025-06-05T13:49:59.38770119Z","stream":"stderr","_p":"F","log":"\u001b[38;5;6mghost \u001b[38;5;5m13:49:59.38 \u001b[0m\u001b[38;5;2mINFO \u001b[0m ==> ** Ghost setup finished! **"}
{"date":"2025-06-05 13:49:59.387736","time":"2025-06-05T13:49:59.387736981Z","stream":"stdout","_p":"F","log":""}
{"date":"2025-06-05 13:49:59.451176","time":"2025-06-05T13:49:59.451176821Z","stream":"stderr","_p":"F","log":"\u001b[38;5;6mghost \u001b[38;5;5m13:49:59.45 \u001b[0m\u001b[38;5;2mINFO \u001b[0m ==> ** Starting Ghost **"}
{"date":"2025-06-05 13:50:00.171207","time":"2025-06-05T13:50:00.171207812Z","stream":"stdout","_p":"F","log":""}
...

Now, looking closer at this stream of messages, and being the developers we are, we notice a mix of stderr and stdout log lines. Let's take control and trim out all the lines that do not come from stderr, as we are only interested in what is broken.

We need to add a filter section to our Fluent Bit configuration, using the grep filter with a regular expression that only keeps records whose stream key matches stderr, as follows:

args:
  - --workdir=/fluent-bit/etc
  - --config=/fluent-bit/etc/conf/fluent-bit.yaml

config:
  extraFiles:
    fluent-bit.yaml: |
      service:
       flush: 1
       log_level: info
       http_server: true
       http_listen: 0.0.0.0
       http_port: 2020
      pipeline:
        inputs:
          - name: tail
            tag: kube.*
            read_from_head: true
            path: /var/log/containers/*ghost*
            multiline.parser: docker, cri
        filters:
          - name: grep
            match: '*'
            regex: stream stderr
        outputs:
          - name: stdout
            match: '*'
            format: json_lines
            json_date_format: java_sql_timestamp
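As a side note, the grep filter also has an exclude property that inverts the logic, dropping records whose key matches the regular expression instead of keeping them. A roughly equivalent filter section using that approach (assuming every record carries a stream key) would look like this:

        filters:
          - name: grep
            match: '*'
            exclude: stream stdout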

Update the Fluent Bit instance using a helm upgrade command, then find the new pod name:

$ helm upgrade --kubeconfig target/2nodeconfig.yaml --install fluent-bit fluent/fluent-bit --set image.tag=4.0.0 --namespace=logging --create-namespace --values=myfluent-bit-helm.yaml

$ kubectl --kubeconfig target/2nodeconfig.yaml get pods --namespace logging

NAME               READY   STATUS    RESTARTS   AGE
fluent-bit-npn8n   1/1     Running   0          12s

Now explore the logs being collected by Fluent Bit and notice the output changes:

$ kubectl --kubeconfig target/2nodeconfig.yaml logs fluent-bit-npn8n --namespace logging

...
{"date":"2025-06-05 13:49:34.807524","time":"2025-06-05T13:49:34.807524266Z","stream":"stderr","_p":"F","log":"\u001b[38;5;6mghost \u001b[38;5;5m13:49:34.80 \u001b[0m\u001b[38;5;2mINFO \u001b[0m ==> Configuring database"}
{"date":"2025-06-05 13:49:34.860722","time":"2025-06-05T13:49:34.860722188Z","stream":"stderr","_p":"F","log":"\u001b[38;5;6mghost \u001b[38;5;5m13:49:34.86 \u001b[0m\u001b[38;5;2mINFO \u001b[0m ==> Setting up Ghost"}
{"date":"2025-06-05 13:49:36.289847","time":"2025-06-05T13:49:36.289847086Z","stream":"stderr","_p":"F","log":"\u001b[38;5;6mghost \u001b[38;5;5m13:49:36.28 \u001b[0m\u001b[38;5;2mINFO \u001b[0m ==> Configuring Ghost URL to http://127.0.0.1:2368"}
{"date":"2025-06-05 13:49:36.373376","time":"2025-06-05T13:49:36.373376803Z","stream":"stderr","_p":"F","log":"\u001b[38;5;6mghost \u001b[38;5;5m13:49:36.37 \u001b[0m\u001b[38;5;2mINFO \u001b[0m ==> Passing admin user creation wizard"}
{"date":"2025-06-05 13:49:36.379461","time":"2025-06-05T13:49:36.379461971Z","stream":"stderr","_p":"F","log":"\u001b[38;5;6mghost \u001b[38;5;5m13:49:36.37 \u001b[0m\u001b[38;5;2mINFO \u001b[0m ==> Starting Ghost in background"}
{"date":"2025-06-05 13:49:58.001603","time":"2025-06-05T13:49:58.001603337Z","stream":"stderr","_p":"F","log":"\u001b[38;5;6mghost \u001b[38;5;5m13:49:58.00 \u001b[0m\u001b[38;5;2mINFO \u001b[0m ==> Stopping Ghost"}
{"date":"2025-06-05 13:49:59.291618","time":"2025-06-05T13:49:59.291618721Z","stream":"stderr","_p":"F","log":"\u001b[38;5;6mghost \u001b[38;5;5m13:49:59.29 \u001b[0m\u001b[38;5;2mINFO \u001b[0m ==> Persisting Ghost installation"}
{"date":"2025-06-05 13:49:59.387701","time":"2025-06-05T13:49:59.38770119Z","stream":"stderr","_p":"F","log":"\u001b[38;5;6mghost \u001b[38;5;5m13:49:59.38 \u001b[0m\u001b[38;5;2mINFO \u001b[0m ==> ** Ghost setup finished! **"}
...

Now we are no longer seeing standard output log events as our telemetry pipeline is filtering to only show standard error tagged logs!

This exercise has shown how to format and prune our logs using our Fluent Bit telemetry pipeline on a Kubernetes cluster. Now let's look at how to enrich our log telemetry data.

We are going to add tags to every standard error line pointing the on-call developer to the SRE they need to contact. To do this, we expand the filter section of the Fluent Bit configuration with the modify filter, using a condition that matches records where the stream key equals stderr, removing that key, and adding two new keys, STATUS and ACTION, as follows:

args:
  - --workdir=/fluent-bit/etc
  - --config=/fluent-bit/etc/conf/fluent-bit.yaml

config:
  extraFiles:
    fluent-bit.yaml: |
      service:
       flush: 1
       log_level: info
       http_server: true
       http_listen: 0.0.0.0
       http_port: 2020
      pipeline:
        inputs:
          - name: tail
            tag: kube.*
            read_from_head: true
            path: /var/log/containers/*ghost*
            multiline.parser: docker, cri
        filters:
          - name: grep
            match: '*'
            regex: stream stderr
          - name: modify
            match: '*'
            condition: Key_Value_Equals stream stderr
            remove: stream
            add:
              - STATUS REALLY_BAD
              - ACTION CALL_SRE
        outputs:
          - name: stdout
            match: '*'
            format: json_lines
            json_date_format: java_sql_timestamp

Update the Fluent Bit instance using a helm upgrade command, then find the new pod name:

$ helm upgrade --kubeconfig target/2nodeconfig.yaml --install fluent-bit fluent/fluent-bit --set image.tag=4.0.0 --namespace=logging --create-namespace --values=myfluent-bit-helm.yaml

$ kubectl --kubeconfig target/2nodeconfig.yaml get pods --namespace logging

NAME               READY   STATUS    RESTARTS   AGE
fluent-bit-ftfs4   1/1     Running   0          32s

Now explore the logs being collected by Fluent Bit and notice in the output that the stream key is gone and the two new keys have been added at the end of each error log event:

$ kubectl --kubeconfig target/2nodeconfig.yaml logs fluent-bit-ftfs4 --namespace logging

...
[CUT-LINE-FOR-VIEWING] Configuring database"},"STATUS":"REALLY_BAD","ACTION":"CALL_SRE"}
[CUT-LINE-FOR-VIEWING] Setting up Ghost"},"STATUS":"REALLY_BAD","ACTION":"CALL_SRE"}
[CUT-LINE-FOR-VIEWING] Configuring Ghost URL to http://127.0.0.1:2368"},"STATUS":"REALLY_BAD","ACTION":"CALL_SRE"}
[CUT-LINE-FOR-VIEWING] Passing admin user creation wizard"},"STATUS":"REALLY_BAD","ACTION":"CALL_SRE"}
[CUT-LINE-FOR-VIEWING] Starting Ghost in background"},"STATUS":"REALLY_BAD","ACTION":"CALL_SRE"}
[CUT-LINE-FOR-VIEWING] Stopping Ghost"},"STATUS":"REALLY_BAD","ACTION":"CALL_SRE"}
[CUT-LINE-FOR-VIEWING] Persisting Ghost installation"},"STATUS":"REALLY_BAD","ACTION":"CALL_SRE"}
[CUT-LINE-FOR-VIEWING] ** Ghost setup finished! **"},"STATUS":"REALLY_BAD","ACTION":"CALL_SRE"}
...
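Since the enrichment adds predictable keys, you can now slice this single log stream with ordinary command line tools; for example, a quick way to pull out only the records flagged for the SRE:

$ kubectl --kubeconfig target/2nodeconfig.yaml logs fluent-bit-ftfs4 --namespace logging | grep CALL_SRE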

Now we have a running Kubernetes cluster with two nodes generating logs, a workload in the form of a Ghost CMS generating logs, and a Fluent Bit telemetry pipeline gathering and taking control of our log telemetry data.

Initially, we found that gathering all log telemetry data flooded us with too much information to sift out the events important to our development needs. We then started taking back control of our log telemetry data by narrowing our collection strategy, by filtering, and finally by enriching our telemetry data.

More in the series

In this article you learned how to use Fluent Bit on a Kubernetes cluster to take control of your telemetry data. This article is based on this free online workshop.

There will be more in this series as you continue to learn how to configure, run, manage, and master the use of Fluent Bit in the wild. Next up, integrating Fluent Bit telemetry pipelines with OpenTelemetry.
