Each article in this series addresses a single topic by providing insights into what the topic is, why we are interested in exploring that topic, where to get started with the topic, and how to get hands-on with learning about the topic as it relates to the Fluent Bit project.
The idea is that each article can stand on its own, but that they also lead down a path that slowly increases our abilities to implement solutions with Fluent Bit telemetry pipelines.
Let's take a look at the topic of this article, using Fluent Bit processors for developers. In case you missed the previous article, check out the top tips on using telemetry pipeline parsers for developers, which covers cleaning up your telemetry data for a better developer experience.
This article will be a hands-on tour of the tools that help you as a developer testing out your Fluent Bit pipelines. We'll take a look at the top three processors you'll want to know about when building your telemetry pipeline configurations in Fluent Bit.
All examples in this article were done on macOS and assume the reader is able to adapt the actions shown here to their own local machine.
Where to get started
You should have explored the previous articles in this series to install and get started with Fluent Bit on your local developer machine, either using the source code or container images. Links at the end of this article will point you to a free hands-on workshop that lets you explore more of Fluent Bit in detail.
You can verify that you have a functioning installation by testing your Fluent Bit, either using a source installation or a container installation as shown below:
# For source installation.
$ fluent-bit -i dummy -o stdout

# For container installation.
$ podman run -ti ghcr.io/fluent/fluent-bit:4.0.8 -i dummy -o stdout
...
[0] dummy.0: [[1753105021.031338000, {}], {"message"=>"dummy"}]
[0] dummy.0: [[1753105022.033205000, {}], {"message"=>"dummy"}]
[0] dummy.0: [[1753105023.032600000, {}], {"message"=>"dummy"}]
[0] dummy.0: [[1753105024.033517000, {}], {"message"=>"dummy"}]
...

Let's look at the top three processors that will help you with your local development testing of Fluent Bit pipelines.
Processing in a telemetry pipeline
See this article for details about the service section of the configurations used in the rest of this article. For now, we'll focus on the Fluent Bit pipeline itself, specifically the processors that can be of great help in managing our telemetry data during testing in our inner developer loop.
Processors in Fluent Bit are powerful components that sit between the input and output phases of your telemetry pipeline. They allow you to manipulate, transform, and enrich your telemetry data before it reaches its destination. Unlike filters, which operate on records as they pass through the router's tag matching, processors attach directly to an input or output plugin and run inline, giving you fine-grained control over how your data flows through the pipeline.
The processor phase happens after data is ingested but before it's formatted for output. This makes processors ideal for operations that need to happen at scale across your entire data stream, such as content modification, metrics extraction, and data aggregation.
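As a point of reference for the examples that follow, here is the general shape of a processors block. Note that processors are only available when using the YAML configuration format, and they attach to a specific input (or output) rather than to the pipeline as a whole; the plugin entries below are just placeholders:

pipeline:
  inputs:
    - name: dummy

      # Processors attach to this specific input and are declared
      # per telemetry signal: logs, metrics, or traces.
      processors:
        logs:
          - name: content_modifier
            action: insert
            key: example
            value: "true"

  outputs:
    - name: stdout
      match: '*'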
Keeping all of this in mind, let's look at the most interesting processors that developers will want to know more about.
1. Content Modifier processor
One of the most common use cases for telemetry pipelines that developers will encounter is the need to add, modify, or remove fields from your telemetry data. The Content Modifier processor gives you the ability to manipulate the structure and content of your events as they flow through the pipeline.
To provide an example, we start with a simple Fluent Bit configuration file, fluent-bit.yaml, that uses the dummy plugin to generate events we'll then modify:
service:
  flush: 1
  log_level: info
  http_server: on
  http_listen: 0.0.0.0
  http_port: 2020
  hot_reload: on

pipeline:
  inputs:
    - name: dummy
      tag: app.logs
      dummy: '{"environment":"dev","message":"Application started"}'

      processors:
        logs:
          - name: content_modifier
            action: insert
            key: pipeline_version
            value: "1.0.0"

          - name: content_modifier
            action: insert
            key: source_host
            value: "${HOSTNAME}"

          - name: content_modifier
            action: rename
            key: environment
            value: env

  outputs:
    - name: stdout
      match: '*'
      format: json_lines
      json_date_format: java_sql_timestamp

Our configuration attaches the content_modifier processor to the dummy input three times to demonstrate different actions. First, we insert a new field called pipeline_version with a static value. Second, we insert a source_host field whose value is resolved from the HOSTNAME environment variable. Third, we rename the environment field to env for consistency; note that for the rename action, key identifies the existing field and value supplies its new name.
Let's run this to confirm our working test environment:
# For source installation.
$ fluent-bit --config fluent-bit.yaml

# For container installation after building new image with your
# configuration using a Buildfile as follows:
#
# FROM ghcr.io/fluent/fluent-bit:4.1.0
# COPY ./fluent-bit.yaml /fluent-bit/etc/fluent-bit.yaml
# CMD [ "fluent-bit", "-c", "/fluent-bit/etc/fluent-bit.yaml" ]
#
$ podman build -t fb -f Buildfile
$ podman run --rm fb
...
{"date":"2025-10-26 20:45:12.123456","env":"dev","message":"Application started","pipeline_version":"1.0.0","source_host":"localhost"}
{"date":"2025-10-26 20:45:13.234567","env":"dev","message":"Application started","pipeline_version":"1.0.0","source_host":"localhost"}
...
Note how each event now contains the additional fields we configured, and the original environment field has been renamed to env. This processor is invaluable for standardizing your telemetry data before it reaches your backend systems.
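Beyond insert and rename, the content_modifier processor also supports actions such as delete, hash, and convert. Here is a minimal sketch of two of them; the field names debug_info and user_email are illustrative, not part of the example above:

processors:
  logs:
    # Remove a noisy field from every record.
    - name: content_modifier
      action: delete
      key: debug_info

    # Replace a sensitive value with its SHA-256 hash.
    - name: content_modifier
      action: hash
      key: user_email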
2. Metrics Selector processor
Another critical use case for developers working with telemetry data is the ability to select specific metrics out of a stream. The Metrics Selector processor allows you to include or exclude metrics by name, giving you precise control over which metrics flow to which destinations.
To demonstrate this, we'll create a configuration that generates several metric-like event streams and attaches a metrics selector to one of them:
service:
  flush: 1
  log_level: info
  http_server: on
  http_listen: 0.0.0.0
  http_port: 2020
  hot_reload: on

pipeline:
  inputs:
    - name: dummy
      tag: metrics.cpu
      dummy: '{"metric":"cpu_usage","value":75.5,"host":"server01","env":"production"}'

      processors:
        metrics:
          - name: metrics_selector
            metric_name: cpu_usage
            action: include
            operation_type: prefix

    - name: dummy
      tag: metrics.memory
      dummy: '{"metric":"memory_usage","value":82.3,"host":"server01","env":"production"}'

    - name: dummy
      tag: metrics.disk
      dummy: '{"metric":"disk_usage","value":45.2,"host":"server02","env":"staging"}'

  outputs:
    - name: stdout
      match: 'metrics.*'
      format: json_lines
      json_date_format: java_sql_timestamp

Our configuration generates three metric-like event streams and attaches the metrics_selector processor to the CPU input, including only metrics whose names begin with cpu_usage. Note that metrics_selector is declared under the metrics: key of a processors block because it operates on the metrics signal of a pipeline; in a real setup you would attach it to a genuine metrics input such as fluentbit_metrics or prometheus_scrape, and the dummy events here simply illustrate the configuration shape. This lets you build precise include and exclude rules based on the names of your metrics.
Let's run this configuration:
# For source installation.
$ fluent-bit --config fluent-bit.yaml

# For container installation after building new image with your
# configuration using a Buildfile as follows:
#
# FROM ghcr.io/fluent/fluent-bit:4.1.0
# COPY ./fluent-bit.yaml /fluent-bit/etc/fluent-bit.yaml
# CMD [ "fluent-bit", "-c", "/fluent-bit/etc/fluent-bit.yaml" ]
#
$ podman build -t fb -f Buildfile
$ podman run --rm fb
...
{"date":"2025-10-26 21:10:33.456789","metric":"cpu_usage","value":75.5,"host":"server01","env":"production"}
{"date":"2025-10-26 21:10:33.567890","metric":"memory_usage","value":82.3,"host":"server01","env":"production"}
{"date":"2025-10-26 21:10:33.678901","metric":"disk_usage","value":45.2,"host":"server02","env":"staging"}
...
The metrics selector processor helps you focus on the metrics that matter most during development and testing, reducing noise and improving the signal-to-noise ratio in your telemetry data.
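The processor also supports an exclude action for dropping metrics by name, using either prefix or substring matching. A minimal sketch, assuming a metrics stream that happens to include debug-oriented metric names:

processors:
  metrics:
    # Drop any metric whose name contains the substring "debug".
    - name: metrics_selector
      metric_name: debug
      action: exclude
      operation_type: substring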
3. OpenTelemetry Envelope processor
The third essential processor that developers need to understand is the OpenTelemetry Envelope processor. This processor transforms your Fluent Bit telemetry data into the OpenTelemetry protocol format, enabling seamless integration with the broader OpenTelemetry ecosystem. As organizations increasingly adopt OpenTelemetry as their standard for observability data, this processor becomes critical for ensuring your Fluent Bit pipelines can communicate effectively with OpenTelemetry collectors and backends.
The OpenTelemetry Envelope processor wraps your telemetry data in the standard OpenTelemetry format, preserving all the semantic conventions and structures that make OpenTelemetry powerful. This includes proper handling of resource attributes, instrumentation scope, and the telemetry signal types that are core to OpenTelemetry.
For comprehensive coverage of integrating Fluent Bit with OpenTelemetry, I highly recommend exploring these detailed articles:
Telemetry Pipelines: Integrating Fluent Bit with OpenTelemetry, Part 1 - This article covers the fundamentals of integrating Fluent Bit with OpenTelemetry, including configuration patterns and best practices for getting started.
Integrating Fluent Bit with OpenTelemetry, Part 2 - This follow-up article dives deeper into advanced integration scenarios, troubleshooting tips, and real-world use cases for production deployments.
To demonstrate how the OpenTelemetry Envelope processor works, let's create a configuration that wraps application logs in OpenTelemetry format:
service:
  flush: 1
  log_level: info
  http_server: on
  http_listen: 0.0.0.0
  http_port: 2020
  hot_reload: on

pipeline:
  inputs:
    - name: dummy
      tag: app.logs
      dummy: '{"level":"info","service":"user-api","message":"User login successful","user_id":"12345"}'

    - name: dummy
      tag: app.logs
      dummy: '{"level":"error","service":"payment-api","message":"Payment processing failed","transaction_id":"tx-9876"}'

  outputs:
    - name: stdout
      match: '*'
      format: json_lines
      json_date_format: java_sql_timestamp

      processors:
        logs:
          - name: opentelemetry_envelope

          - name: content_modifier
            context: otel_resource_attributes
            action: upsert
            key: service.name
            value: my-application
Our configuration attaches the opentelemetry_envelope processor to the stdout output so that it applies to both input streams. The processor takes no parameters of its own; it simply wraps each log entry in the OpenTelemetry data model, adding the resource and instrumentation scope structures that OpenTelemetry systems expect. To populate resource attributes, such as the service name that identifies the source of the telemetry data, we pair it with a content_modifier processor using the otel_resource_attributes context.
Let's run this configuration to see the OpenTelemetry envelope in action; the output below is representative, as the exact rendering of the envelope on stdout varies by Fluent Bit version:
# For source installation.
$ fluent-bit --config fluent-bit.yaml

# For container installation after building new image with your
# configuration using a Buildfile as follows:
#
# FROM ghcr.io/fluent/fluent-bit:4.1.0
# COPY ./fluent-bit.yaml /fluent-bit/etc/fluent-bit.yaml
# CMD [ "fluent-bit", "-c", "/fluent-bit/etc/fluent-bit.yaml" ]
#
$ podman build -t fb -f Buildfile
$ podman run --rm fb
...
{"date":"2025-10-26 22:15:30.123456","resource":{"service.name":"my-application"},"level":"info","service":"user-api","message":"User login successful","user_id":"12345"}
{"date":"2025-10-26 22:15:31.234567","resource":{"service.name":"my-application"},"level":"error","service":"payment-api","message":"Payment processing failed","transaction_id":"tx-9876"}
...
Notice how each log entry now carries the OpenTelemetry resource attributes. This standardized structure ensures that when your telemetry data reaches an OpenTelemetry collector or backend, it will be properly attributed and can be correlated with other telemetry signals like traces and metrics from your distributed system.
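In practice you would usually pair the envelope with Fluent Bit's opentelemetry output plugin rather than stdout. A minimal sketch, assuming an OTLP/HTTP collector listening on localhost port 4318 (the host and port here are assumptions for illustration):

pipeline:
  outputs:
    # Forward the enveloped logs to an OpenTelemetry collector
    # over OTLP/HTTP; /v1/logs is the standard logs endpoint.
    - name: opentelemetry
      match: '*'
      host: localhost
      port: 4318
      logs_uri: /v1/logs
      tls: off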
This covers the top three processors for developers getting started with Fluent Bit, helping you transform and enrich telemetry data quickly and speed up your inner development loop.
More in the series
In this article you learned about three powerful Fluent Bit processors that improve the inner developer loop experience. This article is based on this free online workshop.
There will be more in this series as you continue to learn how to configure, run, manage, and master the use of Fluent Bit in the wild. Next up, exploring some of the more interesting Fluent Bit multiline parsers for developers.