
Monday, March 4, 2024

Telemetry Pipelines Workshop - Introduction to Fluent Bit

Are you ready to get started with cloud native observability using telemetry pipelines?

This article is part of a series exploring a workshop that guides you through the open source project Fluent Bit: what it is, a basic installation, and setting up a first telemetry pipeline project. Learn how to manage your cloud native data through the telemetry pipeline phases of collection, aggregation, transformation, and forwarding, from any source to any destination.

Since Chronosphere acquired the capabilities for integrating telemetry pipelines, I've been digging into how this works and the use cases it solves, and having a lot of fun with the project it's built on, the CNCF project Fluent Bit. This workshop is the result of my sharing how to get started with telemetry pipelines and all that you can do with Fluent Bit.

This first article in the series provides an introduction to Fluent Bit where we gain an understanding of its role in the cloud native observability world. You can find more details in the accompanying workshop lab.

Before we get started, let's get a baseline for defining cloud native observability pipelines. As noted in a recent trend report:

Observability pipelines are providing real-time filtering, enrichment, normalization and routing of telemetry data

The rising amount of data generated in cloud native environments has become a burden for the teams trying to manage it all, as well as for organizations' budgets. In response, organizations are searching for more control over all this telemetry data, from collecting, processing, and routing to storing and querying.

Data pipelines have gained traction in helping organizations deal with the challenges they are facing by providing a powerful way to lower ingestion volumes and help reduce data costs.

One of the benefits is that telemetry pipelines act as a telemetry gateway between cloud native data and organizations, performing real-time filtering, enrichment, normalization, and routing to cheaper storage. This reduces dependencies on expensive and often proprietary storage solutions.

Another plus for organizations is the ability to reformat collected data on the fly, often bridging the gap between legacy or non-standards-based data structures and current standards. They can achieve this without having to update code, re-instrument, or redeploy existing applications and services.

Telemetry pipelines

This workshop focuses solely on Fluent Bit as the open source telemetry pipeline project. From the project documentation, Fluent Bit is an open source telemetry agent specifically designed to efficiently handle the challenges of collecting and processing telemetry data across a wide range of environments, from constrained systems to complex cloud infrastructures. It's effective at managing telemetry data from various sources and formats, which can be a constant challenge, particularly when performance is a critical factor.

While the term observability pipelines is thrown about to cover all kinds of general pipeline activities, the focus in this workshop will be more on telemetry pipelines. This is due to our focus on getting all the different types of telemetry from their origins to the destinations we desire, as noted in the previously referenced trend report:

Telemetry pipelines providing real-time filtering, enrichment, normalization and routing of telemetry data

Rather than serving as a drop-in replacement, Fluent Bit enhances the observability strategy for your infrastructure by adapting and optimizing your existing logging layer, as well as metrics and traces processing. Furthermore, Fluent Bit supports a vendor-neutral approach, seamlessly integrating with other ecosystems such as Prometheus and OpenTelemetry.

Fluent Bit can be deployed as an edge agent for localized telemetry data handling or utilized as a central aggregator or collector for managing telemetry data across multiple sources and environments. Fluent Bit has been designed for performance and low resource consumption.

As a telemetry pipeline, Fluent Bit is designed to process logs, metrics, and traces at speed, scale, and with flexibility.
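
To give a feel for how little configuration a working pipeline needs, here is a minimal sketch in Fluent Bit's classic configuration format that collects CPU metrics and prints them to standard output. The tag and match values are arbitrary choices for this illustration, not anything the workshop prescribes:

    # Minimal pipeline: collect CPU metrics and print them to stdout
    [SERVICE]
        flush     1
        log_level info

    [INPUT]
        # the cpu input plugin samples CPU usage and tags each Event
        name cpu
        tag  metrics.cpu

    [OUTPUT]
        # the stdout output plugin prints every Event whose Tag matches
        name  stdout
        match metrics.*

Pointing fluent-bit at this file with the -c flag should print a CPU usage Event roughly every second.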

What about Fluentd?

First there was Fluentd, a CNCF Graduated Project. It's an open source data collector for building the unified logging layer. When installed, it runs in the background to collect, parse, transform, analyze, and store various types of data.

Fluent Bit is a sub-project within the Fluentd ecosystem. It's considered a Lightweight Data Forwarder for Fluentd, specifically designed for forwarding data from the edge to Fluentd aggregators.
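
As a sketch of that edge-to-aggregator pattern, a Fluent Bit instance at the edge could use the forward output plugin to ship everything it collects to a Fluentd aggregator; the hostname below is a placeholder for this example:

    [OUTPUT]
        # forward speaks the Fluentd wire protocol on its default port
        name  forward
        match *
        # placeholder address for an assumed Fluentd aggregator
        host  fluentd-aggregator.example.com
        port  24224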

Both projects share many similarities; Fluent Bit is fully designed and built on top of the best ideas of Fluentd's architecture and general design.

Understanding the concepts

Before we dive into using Fluent Bit, it's important to have an understanding of the key concepts, so let's explore the following:

  • Event or Record - each incoming piece of data is considered an Event or a Record.
  • Filtering - the process of altering, enriching, or dropping an Event.
  • Tag - an internal string used by the Router in later stages of our pipeline to determine which filters or output phases an Event must pass through.
  • Timestamp - assigned to each Event as it enters a pipeline and is always present.
  • Match - represents a rule applied to Events, where their Tags are examined for matches.
  • Structured Message - the goal is to ensure that all messages have a structured format, defined as having keys and values.
The actual workshop lab gives examples and more details, but for this article I'm keeping it to a summary of the concepts, with a small sketch below, and leaving further exploration to the reader.
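
To make the concepts a little more concrete, here is a small sketch using the dummy input plugin: each generated Record is an Event, it's labeled with a Tag as it's collected, a Timestamp is assigned on entry, and the output's Match rule determines whether the Event is routed to it. The tag value and message content are made up for this illustration:

    [INPUT]
        # dummy emits the same JSON Record (an Event) at a fixed interval
        name  dummy
        dummy {"message": "hello", "level": "info"}
        tag   app.demo

    [OUTPUT]
        # the Match rule app.* examines each Event's Tag; app.demo matches
        name  stdout
        match app.*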

Pipeline phases

A telemetry pipeline is where data goes through various phases from collection to final destination. We can define or configure each phase to manipulate the data or the path it's taking through our telemetry pipeline.

The first phase is INPUT, which is where Fluent Bit uses Input Plugins to gather information from specific sources. When an input plugin is loaded, it creates an instance that we can configure using the plugin's properties.
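
For example, here is a sketch of an input instance using the tail plugin, configured through its properties; the log path and tag are hypothetical:

    [INPUT]
        # tail reads new lines appended to the matching files
        name tail
        path /var/log/app/*.log
        tag  app.logs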

The second phase is PARSER, which is where unstructured input data is turned into structured data. Fluent Bit does this using Parsers that we can configure to manipulate the unstructured data, producing structured data for the next phases of our pipeline.
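
As a sketch, parsers are defined in their own file and then referenced from an input instance; the parser name, time format, and file paths here are assumptions for the example:

    # parsers.conf
    [PARSER]
        # turns a raw JSON log line into structured keys and values
        name        app_json
        format      json
        time_key    time
        time_format %Y-%m-%dT%H:%M:%S %z

    # main configuration: the tail input applies the parser above
    [INPUT]
        name   tail
        path   /var/log/app/*.log
        tag    app.logs
        parser app_json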

The FILTER phase is where we modify, enrich, or delete any of the collected Events. Fluent Bit provides many out-of-the-box plugins as Filters that can match, exclude, or enrich your structured data before it moves onwards in the pipeline. Filters can be configured using the provided properties.
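
As a sketch, the grep filter below drops every Event whose level key doesn't match the pattern, and the modify filter enriches what remains with an extra key; the key names and values are assumptions for this example:

    [FILTER]
        # keep only Events where the level key is error or warn
        name  grep
        match app.*
        regex level (error|warn)

    [FILTER]
        # enrich the surviving Events with an extra key/value pair
        name  modify
        match app.*
        add   environment staging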

The BUFFER phase is where the data is stored, using in-memory or file system based options. Note that when data reaches the buffer phase it's in an immutable state (no more filtering), and that buffered data is not raw text, but an internal binary representation used for storage.
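
Here is a sketch of opting into file system buffering: a storage path is declared in the service section, and an input instance switches its buffering from the in-memory default; the paths are placeholders:

    [SERVICE]
        # where file system buffer chunks are written
        storage.path /var/lib/fluent-bit/buffer

    [INPUT]
        name         tail
        path         /var/log/app/*.log
        tag          app.logs
        # buffer this input's data on disk instead of only in memory
        storage.type filesystem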

The next phase is ROUTING, which is where Fluent Bit uses the previously discussed Tag and Match concepts to determine which output destinations to send data to. During the INPUT phase, data is assigned a Tag; during the ROUTING phase, that Tag is compared against the Match rules in the output configurations, and when a rule matches, the data is sent to that output destination.
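
A sketch of routing in action: two inputs assign different Tags, and each output only receives the Events whose Tag satisfies its Match rule; the tags and the file path are arbitrary choices for this example:

    [INPUT]
        name cpu
        tag  metrics.cpu

    [INPUT]
        name  dummy
        dummy {"message": "hello"}
        tag   app.demo

    [OUTPUT]
        # receives only the cpu Events
        name  stdout
        match metrics.*

    [OUTPUT]
        # receives only the dummy Events
        name  file
        match app.*
        path  /tmp/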

The final phase is OUTPUT, which is where Fluent Bit uses Output Plugins to connect with specific destinations. These destinations can be databases, remote services, cloud services, and more. When an output plugin is loaded, it creates an instance that we can configure using the plugin's properties.
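
As a sketch, here is an output instance configured to deliver Events to an Elasticsearch cluster; the host and index names are placeholders for this example:

    [OUTPUT]
        # es ships matching Events to an assumed Elasticsearch cluster
        name  es
        match app.*
        host  elasticsearch.example.com
        port  9200
        index app-logs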

For complete code examples of these phases and more details around telemetry pipeline phases, see the workshop lab.

What's next?

This article was an introduction to telemetry pipelines and Fluent Bit. This series continues with the next step in this workshop, installing Fluent Bit on your local machine from source or using container images.

Stay tuned for more hands-on material to help you with your cloud native observability journey.