Eric D. Schabell: KubeCon - Summary of the Open Observability Day North America

Tuesday, October 25, 2022

Today was the inaugural one-day event known as Open Observability Day, held as one of the co-located events before the full KubeCon and CloudNativeCon event this week in Detroit.

It was on-site at the Huntington Place Convention Center, which is on the river with views across the water into Canada. Just a bit of geography, as many attendees I spoke with were not aware that Detroit is so close to the northern US border.

The full schedule for Open Observability Day is available online, but I wanted to share an overall impression of what it was like to be there.

The day is centered around all the CNCF projects related to open observability and is full of both vendor and project-focused talks.

Let's look closer at my impressions of the sessions I found interesting.

The day started with CNCF project founder Bartek Płotka and his overview of the day, with project updates across Thanos, Fluentd, OpenTelemetry, Jaeger, and more. He then transitioned to the two short keynotes.

Distributed Tracing - The Struggle is Real

Ian Smith, Field CTO from Chronosphere, shared his thoughts after nine years in the field working with distributed tracing solutions. He took us on a whirlwind tour of where it came from and where it might be going, along with the technical problems around the tooling that supports distributed tracing. A great quote as a takeaway:

"Tracing has become the high-promise, high-effort, low-value story."

He goes on to highlight how the focus on developer tooling needs to turn this around and start providing some of the promise, with less effort, and more value.

Observability Simplified

Eduardo Silva, CEO from Calyptia, shared their storyline from creating the Fluentd project to the new Fluent Bit project focused on cloud native environments. He then walked through their experiences in the logging space building the Fluent Bit project and how extending the ecosystem to support metrics and traces has helped shape a simplified user observability experience. He announced the release of Calyptia Core, which uses open source tooling to collect data through data pipelines without using agents. It's free to use right now and can be installed into existing Kubernetes clusters. They also have a Docker Desktop extension.

Both keynotes were very short, just 10 minutes, after which the main talks started.

Building Observability Pipeline with Fluent Bit

Chao Xu from LinkedIn talked about how they transitioned their observability pipeline off existing closed tooling to open source and open standards. They mainly use Fluent Bit and OpenTelemetry, and they also expanded their language instrumentation from just Java applications to Go, C++, and Python. They consolidated their tracing and logs into a single pipeline instead of separate data pipelines, which simplified maintenance and lowered resource load. They are big believers in the OTEL Collector, but they extended it into their new Observability Agent to support data conversion and filtering along with the ingestion of OTEL data streams. LinkedIn also really likes the enhanced tag management that Fluent Bit offers to handle the various data streams.

Why Large-Scale Observability Needs Graph

Richard Benwell from SquaredUp took a deep dive into the observability Wikipedia page, which is a rather interesting way to try to build the foundation of what we mean by o11y. He used this to show that we have signals in the form of metrics, logs, and tracing, but that current observability platforms are missing a model of our system. The talk postulates that signals are useless without models. He went on to use architectures as models for the metrics, logs, and tracing we are gathering. This raises the question: do you need architects to design your models, or do you just generate models, as tracing tools often do? The model is nice and helps with understanding, but you also need to be able to gain insights into the meaning of the data you're gathering and modeling. The talk then dove into the graph 101 course that we all took at university, with vertex to edge to vertex types of stories. It brought back fond memories of both math courses and AI domain modeling to solve problem domains such as healthcare diagnostics.
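
To make the "signals need a model" idea a bit more concrete, here is a minimal sketch of my own, not something shown in the talk. It assumes a made-up architecture where each vertex is a service and each edge means "depends on", plus a couple of invented signals attached to vertices. All service names, signals, and function names are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical architecture model: each (source, target) edge means
# "source depends on target".
edges = [
    ("frontend", "checkout"),
    ("frontend", "search"),
    ("checkout", "payments"),
    ("checkout", "database"),
    ("search", "database"),
]

# Hypothetical signals, keyed by the vertex (service) they were observed on.
signals = {
    "database": ["metric: p99 latency above threshold"],
    "checkout": ["log: timeout calling database"],
}


def build_dependents(edges):
    """Invert the dependency edges: for each service, collect who depends on it."""
    dependents = defaultdict(set)
    for source, target in edges:
        dependents[target].add(source)
    return dependents


def impacted_by(service, edges):
    """Walk the graph breadth-first to find every service that depends,
    directly or indirectly, on the given one."""
    dependents = build_dependents(edges)
    seen, queue = set(), deque([service])
    while queue:
        current = queue.popleft()
        for upstream in dependents[current]:
            if upstream not in seen:
                seen.add(upstream)
                queue.append(upstream)
    return sorted(seen)


if __name__ == "__main__":
    for service, observed in signals.items():
        print(service, observed, "-> may impact:", impacted_by(service, edges))
```

On its own, a latency metric on the database is just a number. With the model, the same signal can be traced along the edges to the services that rely on it, which is roughly the kind of insight the talk was arguing for.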

Confidence with Chaos for Your Kubernetes Observability 

Michael Friedrich from GitLab shared how we've gone from running cloud native environments to monitoring them with CNCF projects like Prometheus, Perses, Grafana, etc. Now we are buried under all the incoming data, which is not a new concept. Given all of that, he shared a few ideas about breaking things on purpose to see how the environment behaves, how the monitoring reacts, and how it all recovers. He highlighted the Chaos Mesh project, an interesting way to see how entire environments will respond to problems. The talk ended with a live demo of Chaos Mesh.

Before and after lunch there were several lightning talks, just short 10-minute sessions.

- Achieving Unified Observability for Cloud and Edge with FluentBit

- Making Sense of Observability with Auto-Discovered Security Policies

- Managing OpenTelemetry Through the OpAMP Protocol

- OTel Me How to Build a Data Pipeline for Observability

- What Can eBPF Actually do for Modern-day Observability?

The afternoon finished up with full breakout sessions:

Adopting Open Telemetry Collector @ eBay - Swapping Engines Mid Flight

Vijay Samuel from eBay shared experiences of moving from Elastic Beats for traces to OpenTelemetry. He talked about their cloud native scale, the problems they've had, the journey from Metricbeat to the OTEL Collector, bridging the gaps around dynamic config reloading, and ensuring data parity after the migration. Very interesting, and they are looking for engineers (the slides included a QR code if you want to join eBay).

Leveraging OpenTelemetry for Your Prometheus Pipeline

Goutham Veeramachaneni from Grafana Labs, a Prometheus maintainer for over five years, shared how to leverage OTEL in your Prometheus data pipelines to add traces to your metrics infrastructure.

This overview does not include all of the talks held today, but gives a nice impression. I must admit, I was unable to capture all of the sessions due to the networking that happens in the breaks. Several times I got into in-depth discussions that kept me out in the halls or at a booth longer than the breaks were planned for, but that's what these events are for!