Eric D. Schabell: O11y Guide - Keeping Your Cloud Native Observability Options Open

Tuesday, October 18, 2022

This is the fourth article in the series covering my journey into the world of cloud native observability. If you missed any of the previous articles, head on back to the introduction for a quick update.

After laying out the groundwork for this series in the initial article, I spent some time in the second article sharing who the observability players are. I also discussed the teams that these players are on in this world of cloud native o11y. For the third article I looked at the ongoing discussion around monitoring pillars versus phases.

Being a developer from my early days in IT, it's been very interesting to explore the complexities of cloud native o11y. Monitoring applications goes way beyond just writing and deploying code, especially in the cloud native world. One thing remains the same: maintaining your organization's architecture always requires both a vigilant outlook and an understanding of available open standards.

In this fourth article I'm going to look at the architecture-level choices being made and share the open standards found in the open source landscape.

As any architect will tell you, open standards are always preferred when considering adding on to your existing infrastructure. Does the candidate component under consideration adhere to some defined open standard? Does it at least conform to using open standards? 

The open choice

When an open standard exists, or in some early cases an open consensus where everyone centers around a technology or protocol, it gives an architect peace of mind. You often have choices as to the final component you want to use, and as long as it's based on a standard, you can feel confident that you can swap it out in the future.

An example of one such standard is the Open Container Initiative (OCI) for container tooling in a cloud native environment. When you ensure your organization's architecture uses such a standard, all components and systems interacting with your containers become replaceable by any future choices you might make, as long as they follow the same standard. This creates choice, and choice is a good thing!

Open o11y projects

In cloud native observability (o11y), there are many open source projects to help you tackle the initial tasks of o11y. Many are closely associated with the Cloud Native Computing Foundation (CNCF) as projects and promote open standards where possible. Some of them have even become an unofficial open standard by their default mass usage in the o11y domain. 

Let's explore a few of the most commonly encountered cloud native o11y projects.

Prometheus

Prometheus is a graduated project under the CNCF umbrella, which is defined as "...considered stable and used in production." It's listed as a monitoring system and time series database, but the project site itself advertises that it is used to power your metrics and alerting with the leading open source monitoring solution.

What does Prometheus do for you? 

It provides a flexible data model in which time series data, a sequence of data points indexed in time order, is identified by a metric name and optional key-value labels. Time series are stored in memory and on local disk in an efficient format. Scaling is done by functional sharding, splitting the data across Prometheus servers, and federation.
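To make the data model a little more concrete, here is a minimal Python sketch of how a time series can be identified by a metric name plus a set of key-value labels, with samples kept in time order. This is only an illustration of the idea and not how Prometheus stores data internally; the names `SeriesKey`, `TimeSeries`, `make_key`, `record`, and `store` are hypothetical and made up for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SeriesKey:
    """Identity of one time series: a metric name plus a set of labels."""
    name: str
    labels: frozenset = frozenset()


@dataclass
class TimeSeries:
    """Samples for one series, stored as (timestamp, value) pairs."""
    key: SeriesKey
    samples: list = field(default_factory=list)

    def append(self, timestamp: float, value: float) -> None:
        # A time series is indexed in time order, so reject older samples.
        if self.samples and timestamp < self.samples[-1][0]:
            raise ValueError("samples must be appended in time order")
        self.samples.append((timestamp, value))


def make_key(name: str, **labels: str) -> SeriesKey:
    """Build a hashable key from a metric name and keyword labels."""
    return SeriesKey(name, frozenset(labels.items()))


# Hypothetical in-memory store: one entry per unique name + label set.
store: dict = {}


def record(name: str, timestamp: float, value: float, **labels: str) -> TimeSeries:
    """Append a sample to the matching series, creating it if needed."""
    key = make_key(name, **labels)
    series = store.setdefault(key, TimeSeries(key))
    series.append(timestamp, value)
    return series
```

Recording `http_requests_total` once with `method="GET"` and once with `method="POST"` produces two separate series under the same metric name, which is the property that makes the model flexible to query.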

Leveraging the metrics data is done with a very powerful query language called PromQL, which we will cover in the next section. Alerts for your systems are set up using this query language, with the provided Alertmanager handling notifications.

There are multiple modes provided for visualizing the data collected, from a built-in expression browser and integration with Grafana dashboards to a console templating language. There are also many client libraries available to help you easily instrument existing services in your architecture. If you want to import existing third-party data into Prometheus, there are many integrations available for you to leverage.
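Instrumented services expose their metrics as plain text that a Prometheus server scrapes. As a rough illustration of what that text looks like, the sketch below renders a single counter in the Prometheus text exposition format using only the Python standard library. The function name `render_counter` is hypothetical, and the sketch skips details such as escaping special characters in label values; in practice the official client libraries take care of all of this for you.

```python
def render_counter(name: str, help_text: str, samples: dict) -> str:
    """Render one counter in the Prometheus text exposition format.

    `samples` maps a tuple of (label, value) pairs to the counter value.
    Simplification: label values are not escaped in this sketch.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples.items():
        if labels:
            label_text = ",".join(f'{k}="{v}"' for k, v in labels)
            lines.append(f"{name}{{{label_text}}} {value}")
        else:
            # A series without labels is written as just the metric name.
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_counter(
        "http_requests_total",
        "Total HTTP requests.",
        {(("method", "GET"),): 5, (("method", "POST"),): 2},
    ))
```

Each output line pairs a metric name and its labels with the current value, which is what the server stores as one sample of a time series on every scrape.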

Each server runs independently, making it an easy starting point that is reliable out of the box, with only local storage needed to get started. It's written in the Go language and all binaries are statically linked for easy deployment and performance.

There is a Prometheus organization with all the code bases for their projects.

PromQL

This is officially a part of the Prometheus project, but it is well worth mentioning on its own as an unofficial standard used widely to query ingested time series data. As stated in the Prometheus documentation:

"Prometheus provides a functional query language called PromQL (Prometheus Query Language) that lets the user select and aggregate time series data in real time. The result of an expression can either be shown as a graph, viewed as tabular data in Prometheus's expression browser, or consumed by external systems via the HTTP API."
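As a small illustration of the HTTP API route mentioned in that quote, the following Python sketch builds the URL for an instant query against the `/api/v1/query` endpoint. The helper name `instant_query_url`, the server address `http://localhost:9090`, and the metric `http_requests_total` are assumptions chosen for the example, so adjust them to your own setup.

```python
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str) -> str:
    """Build the URL for a Prometheus instant query.

    The PromQL expression is passed in the `query` parameter and must be
    URL-encoded, since it usually contains brackets and parentheses.
    """
    params = urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


if __name__ == "__main__":
    # Assumed local server and metric name, for illustration only.
    expression = "sum(rate(http_requests_total[5m]))"
    print(instant_query_url("http://localhost:9090", expression))
```

Fetching that URL from a running server returns a JSON document with the query result, which is how external systems such as dashboards consume PromQL output.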

There are various ways to learn how to write queries in PromQL, but a fun little project called PromLens offers an online demo that helps you accelerate your use, understanding, and troubleshooting of PromQL. You can also easily spin up a Docker image with the tool set up for exploration on your own local machine. Visually building queries of your time series data is a big boost to your productivity.

There is a good background story on the origins of PromQL in an interview with the creator Julius Volz.

OpenTelemetry

Another up-and-coming project, found in the incubating section of the CNCF site, is OpenTelemetry (OTEL). This is a very fast-growing project with a focus on "high-quality, ubiquitous, and portable telemetry to enable effective observability."

This project helps you generate telemetry data from your applications and services, then forward it in what is now considered a standard form, called the OTEL Protocol (OTLP), to a variety of monitoring tools. To generate the telemetry data you first have to instrument your code, but OTEL makes this very easy with automatic instrumentation through its integrations with many existing languages.
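To give a feel for what instrumenting code means, here is a conceptual Python sketch of a trace span: a named unit of work with a start time, an end time, and some attributes. This is not the OpenTelemetry API, and it does not send anything over the wire. The names `span` and `finished_spans` are hypothetical stand-ins for what a real SDK and its exporter would provide.

```python
import time
from contextlib import contextmanager

# Stand-in for an exporter that would forward telemetry to a backend.
finished_spans = []


@contextmanager
def span(name, **attributes):
    """Record the timing and attributes of one unit of work."""
    record = {"name": name, "attributes": dict(attributes), "start": time.time()}
    try:
        yield record
    except Exception as exc:
        # Keep a note of the failure, then let the error propagate.
        record["error"] = repr(exc)
        raise
    finally:
        record["end"] = time.time()
        finished_spans.append(record)


if __name__ == "__main__":
    with span("checkout", user="u1") as current:
        current["attributes"]["items"] = 3  # attributes added during the work
    print(finished_spans[0]["name"], finished_spans[0]["attributes"])
```

Automatic instrumentation wraps common frameworks and libraries in this kind of bookkeeping for you, so you get spans without hand-writing them around every call.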

You can find the community and their code in the Open-Telemetry organization.

Jaeger

Before OTEL was on the scene, the CNCF project Jaeger provided a distributed tracing platform targeting the cloud native microservice industry.

"Jaeger is open source, end-to-end distributed tracing. Monitor and troubleshoot transactions in complex distributed systems."

While this project is fully matured, it targeted an older protocol and has just recently retired its classic client libraries, advising users to migrate to its native support for the OTEL Protocol standard.

Fluentd

Fluentd is a graduated CNCF project written in C and Ruby that states, “Fluentd is an open source data collector for unified logging layer. Fluentd allows you to unify data collection and consumption for a better use and understanding of data.”

Under the umbrella of Fluentd you’ll find a new project called Fluent Bit. The documentation says, “Fluent Bit is an open source and multi-platform log processor tool which aims to be a generic Swiss knife for logs processing and distribution.”


Start your observability engines

This concludes the short overview of the open source projects and (un)official standards that you will encounter when getting started with cloud native o11y. It also brings me to the first hands-on step, where we start exploring these open source projects with the understanding that we don't yet have to deal with issues of scale.

Next up, I plan to take a look at how traditional or older monitoring for monolithic solutions and infrastructure integrates into cloud native o11y.