In the previous article, we looked at the problem of controlling costs in cloud native observability. In this article, we discuss the next pitfall, another common mistake organizations make. By sharing common pitfalls in this series, the hope is that we can learn from them.
After laying the groundwork in the previous article, it's time to tackle a pitfall that requires us to stop focusing on The Pillars. I've spent some time in the past talking about Three Phases to Better Observability Outcomes and published an initial take on why Cloud Native Observability Needs Phases, but this article takes a more in-depth dive into the topic.
Focusing on The Pillars
For a few years now, vendors have been marketing the idea that you need to focus on certain signals, or pillars, to achieve what you want in the world of cloud native observability.
Focusing on tooling, not fixing the issue.
It's like we have a very nice, expensive car that we cherish, and it starts making funny sounds and emitting smoke while we are driving. We rush to our favorite garage, where the mechanic listens to our description of the problem and then proceeds to drag out their toolboxes to show off all the great tools they have for fixing issues just like ours. While this goes on and on, we look out the window and see that our car is no longer just smoking: it's on fire!
Meanwhile, the issues get worse.
Instead of the pillars, what organizations actually want from their observability efforts is:
- Better business outcomes...
- Faster remediation of problems that occur...
- Easier problem detection...
- Greater revenue generation...
- Happier customers...
- Engineering teams focused on delivering business value
These goals are all expressed in a language the business understands, and they describe the process we need to design for rather than the features the tooling needs to have. Bringing this back to cloud native observability, we want a solution that walks our on-call engineers through the following three phases:
- Knowing - we discover that something is happening as fast as possible, possibly even applying a quick fix in this phase.
- Triaging - if we can't fix the problem immediately, we triage it using targeted information directly related to the problem at hand, which quickly leads to a fix.
- Understanding - finally, possibly at a later time and at a slower investigative pace, we develop a deep understanding of the issues encountered to ensure they never happen again.
We don't want to be confronted with visualizations designed around grouping information into categorized signals, or pillars. For example, here is a visualization that was designed with little thought for the process needed to solve any kind of issue, though it does capture the signals for you:
Good luck with this when you are on call.
What we really want are clean, concise, and effective visualizations that present focused insights and put just enough information at our fingertips to make informed decisions quickly. We don't care whether one metric, three labels, one span in a trace, and three log lines form the basis of the exact view we need to resolve whatever made our beeper go off:
Sharply focused insights with just enough information to get you through the phases.
The road to cloud native success has many pitfalls. Understanding how to avoid focusing on the pillars, and instead focusing on solutions for the phases of observability, will save a lot of wasted time and energy.
Underestimating the impact of cardinality?
Another pitfall organizations struggle with in cloud native observability is underestimating cardinality issues. In the next article in this series, I'll share why this is a pitfall and how we can keep it from wreaking havoc on our cloud native observability efforts.
Below are the links to the other articles in this series:
- Cloud Native Observability Pitfalls - Introduction
- Cloud Native Observability Pitfalls - Controlling Costs
- Cloud Native Observability Pitfalls - Focusing on The Pillars
- Cloud Native Observability Pitfalls - Underestimating Cardinality
- Cloud Native Observability Pitfalls - Ignoring Existing Landscape
- Cloud Native Observability Pitfalls - The Protocol Jungle
- Cloud Native Observability Pitfalls - Sneaky Sprawling Mess