In the pursuit of Ops Happiness...
(Written with guest author: Miguel Perez Colino, Senior Product Manager, Integrated Solutions Business Unit, Red Hat)
As covered in the previous article, The Quest for Operations Intelligence, we have very high expectations for any modern cloud-architecture application deployed on Red Hat hybrid cloud solutions.
No matter how much support is put into place, the customer needs to be able to operate their hybrid clouds.
After taking a look at correlating all of the available data, we reached the conclusion in the previous article that we needed to do something more structured.
It’s all about the model
We may wonder whether anything is missing, which is why looking at some metrics aggregation projects can help. We find the answer in two places. First, Juan Manuel Rey Portal, an Architect with experience in virtualized operations, described how some ops tooling attempts to correlate events with logs. Second, a blog post titled Hawkular APM supports OpenTracing and Alerts describes how events such as alerts can be generated from the collected metrics.
To summarize, to have good IT situational awareness we need the following data:
- Logs
- Metrics
- Configuration Data (tracked periodically)
- Events (Alerts and Actions like Application updates, Software upgrades, etc.)
In the process of transforming data into information, we need to understand what each piece of data represents. This requires a related data structure for the logs, metrics and configurations that are being processed. We need to define a Common Data Model for them, starting for example with the data received from an OpenShift Container Platform. In the words of Peter Portante, “A Common Data Model is about defining namespaces to avoid conflicts, defining common fields that should be shared, and providing field data definitions for clarity.”
A Common Data Model is key to giving data meaning, to integrating and connecting different sources, to sharing data with third-party analysis tools, and to making IT more understandable. It would be of great value to share such a model and let it grow in an open source way into an open standard, hosted for example by The Linux Foundation or the Cloud Native Computing Foundation.
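To make the three parts of that definition more tangible, here is a minimal Python sketch of what a Common Data Model could look like in practice. It is only an illustration: the field names (`@timestamp`, `hostname`, `kind`, `message`), the `to_common_record` helper and the sample payloads are assumptions made for this example, not the actual model used by any Red Hat product.

```python
from datetime import datetime, timezone

# Hypothetical common fields shared by every record, with a short
# definition of each one (the "field data definitions" part).
COMMON_FIELDS = {
    "@timestamp": "ISO 8601 time (UTC) at which the record was produced",
    "hostname": "name of the host that produced the record",
    "kind": "one of: log, metric, configuration, event",
    "message": "human-readable summary, empty if the source has none",
}

KINDS = {"log", "metric", "configuration", "event"}


def to_common_record(kind, source, raw, timestamp, hostname):
    """Wrap a source-specific payload in the shared envelope.

    Source-specific keys are stored under a namespace named after the
    source, so the same key coming from two tools cannot collide.
    """
    if kind not in KINDS:
        raise ValueError(f"unknown record kind: {kind}")
    if source in COMMON_FIELDS:
        raise ValueError(f"namespace clashes with a common field: {source}")
    payload = dict(raw)  # copy, so the caller's dict is left untouched
    return {
        "@timestamp": timestamp.astimezone(timezone.utc).isoformat(),
        "hostname": hostname,
        "kind": kind,
        "message": payload.pop("message", ""),
        source: payload,  # namespaced, source-specific fields
    }


# Two sources that both use a "status" key, kept apart by namespaces.
now = datetime(2017, 1, 1, 12, 0, tzinfo=timezone.utc)
pod_log = to_common_record(
    "log", "openshift", {"message": "pod started", "status": "Running"},
    now, "node1")
fw_event = to_common_record(
    "event", "firewall", {"message": "rule updated", "status": "applied"},
    now, "node1")
```

Both records share the same top-level fields, so a downstream tool can filter on `hostname` or `kind` without knowing anything about the source, while `pod_log["openshift"]["status"]` and `fw_event["firewall"]["status"]` remain distinct.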
Once a Common Data Model adds meaning to the data (some call it tagging, others processing, others enrichment), the data becomes information ready to be correlated. We are performing something like Military Intelligence:
Information collection and analysis to provide guidance and direction to commanders in support of their decisions. This is achieved by providing an assessment of data from a range of sources, directed towards the commander's mission requirements or responding to questions as part of operational or campaign planning.
For IT Intelligence, instead of commanders we have different personas that could benefit from having all this information aggregated and correlated:
As is often stated at Red Hat, "We grow when we share..." and this also applies to IT Intelligence. There are many partners that can make good use of the data being collected and processed. It is important to be prepared to share it, since it can be re-processed and correlated with even more data from firewalls, network equipment, internal database stats, etc. There is Red Hat Insights and a whole ecosystem of associated tools that can provide added value to our solutions. Having a well-defined, unified point of contact to gather this data can help us reduce the deployment and operational costs of our tooling and of third-party tooling. It also gives us the opportunity to have a certification mechanism for it.
In summary, log aggregation is the necessary starting point for gaining the situational awareness needed to operate a full cloud deployment efficiently. Achieving IT intelligence requires more relevant data, such as metrics, configuration and events. This data has to be interpreted with a Common Data Model so that it can be correlated and transformed into useful information. This could become the access point to a whole ecosystem that can extract even more value from that information.
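As a rough illustration of the correlation step mentioned in this summary, the Python sketch below pairs each event with the log records from the same host that follow it within a time window. The `correlate` function, the record fields (`hostname`, `time`, `message`) and the five-minute default window are all assumptions chosen for the example; real tooling would use whatever fields the agreed data model defines.

```python
from datetime import datetime, timedelta, timezone


def correlate(events, logs, window=timedelta(minutes=5)):
    """Pair each event with the logs recorded on the same host
    between the event time and `window` after it (inclusive)."""
    matches = []
    for event in events:
        start = event["time"]
        end = start + window
        related = [
            log for log in logs
            if log["hostname"] == event["hostname"]
            and start <= log["time"] <= end
        ]
        matches.append((event, related))
    return matches


# Hypothetical sample data: one upgrade event and three log records.
t0 = datetime(2017, 1, 1, 12, 0, tzinfo=timezone.utc)
sample_events = [
    {"hostname": "node1", "time": t0, "message": "software upgrade"},
]
sample_logs = [
    {"hostname": "node1", "time": t0 + timedelta(minutes=2),
     "message": "service restart failed"},
    {"hostname": "node2", "time": t0 + timedelta(minutes=2),
     "message": "service restart failed"},
    {"hostname": "node1", "time": t0 + timedelta(minutes=10),
     "message": "disk usage high"},
]

for event, related in correlate(sample_events, sample_logs):
    print(event["message"], "->", [log["message"] for log in related])
```

This only works because both record types share a host field and a comparable timestamp, which is exactly what a Common Data Model is meant to guarantee. The nested loop is fine for a sketch; with real volumes the logs would be indexed by host and time first.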
Call to action
What's next? We are working on prototypes to learn how to put the above information to work and how users are solving this problem. If you want to lend a hand or join us in this effort, you can contribute to the ViaQ GitHub repo or fill out this form with your own experience.
Next in the series, Events and Monitoring Supercharging your Operational Intelligence.