In the pursuit of Ops Happiness...
Previously in The Quest for Operations Intelligence, we focused on what log aggregation can deliver and how to improve it. We concluded that full situational awareness of IT requires log, metric, configuration and event information, correlated so that it can be analyzed in one place when problems arise.
While we discussed logs, metrics and configuration in depth, at the time we left events undefined. What are events, and what can we use them for in our quest for operations happiness?
Event happiness
Those most affected by this quest are the system administrators, who are on call when things go wrong in your infrastructure. When the call comes in the middle of the night, log aggregation and metrics can save precious time in finding the cause of the failure. The question is: what has happened to bring the system administrator to his post in the dead of night?