Eric D. Schabell: KubeCon EU 2026 - From Hallucinations to Hardware: Diagnosing LLM Failures

Saturday, January 31, 2026

I'm excited to share that a talk I submitted, with my good friend Ryan Peirce, was accepted to Cloud Native AI & Kubeflow Day!

In case you were still unaware, KubeCon + CloudNativeCon is the flagship conference of the Cloud Native Computing Foundation (CNCF), gathering adopters and technologists from leading open source and cloud native communities. The event brings together the entire cloud native ecosystem for education, collaboration, and networking opportunities.

Every year this event features multiple co-located events, and we're excited to be presenting at Cloud Native AI & Kubeflow Day, which focuses on the intersection of cloud native technologies and artificial intelligence workloads.

Below are all the details of our session at the time of this writing.

On Monday, 23 March, in Amsterdam, Ryan will take the stage to demonstrate real-world troubleshooting of LLM-powered applications in the following session.

From Hallucinations to Hardware: Diagnosing LLM Failures

Generative AI apps can hallucinate—or fail—at the worst possible times. In this live demo session, we'll interact with an LLM-powered application designed to surface both entertaining hallucinations and real-world GPU performance issues. Using open source tools like Prometheus, OTel, NVIDIA DCGM, and OpenInference, we'll troubleshoot problems in real time and trace them from user experience down to infrastructure. See how observability gives engineers and SREs the visibility they need to keep AI systems reliable.

Time: 15:20-15:45

This is a hands-on, live demonstration session in which Ryan will be working with a real LLM-powered application. He has intentionally designed it to showcase both the amusing side of AI hallucinations and the serious infrastructure challenges that can impact production systems. The session will walk through the complete troubleshooting journey, from identifying user-facing issues all the way down to GPU performance bottlenecks.

Ryan will be leveraging a powerful stack of open source observability tools including Prometheus for metrics collection, OpenTelemetry for distributed tracing, NVIDIA DCGM for GPU telemetry, and OpenInference for LLM-specific observability. This combination provides the comprehensive visibility needed to diagnose and resolve issues in modern AI infrastructure.
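To give a rough feel for the GPU telemetry side of that stack, here is a minimal Python sketch that reads metrics in the Prometheus text exposition format and flags busy GPUs. It is not taken from Ryan's demo. The sample payload, the hostname, the `llm_request_duration_seconds` metric, the helper names, and the 90% threshold are all made up for illustration; only the `DCGM_FI_DEV_GPU_UTIL` metric name comes from the NVIDIA DCGM exporter. The parser is deliberately simplified and ignores timestamps and escaped label values.

```python
import re

# Invented sample payload in Prometheus text exposition format (illustration only).
SAMPLE_METRICS = """\
# HELP DCGM_FI_DEV_GPU_UTIL GPU utilization (in %).
# TYPE DCGM_FI_DEV_GPU_UTIL gauge
DCGM_FI_DEV_GPU_UTIL{gpu="0",Hostname="node-a"} 97
DCGM_FI_DEV_GPU_UTIL{gpu="1",Hostname="node-a"} 12
# HELP llm_request_duration_seconds Hypothetical application metric.
# TYPE llm_request_duration_seconds gauge
llm_request_duration_seconds{route="/chat"} 8.4
"""

# Simplified sample line: name, optional {labels}, then a numeric value.
LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the # HELP / # TYPE comment lines.
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue  # Not a line this simplified parser understands.
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def busy_gpus(samples, threshold=90.0):
    """Return the gpu label of every GPU at or above the utilization threshold."""
    return [
        labels.get("gpu")
        for name, labels, value in samples
        if name == "DCGM_FI_DEV_GPU_UTIL" and value >= threshold
    ]
```

Calling `busy_gpus(parse_metrics(SAMPLE_METRICS))` on the sample above picks out the one GPU reporting 97% utilization. In a real setup Prometheus would scrape and store these values, and you would query them there rather than parse them by hand.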

For engineers and SREs working with AI systems, this session will demonstrate practical approaches to maintaining reliability in production environments. The live demo format means you'll see real troubleshooting workflows in action rather than theoretical examples.

Be sure to check the Cloud Native AI & Kubeflow Day schedule for the exact time and location of the session, as both may change in the lead-up to the co-located event.

Don't miss this session, as Ryan is an amazing rock star on stage and his live demo is jaw-dropping!

(Note that in the session URL you will see my name listed [from-hallucinations-to-hardware-diagnosing-llm-failures-eric-d-schabell-ryan-peirce], but I will not be on stage. Ryan was kind enough to cover for me as I won't be available to present on that day. Thanks Ryan!)