| author | Paul Buetow <paul@buetow.org> | 2025-12-24 00:43:12 +0200 |
|---|---|---|
| committer | Paul Buetow <paul@buetow.org> | 2025-12-24 00:43:12 +0200 |
| commit | 64bdf652a95094f30d1535130d0cfc1cec4a645e (patch) | |
| tree | 97bd422c9b30872475cb8e03dc90e8fadf5f87ab | |
| parent | 6cb0f632d33741b27eed63333d041627c2987ced (diff) | |
Update content for gemtext
| -rw-r--r-- | about/novels.gmi | 3 |
| -rw-r--r-- | about/resources.gmi | 202 |
| -rw-r--r-- | gemfeed/DRAFT-x-rag-observability-hackathon.gmi | 885 |
| -rw-r--r-- | gemfeed/DRAFT-x-rag-observability-hackathon.gmi.tpl (renamed from gemfeed/DRAFT-x-rag-observability.gmi.tpl) | 432 |
| -rw-r--r-- | gemfeed/x-rag-observability-hackathon/dashboard-pod-system-metrics.png | bin 0 -> 914327 bytes |
| -rw-r--r-- | gemfeed/x-rag-observability-hackathon/dashboard-xrag-overview.png | bin 0 -> 342684 bytes |
| -rw-r--r-- | gemfeed/x-rag-observability-hackathon/index-node-graph.png (renamed from gemfeed/x-rag-observability/index-node-graph.png) | bin 201872 -> 201872 bytes |
| -rw-r--r-- | gemfeed/x-rag-observability-hackathon/index-trace.png (renamed from gemfeed/x-rag-observability/index-trace.png) | bin 236012 -> 236012 bytes |
| -rw-r--r-- | gemfeed/x-rag-observability-hackathon/search-node-graph.png (renamed from gemfeed/x-rag-observability/search-node-graph.png) | bin 186601 -> 186601 bytes |
| -rw-r--r-- | gemfeed/x-rag-observability-hackathon/search-trace.png (renamed from gemfeed/x-rag-observability/search-trace.png) | bin 239660 -> 239660 bytes |
| -rw-r--r-- | index.gmi | 2 |
| -rw-r--r-- | notes/search-inside-yourself.gmi.tpl.lock | 0 |
| -rw-r--r-- | uptime-stats.gmi | 140 |
13 files changed, 1307 insertions, 357 deletions
diff --git a/about/novels.gmi b/about/novels.gmi index 9ac5e643..6bb7dc7d 100644 --- a/about/novels.gmi +++ b/about/novels.gmi @@ -51,7 +51,8 @@ _-" . ' + . . ,//////0\ | /00HHHHHHHMMMMM * 2001 - Chasm City - Revelation Space Universe, Paperback * 2002 - Redemption Ark (english) / Die Arche (german) - Revelation Space Universe, Paperback * 2003 - Absolution Gap (english) / Offenbarung (german) - Revelation Space Universe, Paperback -* 2005 - Diamond Dogs, Turquoise Days (english ) / Träume von Unendlichkeit (german) - Revelation Space Universe, Paperback +* 2005 - Diamond Dogs, Turquoise Days (english) / Träume von Unendlichkeit (german) - Revelation Space Universe, Paperback +* 2021 - Inhibitor Phase - Revelation Space Universe, Audiobook (Libro.fm) ### Arthur C. Clarke diff --git a/about/resources.gmi b/about/resources.gmi index 9e6f6f72..b57a6c21 100644 --- a/about/resources.gmi +++ b/about/resources.gmi @@ -35,110 +35,110 @@ You won't find any links on this site because, over time, the links will break. In random order: -* DNS and BIND; Cricket Liu; O'Reilly -* Raku Recipes; J.J. Merelo; Apress -* Site Reliability Engineering; How Google runs production systems; O'Reilly -* C++ Programming Language; Bjarne Stroustrup; -* Systems Performance Tuning; Gian-Paolo D. 
Musumeci and others...; O'Reilly +* Effective awk programming; Arnold Robbins; O'Reilly * 97 things every SRE should know; Emil Stolarsky, Jaime Woo; O'Reilly -* Polished Ruby Programming; Jeremy Evans; Packt Publishing -* Higher Order Perl; Mark Dominus; Morgan Kaufmann -* Kubernetes Cookbook; Sameer Naik, Sébastien Goasguen, Jonathan Michaux; O'Reilly -* Go Brain Teasers - Exercise Your Mind; Miki Tebeka; The Pragmatic Programmers -* Systemprogrammierung in Go; Frank Müller; dpunkt -* Data Science at the Command Line; Jeroen Janssens; O'Reilly * Leanring eBPF; Liz Rice; O'Reilly +* Learn You a Haskell for Great Good!; Miran Lipovaca; No Starch Press +* Amazon Web Services in Action; Michael Wittig and Andreas Wittig; Manning Publications +* Pro Puppet; James Turnbull, Jeffrey McCune; Apress +* Go Brain Teasers - Exercise Your Mind; Miki Tebeka; The Pragmatic Programmers +* Programming Ruby 3.3 (5th Edition); Noel Rappin, with Dave Thomas; The Pragmatic Bookshelf +* Clusterbau mit Linux-HA; Michael Schwartzkopff; O'Reilly +* Ultimate Go Notebook; Bill Kennedy +* Site Reliability Engineering; How Google runs production systems; O'Reilly +* The KCNA (Kubernetes and Cloud Native Associate) Book; Nigel Poulton +* Concurrency in Go; Katherine Cox-Buday; O'Reilly +* Perl New Features; Joshua McAdams, brian d foy; Perl School +* Kubernetes Cookbook; Sameer Naik, Sébastien Goasguen, Jonathan Michaux; O'Reilly +* Java ist auch eine Insel; Christian Ullenboom; +* Polished Ruby Programming; Jeremy Evans; Packt Publishing +* The Docker Book; James Turnbull; Kindle * The Kubernetes Book; Nigel Poulton; Unabridged Audiobook +* DevOps And Site Reliability Engineering Handbook; Stephen Fleming; Audible +* Systemprogrammierung in Go; Frank Müller; dpunkt +* The Practise of System and Network Administration; Thomas A. Limoncelli, Christina J. Hogan, Strata R. 
Chalup; Addison-Wesley Professional Pro Git; Scott Chacon, Ben Straub; Apress +* Higher Order Perl; Mark Dominus; Morgan Kaufmann +* The Pragmatic Programmer; David Thomas; Addison-Wesley +* Chaos Engineering - System Resiliency in Practice; Casey Rosenthal and Nora Jones; eBook +* Seeking SRE: Conversations About Running Production Systems at Scale; David N. Blank-Edelman; eBook * Think Raku (aka Think Perl 6); Laurent Rosenfeld, Allen B. Downey; O'Reilly -* Modern Perl; Chromatic ; Onyx Neon Press -* Ultimate Go Notebook; Bill Kennedy * 100 Go Mistakes and How to Avoid Them; Teiva Harsanyi; Manning Publications -* DevOps And Site Reliability Engineering Handbook; Stephen Fleming; Audible -* Java ist auch eine Insel; Christian Ullenboom; +* Developing Games in Java; David Brackeen and others...; New Riders * The DevOps Handbook; Gene Kim, Jez Humble, Patrick Debois, John Willis; Audible +* Distributed Systems: Principles and Paradigms; Andrew S. Tanenbaum; Pearson +* C++ Programming Language; Bjarne Stroustrup; +* Terraform Cookbook; Mikael Krief; Packt Publishing * Funktionale Programmierung; Peter Pepper; Springer +* Data Science at the Command Line; Jeroen Janssens; O'Reilly +* The Go Programming Language; Alan A. A. Donovan; Addison-Wesley Professional * Tmux 2: Productive Mouse-free Development; Brain P. Hogan; The Pragmatic Programmers -* Programming Ruby 3.3 (5th Edition); Noel Rappin, with Dave Thomas; The Pragmatic Bookshelf -* Learn You a Haskell for Great Good!; Miran Lipovaca; No Starch Press -* Effective awk programming; Arnold Robbins; O'Reilly -* Distributed Systems: Principles and Paradigms; Andrew S. Tanenbaum; Pearson -* Effective Java; Joshua Bloch; Addison-Wesley Professional +* Programming Perl aka "The Camel Book"; Tom Christiansen, brian d foy, Larry Wall & Jon Orwant; O'Reilly +* Systems Performance Tuning; Gian-Paolo D. 
Musumeci and others...; O'Reilly +* 21st Century C: C Tips from the New School; Ben Klemens; O'Reilly +* Modern Perl; Chromatic ; Onyx Neon Press * Learn You Some Erlang for Great Good; Fred Herbert; No Starch Press -* Chaos Engineering - System Resiliency in Practice; Casey Rosenthal and Nora Jones; eBook * Hands-on Infrastructure Monitoring with Prometheus; Joel Bastos, Pedro Araujo; Packt -* Clusterbau mit Linux-HA; Michael Schwartzkopff; O'Reilly -* Developing Games in Java; David Brackeen and others...; New Riders -* Concurrency in Go; Katherine Cox-Buday; O'Reilly -* Object-Oriented Programming with ANSI-C; Axel-Tobias Schreiner -* The KCNA (Kubernetes and Cloud Native Associate) Book; Nigel Poulton -* Seeking SRE: Conversations About Running Production Systems at Scale; David N. Blank-Edelman; eBook -* The Docker Book; James Turnbull; Kindle -* 21st Century C: C Tips from the New School; Ben Klemens; O'Reilly -* The Pragmatic Programmer; David Thomas; Addison-Wesley +* Raku Recipes; J.J. Merelo; Apress * Raku Fundamentals; Moritz Lenz; Apress -* Programming Perl aka "The Camel Book"; Tom Christiansen, brian d foy, Larry Wall & Jon Orwant; O'Reilly -* Amazon Web Services in Action; Michael Wittig and Andreas Wittig; Manning Publications -* Terraform Cookbook; Mikael Krief; Packt Publishing -* Pro Puppet; James Turnbull, Jeffrey McCune; Apress -* The Go Programming Language; Alan A. A. Donovan; Addison-Wesley Professional -* Perl New Features; Joshua McAdams, brian d foy; Perl School -* The Practise of System and Network Administration; Thomas A. Limoncelli, Christina J. Hogan, Strata R. Chalup; Addison-Wesley Professional Pro Git; Scott Chacon, Ben Straub; Apress +* Object-Oriented Programming with ANSI-C; Axel-Tobias Schreiner +* DNS and BIND; Cricket Liu; O'Reilly +* Effective Java; Joshua Bloch; Addison-Wesley Professional ## Technical references I didn't read them from the beginning to the end, but I am using them to look up things. 
The books are in random order: -* Implementing Service Level Objectives; Alex Hidalgo; O'Reilly +* The Linux Programming Interface; Michael Kerrisk; No Starch Press * Go: Design Patterns for Real-World Projects; Mat Ryer; Packt -* Relayd and Httpd Mastery; Michael W Lucas -* Groovy Kurz & Gut; Joerg Staudemeier; O'Reilly * Understanding the Linux Kernel; Daniel P. Bovet, Marco Cesati; O'Reilly -* The Linux Programming Interface; Michael Kerrisk; No Starch Press * Algorithms; Robert Sedgewick, Kevin Wayne; Addison Wesley +* Relayd and Httpd Mastery; Michael W Lucas +* Implementing Service Level Objectives; Alex Hidalgo; O'Reilly +* Groovy Kurz & Gut; Joerg Staudemeier; O'Reilly * BPF Performance Tools - Linux System and Application Observability, Brendan Gregg; Addison Wesley ## Self-development and soft-skills books In random order: -* The Bullet Journal Method; Ryder Carroll; Fourth Estate -* Staff Engineer: Leadership beyond the management track; Will Larson; Audiobook -* The Complete Software Developer's Career Guide; John Sonmez; Unabridged Audiobook -* Ultralearning; Scott Young; Thorsons -* Who Moved My Cheese?; Dr. Spencer Johnson; Vermilion -* Slow Productivity; Cal Newport; Penguin Random House -* The Joy of Missing Out; Christina Crook; New Society Publishers -* Ultralearning; Anna Laurent; Self-published via Amazon +* Stop starting, start finishing; Arne Roock; Lean-Kanban University +* Consciousness: A Very Short Introduction; Susan Blackmore; Oxford Uiversity Press +* Eat That Frog; Brian Tracy +* The Software Engineer's Guidebook: Navigating senior, tech lead, and staff engineer positions at tech companies and startups; Gergely Orosz; Audiobook * So Good They Can't Ignore You; Cal Newport; Business Plus -* Deep Work; Cal Newport; Piatkus -* The Courage to Be Disliked; Ichiro Kishimi and Fumitake Koga; Audiobook -* Atomic Habits; James Clear; Random House Business +* Influence without Authority; A. Cohen, D. 
Bradford; Wiley +* The Phoenix Project - A Novel About IT, DevOps, and Helping your Business Win; Gene Kim and Kevin Behr; Trade Select * Psycho-Cybernetics; Maxwell Maltz; Perigee Books -* Soft Skills; John Sommez; Manning Publications -* Eat That Frog!; Brian Tracy; Hodder Paperbacks -* Never Split the Difference; Chris Voss, Tahl Raz; Random House Business -* Digital Minimalism; Cal Newport; Portofolio Penguin -* The Obstacle Is The Way; Ryan Holiday; Profile Books Ltd -* Buddah and Einstein walk into a Bar; Guy Joseph Ale, Claire Bloom; Blackstone Publishing * Meditation for Mortals, Oliver Burkeman, Audiobook +* Atomic Habits; James Clear; Random House Business +* Getting Things Done; David Allen +* Time Management for System Administrators; Thomas A. Limoncelli; O'Reilly +* Solve for Happy; Mo Gawdat (RE-READ 1ST TIME) * Search Inside Yourself - The Unexpected path to Achieving Success, Happiness (and World Peace); Chade-Meng Tan, Daniel Goleman, Jon Kabat-Zinn; HarperOne -* Influence without Authority; A. Cohen, D. Bradford; Wiley -* The Power of Now; Eckhard Tolle; Yellow Kite -* The Good Enough Job; Simone Stolzoff; Ebury Edge +* Digital Minimalism; Cal Newport; Portofolio Penguin * 101 Essays that change the way you think; Brianna Wiest; Audiobook -* The 7 Habits Of Highly Effective People; Stephen R. 
Covey; Simon & Schuster UK -* The Off Switch; Mark Cropley; Virgin Books (RE-READ 1ST TIME) -* Solve for Happy; Mo Gawdat (RE-READ 1ST TIME) -* The Phoenix Project - A Novel About IT, DevOps, and Helping your Business Win; Gene Kim and Kevin Behr; Trade Select +* Staff Engineer: Leadership beyond the management track; Will Larson; Audiobook * Coders at Work - Reflections on the craft of programming, Peter Seibel and Mitchell Dorian et al., Audiobook +* The Courage to Be Disliked; Ichiro Kishimi and Fumitake Koga; Audiobook +* Ultralearning; Anna Laurent; Self-published via Amazon +* The Obstacle Is The Way; Ryan Holiday; Profile Books Ltd * 97 Things Every Engineering Manager Should Know; Camille Fournier; Audiobook -* Getting Things Done; David Allen -* Eat That Frog; Brian Tracy -* Time Management for System Administrators; Thomas A. Limoncelli; O'Reilly +* The Good Enough Job; Simone Stolzoff; Ebury Edge +* The Joy of Missing Out; Christina Crook; New Society Publishers +* Never Split the Difference; Chris Voss, Tahl Raz; Random House Business * The Daily Stoic; Ryan Holiday, Stephen Hanselman; Profile Books -* Stop starting, start finishing; Arne Roock; Lean-Kanban University -* The Software Engineer's Guidebook: Navigating senior, tech lead, and staff engineer positions at tech companies and startups; Gergely Orosz; Audiobook -* Consciousness: A Very Short Introduction; Susan Blackmore; Oxford Uiversity Press +* The Bullet Journal Method; Ryder Carroll; Fourth Estate +* The Complete Software Developer's Career Guide; John Sonmez; Unabridged Audiobook +* The Off Switch; Mark Cropley; Virgin Books (RE-READ 1ST TIME) +* Slow Productivity; Cal Newport; Penguin Random House +* The Power of Now; Eckhard Tolle; Yellow Kite +* Eat That Frog!; Brian Tracy; Hodder Paperbacks +* Deep Work; Cal Newport; Piatkus +* Buddah and Einstein walk into a Bar; Guy Joseph Ale, Claire Bloom; Blackstone Publishing +* Soft Skills; John Sommez; Manning Publications +* Who Moved My 
Cheese?; Dr. Spencer Johnson; Vermilion +* Ultralearning; Scott Young; Thorsons +* The 7 Habits Of Highly Effective People; Stephen R. Covey; Simon & Schuster UK => ../notes/index.gmi Here are notes of mine for some of the books @@ -147,28 +147,28 @@ In random order: Some of these were in-person with exams; others were online learning lectures only. In random order: * Ultimate Go Programming; Bill Kennedy; O'Reilly Online -* Functional programming lecture; Remote University of Hagen -* Structure and Interpretation of Computer Programs; Harold Abelson and more...; -* Cloud Operations on AWS - Learn how to configure, deploy, maintain, and troubleshoot your AWS environments; 3-day online live training with labs; Amazon +* Scripting Vim; Damian Conway; O'Reilly Online * MySQL Deep Dive Workshop; 2-day on-site training -* Apache Tomcat Best Practises; 3-day on-site training -* Protocol buffers; O'Reilly Online -* Linux Security and Isolation APIs Training; Michael Kerrisk; 3-day on-site training +* Developing IaC with Terraform (with Live Lessons); O'Reilly Online * Red Hat Certified System Administrator; Course + certification (Although I had the option, I decided not to take the next course as it is more effective to self learn what I need) +* Cloud Operations on AWS - Learn how to configure, deploy, maintain, and troubleshoot your AWS environments; 3-day online live training with labs; Amazon +* The Ultimate Kubernetes Bootcamp; School of Devops; O'Reilly Online +* Functional programming lecture; Remote University of Hagen * Algorithms Video Lectures; Robert Sedgewick; O'Reilly Online +* AWS Immersion Day; Amazon; 1-day interactive online training +* Apache Tomcat Best Practises; 3-day on-site training +* Structure and Interpretation of Computer Programs; Harold Abelson and more...; +* Protocol buffers; O'Reilly Online * F5 Loadbalancers Training; 2-day on-site training; F5, Inc. 
-* The Ultimate Kubernetes Bootcamp; School of Devops; O'Reilly Online -* Scripting Vim; Damian Conway; O'Reilly Online +* Linux Security and Isolation APIs Training; Michael Kerrisk; 3-day on-site training * The Well-Grounded Rubyist Video Edition; David. A. Black; O'Reilly Online -* AWS Immersion Day; Amazon; 1-day interactive online training -* Developing IaC with Terraform (with Live Lessons); O'Reilly Online ## Technical guides These are not whole books, but guides (smaller or larger) which I found very useful. in random order: -* Advanced Bash-Scripting Guide * How CPUs work at https://cpu.land +* Advanced Bash-Scripting Guide * Raku Guide at https://raku.guide ## Podcasts @@ -177,57 +177,57 @@ These are not whole books, but guides (smaller or larger) which I found very use In random order: -* Modern Mentor -* BSD Now [BSD] * Hidden Brain +* Cup o' Go [Golang] +* Deep Questions with Cal Newport +* Pratical AI +* The Pragmatic Engineer Podcast +* Wednesday Wisdom * Backend Banter -* The Changelog Podcast(s) +* Modern Mentor * Dev Interrupted -* Wednesday Wisdom -* The Pragmatic Engineer Podcast -* Pratical AI * The ProdCast (Google SRE Podcast) * Fallthrough [Golang] -* Cup o' Go [Golang] * Fork Around And Find Out +* The Changelog Podcast(s) +* BSD Now [BSD] * Maintainable -* Deep Questions with Cal Newport ### Podcasts I liked I liked them but am not listening to them anymore. The podcasts have either "finished" (no more episodes) or I stopped listening to them due to time constraints or a shift in my interests. * Java Pub House -* Go Time (predecessor of fallthrough) * Ship It (predecessor of Fork Around And Find Out) +* Modern Mentor * FLOSS weekly +* Go Time (predecessor of fallthrough) * CRE: Chaosradio Express [german] -* Modern Mentor ## Newsletters I like This is a mix of tech and non-tech newsletters I am subscribed to. 
In random order: -* The Pragmatic Engineer -* Changelog News -* Golang Weekly -* Andreas Brandhorst Newsletter (Sci-Fi author) -* Ruby Weekly -* Register Spill +* The Valuable Dev * Applied Go Weekly Newsletter * Monospace Mentor -* The Imperfectionist * VK Newsletter -* The Valuable Dev * byteSizeGo +* Andreas Brandhorst Newsletter (Sci-Fi author) +* Register Spill +* Golang Weekly +* Changelog News +* The Pragmatic Engineer +* The Imperfectionist +* Ruby Weekly ## Magazines I like(d) This is a mix of tech I like(d). I may not be a current subscriber, but now and then, I buy an issue. In random order: +* LWN (online only) * Linux Magazine * Linux User -* LWN (online only) * freeX (not published anymore) # Formal education diff --git a/gemfeed/DRAFT-x-rag-observability-hackathon.gmi b/gemfeed/DRAFT-x-rag-observability-hackathon.gmi new file mode 100644 index 00000000..c50ea736 --- /dev/null +++ b/gemfeed/DRAFT-x-rag-observability-hackathon.gmi @@ -0,0 +1,885 @@ +# Adding Observability to X-RAG + +This blog post describes my hackathon efforts adding observability to X-RAG, a distributed Retrieval-Augmented Generation (RAG) platform built by my brother Florian. I especially made time available over the weekend to join his 3-day hackathon (attending 2 days) with the goal of instrumenting his existing distributed system with observability. What started as "let's add some metrics" turned into a comprehensive implementation of the three pillars of observability: tracing, metrics, and logs. + +=> https://github.com/florianbuetow/x-rag X-RAG source code on GitHub + +## Table of Contents + +* ⇢ Adding Observability to X-RAG +* ⇢ ⇢ What is X-RAG? 
+* ⇢ ⇢ Running Kubernetes locally with Kind +* ⇢ ⇢ Motivation +* ⇢ ⇢ The observability stack +* ⇢ ⇢ Grafana Alloy: the unified collector +* ⇢ ⇢ Centralised logging with Loki +* ⇢ ⇢ ⇢ Alloy configuration for logs +* ⇢ ⇢ ⇢ Querying logs with LogQL +* ⇢ ⇢ Metrics with Prometheus +* ⇢ ⇢ ⇢ Alloy configuration for application metrics +* ⇢ ⇢ ⇢ Kubernetes metrics: kubelet, cAdvisor, and kube-state-metrics +* ⇢ ⇢ ⇢ Infrastructure metrics: Kafka, Redis, MinIO +* ⇢ ⇢ Distributed tracing with Tempo +* ⇢ ⇢ ⇢ Understanding traces, spans, and the trace tree +* ⇢ ⇢ ⇢ How trace context propagates +* ⇢ ⇢ ⇢ Implementation +* ⇢ ⇢ ⇢ Alloy configuration for traces +* ⇢ ⇢ Async ingestion trace walkthrough +* ⇢ ⇢ ⇢ Step 1: Ingest a document +* ⇢ ⇢ ⇢ Step 2: Find the ingestion trace +* ⇢ ⇢ ⇢ Step 3: Fetch the complete trace +* ⇢ ⇢ ⇢ Step 4: Analyse the async trace +* ⇢ ⇢ ⇢ Viewing traces in Grafana +* ⇢ ⇢ End-to-end search trace walkthrough +* ⇢ ⇢ ⇢ Step 1: Make a search request +* ⇢ ⇢ ⇢ Step 2: Query Tempo for the trace +* ⇢ ⇢ ⇢ Step 3: Analyse the trace +* ⇢ ⇢ ⇢ Step 4: Search traces with TraceQL +* ⇢ ⇢ ⇢ Viewing the search trace in Grafana +* ⇢ ⇢ Correlating the three signals +* ⇢ ⇢ Grafana dashboards +* ⇢ ⇢ Results: two days well spent +* ⇢ ⇢ SLIs, SLOs and SLAs +* ⇢ ⇢ Using Amp for AI-assisted development +* ⇢ ⇢ Other changes along the way +* ⇢ ⇢ Lessons learned + +## What is X-RAG? + +X-RAG is a distributed RAG (Retrieval-Augmented Generation) platform running on Kubernetes. The idea behind RAG is simple: instead of asking an LLM to answer questions from its training data alone, you first retrieve relevant documents from your own knowledge base, then feed those documents to the LLM as context. The LLM synthesises an answer grounded in your actual content—reducing hallucinations and enabling answers about private or recent information the model was never trained on. 
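+The retrieve-then-generate loop described above can be sketched in a few lines of Python. This is a toy illustration, not X-RAG's actual code: a word-overlap scorer stands in for the real vector search, and the resulting prompt is what you would hand to the LLM:
+
+```python
+def retrieve(question, corpus, k=2):
+    """Rank documents by word overlap with the question (toy stand-in for vector search)."""
+    words = set(question.lower().split())
+    ranked = sorted(corpus, key=lambda doc: len(words & set(doc.lower().split())), reverse=True)
+    return ranked[:k]
+
+def build_prompt(question, chunks):
+    """Ground the LLM in retrieved chunks instead of its training data alone."""
+    context = "\n".join(f"- {c}" for c in chunks)
+    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
+
+corpus = [
+    "X-RAG stores vector embeddings in Weaviate.",
+    "Kafka decouples document ingestion from indexing.",
+    "Grafana displays metrics, traces and logs.",
+]
+
+question = "Where are the embeddings stored?"
+prompt = build_prompt(question, retrieve(question, corpus))
+```
+
+X-RAG does the same dance at query time, only with embedding vectors and Weaviate's hybrid search in place of the word-overlap toy.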
+ +X-RAG handles the full pipeline: ingest documents, chunk them into searchable pieces, generate vector embeddings, store them in a vector database, and at query time, retrieve relevant chunks and pass them to an LLM for answer generation. The system supports both local LLMs (Florian runs his on a beefy desktop) and cloud APIs like OpenAI. I configured an OpenAI API key since my laptop's CPU and GPU aren't fast enough for decent local inference. + +All services are implemented in Python. I'm more used to Ruby, Go, and Bash these days, but for this project it didn't matter—Python's OpenTelemetry integration is straightforward, I wasn't planning to write or rewrite tons of application code, and with GenAI assistance the language barrier was a non-issue. The OpenTelemetry concepts and patterns should translate to other languages too—the SDK APIs are intentionally similar across Python, Go, Java, and others. + +X-RAG consists of several independently scalable microservices: + +* Search UI: FastAPI web interface for queries +* Ingestion API: Document upload endpoint +* Embedding Service: gRPC service for vector embeddings +* Indexer: Kafka consumer that processes documents +* Search Service: gRPC service orchestrating the RAG pipeline + +The Embedding Service deserves extra explanation because in the beginning I didn't really know what it was. Text isn't directly searchable in a vector database—you need to convert it to numerical vectors (embeddings) that capture semantic meaning. The Embedding Service takes text chunks and calls an embedding model (OpenAI's `text-embedding-3-small` in my case, or a local model on Florian's setup) to produce these vectors. For generating the final search answer, I used `gpt-4o-mini`.
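+"Nearby vectors" is usually measured with cosine similarity. Here is a self-contained sketch with hand-made 3-dimensional vectors (real `text-embedding-3-small` vectors have 1536 dimensions, but the arithmetic is identical; the numbers below are invented for illustration):
+
+```python
+import math
+
+def cosine_similarity(a, b):
+    """Dot product divided by the product of the vector lengths; 1.0 means same direction."""
+    dot = sum(x * y for x, y in zip(a, b))
+    norm_a = math.sqrt(sum(x * x for x in a))
+    norm_b = math.sqrt(sum(x * x for x in b))
+    return dot / (norm_a * norm_b)
+
+# Invented toy vectors: the two ML-related "texts" point in a similar direction.
+ml_question = [0.9, 0.1, 0.2]  # "What is machine learning?"
+ml_answer = [0.8, 0.2, 0.1]    # "Explain ML"
+baking = [0.1, 0.9, 0.7]       # "How do I bake bread?"
+
+print(cosine_similarity(ml_question, ml_answer))  # high, ~0.99
+print(cosine_similarity(ml_question, baking))     # low, ~0.30
+```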
At query time, your question gets embedded too, and the vector database finds chunks with nearby vectors—that's semantic search. + +The data layer includes Weaviate (vector database with hybrid search), Kafka (message queue), MinIO (object storage), and Redis (cache). All of this runs in a Kind Kubernetes cluster for local development, with the same manifests deployable to production. + +``` +┌─────────────────────────────────────────────────────────────────────────┐ +│ X-RAG Kubernetes Cluster │ +├─────────────────────────────────────────────────────────────────────────┤ +│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ +│ │ Search UI │ │Search Svc │ │Embed Service│ │ Indexer │ │ +│ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │ +│ │ │ │ │ │ +│ └────────────────┴────────────────┴────────────────┘ │ +│ │ │ +│ ▼ │ +│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ +│ │ Weaviate │ │ Kafka │ │ MinIO │ │ +│ └─────────────┘ └─────────────┘ └─────────────┘ │ +└─────────────────────────────────────────────────────────────────────────┘ +``` + +## Running Kubernetes locally with Kind + +X-RAG runs on Kubernetes, but you don't need a cloud account to develop it. The project uses Kind (Kubernetes in Docker)—a tool originally created by the Kubernetes SIG for testing Kubernetes itself. + +=> https://kind.sigs.k8s.io/ Kind - Kubernetes in Docker + +Kind spins up a full Kubernetes cluster using Docker containers as nodes. The control plane (API server, etcd, scheduler, controller-manager) runs in one container, and worker nodes run in separate containers. Inside these "node containers," pods run just like they would on real servers—using containerd as the container runtime. It's containers all the way down. + +Technically, each Kind node is a Docker container running a minimal Linux image with kubelet and containerd installed. When you deploy a pod, kubelet inside the node container instructs containerd to pull and run the container image. 
So you have Docker running node containers, and inside those, containerd running application containers. Network-wise, Kind sets up a Docker bridge network and uses CNI plugins (kindnet by default) for pod networking within the cluster. + +``` +$ docker ps --format "table {{.Names}}\t{{.Image}}" +NAMES IMAGE +xrag-k8-control-plane kindest/node:v1.32.0 +xrag-k8-worker kindest/node:v1.32.0 +xrag-k8-worker2 kindest/node:v1.32.0 +``` + +The `kindest/node` image contains everything needed: kubelet, containerd, CNI plugins, and pre-pulled pause containers. Port mappings in the Kind config expose services to the host—that's how http://localhost:8080 reaches the search-ui running inside a pod, inside a worker container, inside Docker. + +``` +┌─────────────────────────────────────────────────────────────────────────┐ +│ Docker Host │ +├─────────────────────────────────────────────────────────────────────────┤ +│ ┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐ │ +│ │ xrag-k8-control │ │ xrag-k8-worker │ │ xrag-k8-worker2 │ │ +│ │ -plane (container)│ │ (container) │ │ (container) │ │ +│ │ │ │ │ │ │ │ +│ │ K8s API server │ │ Pods: │ │ Pods: │ │ +│ │ etcd, scheduler │ │ • search-ui │ │ • weaviate │ │ +│ │ │ │ • search-service │ │ • kafka │ │ +│ │ │ │ • embedding-svc │ │ • prometheus │ │ +│ │ │ │ • indexer │ │ • grafana │ │ +│ └───────────────────┘ └───────────────────┘ └───────────────────┘ │ +└─────────────────────────────────────────────────────────────────────────┘ +``` + +Why Kind? It gives you a real Kubernetes environment—the same manifests deploy to production clouds unchanged. No minikube quirks, no Docker Compose translation layer. Just Kubernetes. I already have a k3s cluster running at home, but Kind made collaboration easier—everyone working on X-RAG gets the exact same setup by cloning the repo and running `make cluster-start`. + +Florian developed X-RAG on macOS, but it worked seamlessly on my Linux laptop. 
The only difference was Docker's resource allocation: on macOS you configure limits in Docker Desktop, on Linux it uses host resources directly. That's because on macOS the Linux containers run inside a lightweight virtual machine, since macOS has no Linux kernel of its own; on Linux, Docker uses the host kernel directly. + +My hardware: a ThinkPad X1 Carbon Gen 9 with an 11th Gen Intel Core i7-1185G7 (4 cores, 8 threads at 3.00GHz) and 32GB RAM (running Fedora Linux). During the hackathon, memory usage peaked around 15GB—comfortable headroom. CPU was the bottleneck; with ~38 pods running across all namespaces (rag-system, monitoring, kube-system, etc.), plus Discord for the remote video call and Tidal streaming hi-res music, things got tight. When rebuilding Docker images or restarting the cluster, Discord video and audio would stutter—my fellow hackers probably wondered why I kept freezing mid-sentence. A beefier CPU would have meant less waiting and smoother calls, but it was manageable. + +## Motivation + +When I joined the hackathon, Florian's X-RAG was functional but opaque. With five services communicating via gRPC, Kafka, and HTTP, debugging was cumbersome. When a search request took 5 seconds, there was no visibility into where the time was being spent. Was it the embedding generation? The vector search? The LLM synthesis? Nobody could figure it out quickly. + +Distributed systems are inherently opaque. Each service logs its own view of the world, but correlating events across service boundaries is archaeology. Grepping through logs on many pods, trying to mentally reconstruct what happened—not fun. This was the perfect hackathon project: exploring an observability stack in greater depth. + +## The observability stack + +Before diving into implementation, here's what I deployed.
The complete stack runs in the monitoring namespace: + +``` +$ kubectl get pods -n monitoring +NAME READY STATUS +alloy-84ddf4cd8c-7phjp 1/1 Running +grafana-6fcc89b4d6-pnh8l 1/1 Running +kube-state-metrics-5d954c569f-2r45n 1/1 Running +loki-8c9bbf744-sc2p5 1/1 Running +node-exporter-kb8zz 1/1 Running +node-exporter-zcrdz 1/1 Running +node-exporter-zmskc 1/1 Running +prometheus-7f755f675-dqcht 1/1 Running +tempo-55df7dbcdd-t8fg9 1/1 Running +``` + +Each component has a specific role: + +* `Grafana Alloy`: The unified collector. Receives OTLP from applications, scrapes Prometheus endpoints, tails log files. Think of it as the central nervous system. +* `Prometheus`: Time-series database for metrics. Stores counters, gauges, and histograms with 15-day retention. +* `Tempo`: Trace storage. Receives spans via OTLP, correlates them by trace ID, enables TraceQL queries. +* `Loki`: Log aggregation. Indexes labels (namespace, pod, container), stores log chunks, enables LogQL queries. +* `Grafana`: The unified UI. Queries all three backends, correlates signals, displays dashboards. +* `kube-state-metrics`: Exposes Kubernetes object metrics (pod status, deployments, resource requests). +* `node-exporter`: Exposes host-level metrics (CPU, memory, disk, network) from each Kubernetes node. + +Everything is accessible via port-forwards: + +* Grafana: http://localhost:3000 (unified UI for all three signals) +* Prometheus: http://localhost:9090 (metrics queries) +* Tempo: http://localhost:3200 (trace queries) +* Loki: http://localhost:3100 (log queries) + +## Grafana Alloy: the unified collector + +Before diving into the individual signals, I want to highlight Grafana Alloy—the component that ties everything together. Alloy is Grafana's vendor-neutral OpenTelemetry Collector distribution, and it became the backbone of the observability stack. 
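+To give a first taste of that pipeline idea, here is a minimal, illustrative Alloy fragment for the metrics path: accept OTLP, convert it to Prometheus format, and remote-write it onward. The component labels and the in-cluster service URL are assumptions for illustration, and batching is omitted for brevity:
+
+```
+otelcol.receiver.otlp "example" {
+  grpc { endpoint = "0.0.0.0:4317" }
+  output {
+    metrics = [otelcol.exporter.prometheus.example.input]
+  }
+}
+
+otelcol.exporter.prometheus "example" {
+  forward_to = [prometheus.remote_write.example.receiver]
+}
+
+prometheus.remote_write "example" {
+  endpoint {
+    url = "http://prometheus.monitoring.svc.cluster.local:9090/api/v1/write"
+  }
+}
+```
+
+For the remote-write step to work, Prometheus has to be started with --web.enable-remote-write-receiver.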
+ +=> https://grafana.com/docs/alloy/latest/ Grafana Alloy documentation + +Why use a centralised collector instead of having each service push directly to backends? + +* `Decoupling`: Applications don't need to know about Prometheus, Tempo, or Loki. They speak OTLP, and Alloy handles the translation. +* `Unified timestamps`: All telemetry flows through one system, making correlation in Grafana more reliable. +* `Processing pipeline`: Batch data before sending, filter noisy metrics, enrich with labels—all in one place. +* `Backend flexibility`: Switch from Tempo to Jaeger without changing application code. + +Alloy uses a configuration language called River, which feels similar to Terraform's HCL—declarative blocks with attributes. If you've written Terraform, River will look familiar. The full Alloy configuration runs to over 1400 lines with comments explaining each section. It handles OTLP receiving, batch processing, Prometheus export, Tempo export, Kubernetes metrics scraping, infrastructure metrics, and pod log collection. All three signals—metrics, traces, logs—flow through this single component, making Alloy the central nervous system of the observability stack. + +In the following sections, I'll cover each observability pillar and show the relevant Alloy configuration for each. + +## Centralised logging with Loki + +Getting all logs in one place was the foundation. I deployed Grafana Loki in the monitoring namespace, with Grafana Alloy running as a DaemonSet on each node to collect logs. 
+ +``` +┌──────────────────────────────────────────────────────────────────────┐ +│ LOGS PIPELINE │ +├──────────────────────────────────────────────────────────────────────┤ +│ Applications write to stdout → containerd stores in /var/log/pods │ +│ │ │ +│ File tail │ +│ ▼ │ +│ Grafana Alloy (DaemonSet) │ +│ Discovers pods, extracts metadata │ +│ │ │ +│ HTTP POST /loki/api/v1/push │ +│ ▼ │ +│ Grafana Loki │ +│ Indexes labels, stores chunks │ +└──────────────────────────────────────────────────────────────────────┘ +``` + +### Alloy configuration for logs + +Alloy discovers pods via the Kubernetes API, tails their log files from /var/log/pods/, and ships to Loki. Importantly, Alloy runs as a DaemonSet on each worker node—it doesn't run inside the application pods. Since containerd writes all container stdout/stderr to /var/log/pods/ on the node's filesystem, Alloy can tail logs for every pod on that node from a single location without any sidecar injection: + +``` +loki.source.kubernetes "pod_logs" { + targets = discovery.relabel.pod_logs.output + forward_to = [loki.process.pod_logs.receiver] +} + +loki.write "default" { + endpoint { + url = "http://loki.monitoring.svc.cluster.local:3100/loki/api/v1/push" + } +} +``` + +### Querying logs with LogQL + +Now I could query logs in Loki (e.g. via Grafana UI) with LogQL: + +``` +{namespace="rag-system", container="search-ui"} |= "ERROR" +``` + +## Metrics with Prometheus + +I added Prometheus metrics to every service. 
Following the Four Golden Signals (latency, traffic, errors, saturation), I instrumented the codebase with histograms, counters, and gauges: + +```python +from prometheus_client import Histogram, Counter, Gauge + +search_duration = Histogram( + "search_service_request_duration_seconds", + "Total duration of Search Service requests", + ["method"], + buckets=[0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0, 20.0, 30.0, 60.0], +) + +errors_total = Counter( + "search_service_errors_total", + "Error count by type", + ["method", "error_type"], +) +``` + +Initially, I used Prometheus scraping—each service exposed a /metrics endpoint, and Prometheus pulled metrics every 15 seconds. This worked, but I wanted a unified pipeline. + +### Alloy configuration for application metrics + +The breakthrough came with Grafana Alloy as an OpenTelemetry collector. Services now push metrics via OTLP (OpenTelemetry Protocol), and Alloy converts them to Prometheus format: + +``` +┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ +│ search-ui │ │search-svc │ │embed-svc │ │ indexer │ +│ OTel Meter │ │ OTel Meter │ │ OTel Meter │ │ OTel Meter │ +│ │ │ │ │ │ │ │ │ │ │ │ +│ OTLPExporter│ │ OTLPExporter│ │ OTLPExporter│ │ OTLPExporter│ +└──────┬──────┘ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ + │ │ │ │ + └────────────────┴────────────────┴────────────────┘ + │ + ▼ OTLP/gRPC (port 4317) + ┌─────────────────────┐ + │ Grafana Alloy │ + └──────────┬──────────┘ + │ prometheus.remote_write + ▼ + ┌─────────────────────┐ + │ Prometheus │ + └─────────────────────┘ +``` + +Alloy receives OTLP on ports 4317 (gRPC) or 4318 (HTTP), batches the data for efficiency, and exports to Prometheus: + +``` +otelcol.receiver.otlp "default" { + grpc { endpoint = "0.0.0.0:4317" } + http { endpoint = "0.0.0.0:4318" } + output { + metrics = [otelcol.processor.batch.metrics.input] + traces = [otelcol.processor.batch.traces.input] + } +} + +otelcol.processor.batch "metrics" { + timeout = "5s" + send_batch_size = 
1000
+  output { metrics = [otelcol.exporter.prometheus.default.input] }
+}
+
+otelcol.exporter.prometheus "default" {
+  forward_to = [prometheus.remote_write.prom.receiver]
+}
+```
+
+Instead of sending each metric individually, Alloy accumulates up to 1000 metrics (or waits 5 seconds) before flushing. This reduces network overhead and protects backends from being overwhelmed.
+
+### Kubernetes metrics: kubelet, cAdvisor, and kube-state-metrics
+
+Alloy also pulls metrics from Kubernetes itself—kubelet resource metrics, cAdvisor container metrics, and kube-state-metrics for cluster state.
+
+Why three separate sources? It does feel fragmented, but each serves a distinct purpose:
+
+* `kubelet` exposes resource metrics about pod CPU and memory usage from its own bookkeeping—lightweight summaries of what's running on each node.
+* `cAdvisor` (Container Advisor) runs inside kubelet and provides detailed container-level metrics: CPU throttling, memory working sets, filesystem I/O, network bytes. These are the raw runtime stats from containerd.
+* `kube-state-metrics` is different—it doesn't measure resource usage at all. Instead, it queries the Kubernetes API and exposes the *desired state*: how many replicas a Deployment wants, whether a Pod is pending or running, what resource requests and limits are configured.
+
+You need all three because "container used 500MB" (cAdvisor), "pod requested 1GB" (kube-state-metrics), and "node has 4GB available" (kubelet) are complementary views. The fragmentation is a consequence of Kubernetes' architecture—no single component has the complete picture.
+
+None of these components speak OpenTelemetry—they all expose Prometheus-format metrics via HTTP endpoints. That's why Alloy uses `prometheus.scrape` instead of receiving OTLP pushes. Alloy handles both worlds: OTLP from our applications, Prometheus scraping for infrastructure. 
+ +``` +prometheus.scrape "kubelet_resource" { + targets = discovery.relabel.kubelet.output + job_name = "kubelet-resource" + scheme = "https" + scrape_interval = "30s" + bearer_token_file = "/var/run/secrets/kubernetes.io/serviceaccount/token" + tls_config { insecure_skip_verify = true } + forward_to = [prometheus.remote_write.prom.receiver] +} + +prometheus.scrape "cadvisor" { + targets = discovery.relabel.cadvisor.output + job_name = "cadvisor" + scheme = "https" + scrape_interval = "60s" + bearer_token_file = "/var/run/secrets/kubernetes.io/serviceaccount/token" + tls_config { insecure_skip_verify = true } + forward_to = [prometheus.relabel.cadvisor_filter.receiver] +} + +prometheus.scrape "kube_state_metrics" { + targets = [ + {"__address__" = "kube-state-metrics.monitoring.svc.cluster.local:8080"}, + ] + job_name = "kube-state-metrics" + scrape_interval = "30s" + forward_to = [prometheus.relabel.kube_state_filter.receiver] +} +``` + +Note that `kubelet` and `cAdvisor` require HTTPS with bearer token authentication (using the service account token mounted by Kubernetes), while `kube-state-metrics` is a simple HTTP target. `cAdvisor` is scraped less frequently (60s) because it returns many more metrics with higher cardinality. + +### Infrastructure metrics: Kafka, Redis, MinIO + +Application metrics weren't enough. I also needed visibility into the data layer. Each infrastructure component has a specific role in X-RAG and got its own exporter: + +`Redis` is the caching layer. It stores search results and embeddings to avoid redundant API calls to OpenAI. We collect 25 metrics via oliver006/redis_exporter running as a sidecar, including cache hit/miss rates, memory usage, connected clients, and command latencies. The key metric? `redis_keyspace_hits_total / (redis_keyspace_hits_total + redis_keyspace_misses_total)` tells you if caching is actually helping. + +`Kafka` is the message queue connecting the ingestion API to the indexer. 
Documents are published to a topic, and the indexer consumes them asynchronously. We collect 12 metrics via danielqsj/kafka-exporter, with consumer lag being the most critical—it shows how far behind the indexer is. High lag means documents aren't being indexed fast enough. + +`MinIO` is the S3-compatible object storage where raw documents are stored before processing. We collect 16 metrics from its native /minio/v2/metrics/cluster endpoint, covering request rates, error counts, storage usage, and cluster health. + +You can verify these counts by querying Prometheus directly: + +``` +$ curl -s 'http://localhost:9090/api/v1/label/__name__/values' \ + | jq -r '.data[]' | grep -c '^redis_' +25 +$ curl -s 'http://localhost:9090/api/v1/label/__name__/values' \ + | jq -r '.data[]' | grep -c '^kafka_' +12 +$ curl -s 'http://localhost:9090/api/v1/label/__name__/values' \ + | jq -r '.data[]' | grep -c '^minio_' +16 +``` + +=> https://github.com/florianbuetow/x-rag/blob/main/infra/k8s/monitoring/alloy-config.yaml Full Alloy configuration with detailed metric filtering + +Alloy scrapes all of these and remote-writes to Prometheus: + +``` +prometheus.scrape "redis_exporter" { + targets = [ + {"__address__" = "xrag-redis.rag-system.svc.cluster.local:9121"}, + ] + job_name = "redis" + scrape_interval = "30s" + forward_to = [prometheus.relabel.redis_filter.receiver] +} + +prometheus.scrape "kafka_exporter" { + targets = [ + {"__address__" = "kafka-exporter.rag-system.svc.cluster.local:9308"}, + ] + job_name = "kafka" + scrape_interval = "30s" + forward_to = [prometheus.relabel.kafka_filter.receiver] +} + +prometheus.scrape "minio" { + targets = [ + {"__address__" = "xrag-minio.rag-system.svc.cluster.local:9000"}, + ] + job_name = "minio" + metrics_path = "/minio/v2/metrics/cluster" + scrape_interval = "30s" + forward_to = [prometheus.relabel.minio_filter.receiver] +} +``` + +Note that MinIO exposes metrics at a custom path (`/minio/v2/metrics/cluster`) rather than the default 
`/metrics`. Each exporter forwards to a relabel component that filters down to essential metrics before sending to Prometheus. + +With all metrics in Prometheus, I can use PromQL queries in Grafana dashboards. For example, to check Kafka consumer lag and see if the indexer is falling behind: + +```promql +sum by (consumergroup, topic) (kafka_consumergroup_lag) +``` + +Or check Redis cache effectiveness: + +```promql +redis_keyspace_hits_total / (redis_keyspace_hits_total + redis_keyspace_misses_total) +``` + +## Distributed tracing with Tempo + +### Understanding traces, spans, and the trace tree + +Before diving into the implementation, let me explain the core concepts I learned. A `trace` represents a single request's journey through the entire distributed system. Think of it as a receipt that follows your request from the moment it enters the system until the final response. + +Each trace is identified by a `trace ID`—a 128-bit identifier (32 hex characters) that stays constant across all services. When I make a search request, every service handling that request uses the same trace ID: `9df981cac91857b228eca42b501c98c6`. + +=> https://www.youtube.com/watch?v=KPGjqus5qFo Quick video explaining the difference between trace IDs and span IDs in OpenTelemetry + +Within a trace, individual operations are recorded as `spans`. A span has: + +* A `span ID`: 64-bit identifier (16 hex characters) unique to this operation +* A `parent span ID`: links this span to its caller +* A `name`: what operation this represents (e.g., "POST /api/search") +* `Start time` and `duration` +* `Attributes`: key-value metadata (e.g., `http.status_code=200`) + +The first span in a trace is the `root span`—it has no parent. When the root span calls another service, that service creates a `child span` with the root's span ID as its parent. 
This parent-child relationship forms a `tree structure`: + +``` + ┌─────────────────────────┐ + │ Root Span │ + │ POST /api/search │ + │ span_id: a1b2c3d4... │ + │ parent: (none) │ + └───────────┬─────────────┘ + │ + ┌─────────────────────┴─────────────────────┐ + │ │ + ▼ ▼ +┌─────────────────────────┐ ┌─────────────────────────┐ +│ Child Span │ │ Child Span │ +│ gRPC Search │ │ render_template │ +│ span_id: e5f6g7h8... │ │ span_id: i9j0k1l2... │ +│ parent: a1b2c3d4... │ │ parent: a1b2c3d4... │ +└───────────┬─────────────┘ └─────────────────────────┘ + │ + ├──────────────────┬──────────────────┐ + ▼ ▼ ▼ + ┌────────────┐ ┌────────────┐ ┌────────────┐ + │ Grandchild │ │ Grandchild │ │ Grandchild │ + │ embedding │ │ vector │ │ llm.rag │ + │ .generate │ │ _search │ │ _completion│ + └────────────┘ └────────────┘ └────────────┘ +``` + +This tree structure answers the critical question: "What called what?" When I see a slow span, I can trace up to see what triggered it and down to see what it's waiting on. + +### How trace context propagates + +The magic that links spans across services is `trace context propagation`. When Service A calls Service B, it must pass along the trace ID and its own span ID (which becomes the parent). OpenTelemetry uses the W3C `traceparent` header: + +``` +traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01 + │ │ │ │ + │ │ │ └── flags + │ │ └── parent span ID (16 hex) + │ └── trace ID (32 hex) + └── version +``` + +For HTTP, this travels as a request header. For gRPC, it's passed as metadata. For Kafka, it's embedded in message headers. The receiving service extracts this context, creates a new span with the propagated trace ID and the caller's span ID as parent, then continues the chain. + +This is why all my spans link together—OpenTelemetry's auto-instrumentation handles propagation automatically for HTTP, gRPC, and Kafka clients. + +### Implementation + +This is where distributed tracing made the difference. 
I integrated OpenTelemetry auto-instrumentation for FastAPI, gRPC, and HTTP clients, plus manual spans for RAG-specific operations:
+
+```python
+from opentelemetry import trace
+from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
+from opentelemetry.instrumentation.grpc import GrpcAioInstrumentorClient
+
+# Module-level tracer used for the manual spans below
+tracer = trace.get_tracer(__name__)
+
+# Auto-instrument frameworks
+FastAPIInstrumentor.instrument_app(app)
+GrpcAioInstrumentorClient().instrument()
+
+# Manual spans for custom operations
+with tracer.start_as_current_span("llm.rag_completion") as span:
+    span.set_attribute("llm.model", model_name)
+    result = await generate_answer(query, context)
+```
+
+`Auto-instrumentation` is the quick win: one line of code and you get spans for every HTTP request, gRPC call, or database query. The instrumentor patches the framework at runtime, so existing code works without modification. The downside? You only get what the library authors decided to capture—generic HTTP attributes like `http.method` and `http.status_code`, but nothing domain-specific. Auto-instrumented spans also can't know your business logic, so a slow request shows up as "POST /api/search took 5 seconds" without revealing which internal operation caused the delay.
+
+`Manual spans` fill that gap. By wrapping specific operations (like `llm.rag_completion` or `vector_search.query`), you get visibility into your application's unique behaviour. You can add custom attributes (`llm.model`, `query.top_k`, `cache.hit`) that make traces actually useful for debugging. The downside is maintenance: manual spans are code you write and maintain, and you need to decide where instrumentation adds value versus where it just adds noise. In practice, I found the right balance was auto-instrumentation for framework boundaries (HTTP, gRPC) plus manual spans for the 5-10 operations that actually matter for understanding performance.
+
+The magic is trace context propagation. 
When the Search UI calls the Search Service via gRPC, the trace ID travels in metadata headers:
+
+```
+Metadata: [
+    ("traceparent", "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01"),
+    ("content-type", "application/grpc"),
+]
+```
+
+Spans from all services are linked by this trace ID, forming a tree:
+
+```
+Trace ID: 0af7651916cd43dd8448eb211c80319c
+
+├─ [search-ui] POST /api/search (300ms)
+│  │
+│  ├─ [search-service] Search (gRPC server) (275ms)
+│  │  │
+│  │  ├─ [search-service] embedding.generate (50ms)
+│  │  │  └─ [embedding-service] Embed (45ms)
+│  │  │     └─ POST https://api.openai.com (35ms)
+│  │  │
+│  │  ├─ [search-service] vector_search.query (100ms)
+│  │  │
+│  │  └─ [search-service] llm.rag_completion (120ms)
+│  │     └─ [search-service] openai.chat (115ms)
+```
+
+### Alloy configuration for traces
+
+Traces are collected by Alloy and stored in Grafana Tempo. Alloy batches traces for efficiency before exporting via OTLP:
+
+```
+otelcol.processor.batch "traces" {
+  timeout = "5s"
+  send_batch_size = 500
+  output { traces = [otelcol.exporter.otlp.tempo.input] }
+}
+
+otelcol.exporter.otlp "tempo" {
+  client {
+    endpoint = "tempo.monitoring.svc.cluster.local:4317"
+    tls { insecure = true }
+  }
+}
+```
+
+In Tempo's UI, I can finally see exactly where time is spent. That 5-second query? Turns out the vector search was waiting on a cold Weaviate connection. Now I knew what to fix.
+
+## Async ingestion trace walkthrough
+
+One of the most powerful aspects of distributed tracing is following requests across async boundaries like message queues. The document ingestion pipeline flows through Kafka, creating spans that are linked even though they execute in different processes at different times. 
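The linkage works because the `traceparent` value rides along in the Kafka message headers. As a rough, self-contained illustration of the format involved (the helper names are hypothetical, not X-RAG code; OpenTelemetry's Kafka instrumentation builds and parses this header for you):

```python
# Illustrative sketch only: OpenTelemetry handles this automatically.

def build_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    """Assemble a W3C traceparent value (version 00)."""
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"

def parse_traceparent(value: str) -> dict:
    """Split a traceparent value into its four fields."""
    version, trace_id, span_id, flags = value.split("-")
    return {"version": version, "trace_id": trace_id,
            "span_id": span_id, "sampled": flags == "01"}

# Producer side: the header travels inside the Kafka message.
headers = [("traceparent",
            build_traceparent("0af7651916cd43dd8448eb211c80319c",
                              "b7ad6b7169203331").encode())]

# Consumer side: the indexer extracts it and continues the same trace.
ctx = parse_traceparent(headers[0][1].decode())  # trace_id stays constant
```

In the real pipeline the consumer hands this extracted context to OpenTelemetry, so the indexer's spans attach to the original ingestion request.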
+ +### Step 1: Ingest a document + +``` +$ curl -s -X POST http://localhost:8082/ingest \ + -H "Content-Type: application/json" \ + -d '{ + "text": "This is the X-RAG Observability Guide...", + "metadata": { + "title": "X-RAG Observability Guide", + "source_file": "docs/OBSERVABILITY.md", + "type": "markdown" + }, + "namespace": "default" + }' | jq . +{ + "document_id": "8538656a-ba99-406c-8da7-87c5f0dda34d", + "status": "accepted", + "minio_bucket": "documents", + "minio_key": "8538656a-ba99-406c-8da7-87c5f0dda34d.json", + "message": "Document accepted for processing" +} +``` + +The ingestion API immediately returns—it doesn't wait for indexing. The document is stored in MinIO and a message is published to Kafka. + +### Step 2: Find the ingestion trace + +Using Tempo's HTTP API (port 3200), we can search for traces by span name using TraceQL: + +``` +$ curl -s -G "http://localhost:3200/api/search" \ + --data-urlencode 'q={name="POST /ingest"}' \ + --data-urlencode 'limit=3' | jq '.traces[0].traceID' +"b3fc896a1cf32b425b8e8c46c86c76f7" +``` + +### Step 3: Fetch the complete trace + +``` +$ curl -s "http://localhost:3200/api/traces/b3fc896a1cf32b425b8e8c46c86c76f7" \ + | jq '[.batches[] | ... | {service, span}] | unique' +[ + { "service": "ingestion-api", "span": "POST /ingest" }, + { "service": "ingestion-api", "span": "storage.upload" }, + { "service": "ingestion-api", "span": "messaging.publish" }, + { "service": "indexer", "span": "indexer.process_document" }, + { "service": "indexer", "span": "document.duplicate_check" }, + { "service": "indexer", "span": "document.pipeline" }, + { "service": "indexer", "span": "storage.download" }, + { "service": "indexer", "span": "/xrag.embedding.EmbeddingService/EmbedBatch" }, + { "service": "embedding-service", "span": "openai.embeddings" }, + { "service": "indexer", "span": "db.insert" } +] +``` + +The trace spans `three services`: ingestion-api, indexer, and embedding-service. 
The trace context propagates through Kafka, linking the original HTTP request to the async consumer processing. + +### Step 4: Analyse the async trace + +``` +ingestion-api | POST /ingest | 16ms ← HTTP response returns +ingestion-api | storage.upload | 13ms ← Save to MinIO +ingestion-api | messaging.publish | 1ms ← Publish to Kafka + | | + | ~~~ Kafka queue ~~~ | ← Async boundary + | | +indexer | indexer.process_document | 1799ms ← Consumer picks up message +indexer | document.duplicate_check | 1ms +indexer | document.pipeline | 1796ms +indexer | storage.download | 1ms ← Fetch from MinIO +indexer | EmbedBatch (gRPC) | 754ms ← Call embedding service +embedding-svc | openai.embeddings | 752ms ← OpenAI API +indexer | db.insert | 1038ms ← Store in Weaviate +``` + +The total async processing takes ~1.8 seconds, but the user sees a 16ms response. Without tracing, debugging "why isn't my document showing up in search results?" would require correlating logs from three services manually. + +`Key insight`: The trace context propagates through Kafka message headers, allowing the indexer's spans to link back to the original ingestion request. This is configured via OpenTelemetry's Kafka instrumentation. + +### Viewing traces in Grafana + +To view a trace in Grafana's UI: + +1. Open Grafana at http://localhost:3000/explore +2. Select `Tempo` as the data source (top-left dropdown) +3. Choose `TraceQL` as the query type +4. Paste the trace ID: `b3fc896a1cf32b425b8e8c46c86c76f7` +5. Click `Run query` + +The trace viewer shows a Gantt chart with all spans, their timing, and parent-child relationships. Click any span to see its attributes. 
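Instead of starting from a known trace ID, TraceQL can also surface slow ingestion runs directly. A sketch using the span names from the trace above (not from the X-RAG repo; the 1s threshold is arbitrary):

```
{resource.service.name="indexer" && name="indexer.process_document" && duration > 1s}
```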
+ +=> ./x-rag-observability-hackathon/index-trace.png Async ingestion trace in Grafana Tempo + +=> ./x-rag-observability-hackathon/index-node-graph.png Ingestion trace node graph showing service dependencies + +## End-to-end search trace walkthrough + +To demonstrate the observability stack in action, here's a complete trace from a search request through all services. + +### Step 1: Make a search request + +Normally you'd use the Search UI web interface at http://localhost:8080, but for demonstration purposes curl makes it easier to show the raw request and response: + +``` +$ curl -s -X POST http://localhost:8080/api/search \ + -H "Content-Type: application/json" \ + -d '{"query": "What is RAG?", "namespace": "default", "mode": "hybrid", "top_k": 5}' | jq . +{ + "answer": "I don't have enough information to answer this question.", + "sources": [ + { + "id": "71adbc34-56c1-4f75-9248-4ed38094ac69", + "content": "# X-RAG Observability Guide This document describes...", + "score": 0.8292956352233887, + "metadata": { + "source": "docs/OBSERVABILITY.md", + "type": "markdown", + "namespace": "default" + } + } + ], + "metadata": { + "namespace": "default", + "num_sources": "5", + "cache_hit": "False", + "mode": "hybrid", + "top_k": "5", + "trace_id": "9df981cac91857b228eca42b501c98c6" + } +} +``` + +The response includes a `trace_id` that links this request to all spans across services. 
+ +### Step 2: Query Tempo for the trace + +Using the trace ID from the response, query Tempo's API: + +``` +$ curl -s "http://localhost:3200/api/traces/9df981cac91857b228eca42b501c98c6" \ + | jq '.batches[].scopeSpans[].spans[] + | {name, service: .attributes[] + | select(.key=="service.name") + | .value.stringValue}' +``` + +The raw trace shows spans from multiple services: + +* `search-ui`: `POST /api/search` (root span, 2138ms total) +* `search-ui`: `/xrag.search.SearchService/Search` (gRPC client call) +* `search-service`: `/xrag.search.SearchService/Search` (gRPC server) +* `search-service`: `/xrag.embedding.EmbeddingService/Embed` (gRPC client) +* `embedding-service`: `/xrag.embedding.EmbeddingService/Embed` (gRPC server) +* `embedding-service`: `openai.embeddings` (OpenAI API call, 647ms) +* `embedding-service`: `POST https://api.openai.com/v1/embeddings` (HTTP client) +* `search-service`: `vector_search.query` (Weaviate hybrid search, 13ms) +* `search-service`: `openai.chat` (LLM answer generation, 1468ms) +* `search-service`: `POST https://api.openai.com/v1/chat/completions` (HTTP client) + +### Step 3: Analyse the trace + +From this single trace, I can see exactly where time is spent: + +``` +Total request: 2138ms +├── gRPC to search-service: 2135ms +│ ├── Embedding generation: 649ms +│ │ └── OpenAI embeddings API: 640ms +│ ├── Vector search (Weaviate): 13ms +│ └── LLM answer generation: 1468ms +│ └── OpenAI chat API: 1463ms +``` + +The bottleneck is clear: `68% of time is spent in LLM answer generation`. The vector search (13ms) and embedding generation (649ms) are relatively fast. Without tracing, I would have guessed the embedding service was slow—traces proved otherwise. 
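The finding can be cross-checked from the metrics side. A hedged PromQL sketch using the histogram defined in the metrics section (relying on the `_bucket` series Prometheus derives from it):

```promql
histogram_quantile(0.99,
  sum by (le) (rate(search_service_request_duration_seconds_bucket[5m]))
)
```

If the p99 here tracks the LLM call duration seen in the trace, the two signals agree on the bottleneck.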
+ +### Step 4: Search traces with TraceQL + +Tempo supports TraceQL for querying traces by attributes: + +``` +$ curl -s -G "http://localhost:3200/api/search" \ + --data-urlencode 'q={resource.service.name="search-service"}' \ + --data-urlencode 'limit=5' | jq '.traces[:2] | .[].rootTraceName' +"/xrag.search.SearchService/Search" +"GET /health/ready" +``` + +Other useful TraceQL queries: + +``` +# Find slow searches (> 2 seconds) +{resource.service.name="search-ui" && name="POST /api/search"} | duration > 2s + +# Find errors +{status=error} + +# Find OpenAI calls +{name=~"openai.*"} +``` + +### Viewing the search trace in Grafana + +Follow the same steps as above, but use the search trace ID: `9df981cac91857b228eca42b501c98c6` + +=> ./x-rag-observability-hackathon/search-trace.png Search trace in Grafana Tempo + +=> ./x-rag-observability-hackathon/search-node-graph.png Search trace node graph showing service flow + +## Correlating the three signals + +The real power comes from correlating traces, metrics, and logs. When an alert fires for high error rate, I follow this workflow: + +1. Metrics: Prometheus shows error spike started at 10:23:00 +2. Traces: Query Tempo for traces with status=error around that time +3. Logs: Use the trace ID to find detailed error messages in Loki + +``` +{namespace="rag-system"} |= "trace_id=abc123" |= "error" +``` + +Prometheus exemplars link specific metric samples to trace IDs, so I can click directly from a latency spike to the responsible trace. 
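For step 1 of that workflow, the error counter defined in the metrics section gives the starting point. A sketch (assuming the `search_service_errors_total` counter shown earlier):

```promql
sum by (method, error_type) (rate(search_service_errors_total[5m]))
```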
+ +## Grafana dashboards + +During the hackathon, I also created six pre-built Grafana dashboards that are automatically provisioned when the monitoring stack starts: + +| Dashboard | Description | +|-----------|-------------| +| **X-RAG Overview** | The main dashboard with 22 panels covering request rates, latencies, error rates, and service health across all X-RAG components | +| **OpenTelemetry HTTP Metrics** | HTTP request/response metrics from OpenTelemetry-instrumented services—request rates, latency percentiles, and status code breakdowns | +| **Pod System Metrics** | Kubernetes pod resource utilisation: CPU usage, memory consumption, network I/O, disk I/O, and pod state from kube-state-metrics | +| **Redis** | Cache performance: memory usage, hit/miss rates, commands per second, connected clients, and memory fragmentation | +| **Kafka** | Message queue health: consumer lag (critical for indexer monitoring), broker status, topic partitions, and throughput | +| **MinIO** | Object storage metrics: S3 request rates, error counts, traffic volume, bucket sizes, and disk usage | + +All dashboards are stored as JSON files in `infra/k8s/monitoring/grafana-dashboards/` and deployed via ConfigMaps, so they survive pod restarts and cluster recreations. + +=> ./x-rag-observability-hackathon/dashboard-xrag-overview.png X-RAG Overview dashboard +=> ./x-rag-observability-hackathon/dashboard-pod-system-metrics.png Pod System Metrics dashboard + +## Results: two days well spent + +What did two days of hackathon work achieve? The system went from flying blind to fully instrumented: + +* All three pillars implemented: logs (Loki), metrics (Prometheus), traces (Tempo) +* Unified collection via Grafana Alloy +* Infrastructure metrics for Kafka, Redis, and MinIO +* Six pre-built Grafana dashboards covering application metrics, pod resources, and infrastructure +* Trace context propagation across all gRPC calls + +The biggest insight from testing? 
The embedding service wasn't the bottleneck I assumed. Traces revealed that LLM synthesis dominated latency, not embedding generation. Without tracing, optimisation efforts would have targeted the wrong component. + +Beyond the technical wins, I had a lot of fun. The hackathon brought together people working on different projects, and I got to know some really nice folks during the sessions themselves. There's something energising about being in a (virtual) room with other people all heads-down on their own challenges—even if you're not collaborating directly, the shared focus is motivating. + +## SLIs, SLOs and SLAs + +The system now has full observability, but there's always more. And to be clear: this is not production-grade yet. It works well for development and could scale to production, but that would need to be validated with proper load testing and chaos testing first. We haven't stress-tested the observability pipeline under heavy load, nor have we tested failure scenarios like Tempo going down or Alloy running out of memory. The Alloy config includes comments on sampling strategies and rate limiting that would be essential for high-traffic environments. + +One thing we didn't cover: monitoring and alerting. These are related but distinct from observability. Observability is about collecting and exploring data to understand system behaviour. Monitoring is about defining thresholds and alerting when they're breached. We have Prometheus with all the metrics, but no alerting rules yet—no PagerDuty integration, no Slack notifications when latency spikes or error rates climb. + +We also didn't define any SLIs (Service Level Indicators) or SLOs (Service Level Objectives). An SLI is a quantitative measure of service quality—for example, "99th percentile search latency" or "percentage of requests returning successfully." An SLO is a target for that indicator—"99th percentile latency should be under 2 seconds" or "99.9% of requests should succeed." 
Without SLOs, you don't know what "good" looks like, and alerting becomes arbitrary. + +For X-RAG specifically, potential SLOs might include: + +* `Search latency`: 99th percentile search response time under 3 seconds +* `Uptime`: 99.9% availability of the search API endpoint +* `Response quality`: Percentage of searches returning relevant results (though this is harder to measure automatically and might require user feedback or evaluation frameworks) + +SLAs (Service Level Agreements) are often confused with SLOs, but they're different. An SLA is a contractual commitment to customers—a legally binding promise with consequences (refunds, credits, penalties) if you fail to meet it. SLOs are internal engineering targets; SLAs are external business promises. Typically, SLAs are less strict than SLOs: if your internal target is 99.9% availability (SLO), your customer contract might promise 99.5% (SLA), giving you a buffer before you owe anyone money. + +But then again, X-RAG is a proof-of-concept, a prototype, a learning system—there are no real customers to disappoint. SLOs would become essential if this ever served actual users, and SLAs would follow once there's a business relationship to protect. + +## Using Amp for AI-assisted development + +I used Amp (formerly Ampcode) throughout this project. While I knew what I wanted to achieve, I let the LLM generate the actual configurations, Kubernetes manifests, and Python instrumentation code. + +=> https://ampcode.com/ Amp - AI coding agent by Sourcegraph + +My workflow was step-by-step rather than handing over a grand plan: + +1. "Deploy Grafana Alloy to the monitoring namespace" +2. "Verify Alloy is running and receiving data" +3. "Document what we did to docs/OBSERVABILITY.md" +4. "Commit with message 'feat: add Grafana Alloy for telemetry collection'" +5. Hand off context, start fresh: "Now instrument the search-ui with OpenTelemetry to push traces to Alloy..." 
+
+Chaining many small, focused tasks worked better than one massive plan. Each task had clear success criteria, and I could verify results before moving on. The LLM generated the River configuration, the OpenTelemetry Python code, the Kubernetes manifests—I reviewed, tweaked, and committed.
+
+I only ran out of the 200k token context window once, during a debugging session that involved restarting the Kubernetes cluster multiple times. The fix required correlating error messages across several services, and the conversation history grew too long. Starting a fresh context and summarising the problem solved it.
+
+Amp automatically selects the best model for the task at hand. Based on the response speed and Sourcegraph's recent announcements, I believe it was using Claude Opus 4.5 for most of my coding and infrastructure work. The quality was excellent—it understood Python, Kubernetes, OpenTelemetry, and Grafana tooling without much hand-holding.
+
+Let me be clear: without the LLM, I'd never have managed to write all these configuration files by hand in two days. The Alloy config alone is 1400+ lines. But I reviewed every change manually, verified it made sense, and understood what was being deployed. This wasn't vibe-coding—the whole point of the hackathon was to learn. I already knew Grafana and Prometheus from previous work, but OpenTelemetry, Alloy, Tempo, Loki, and the X-RAG system overall were all pretty new to me. By reviewing each generated config and understanding why it was structured that way, I actually learned the tools rather than just deploying magic incantations.
+
+Cost-wise, I spent around 20 USD on Amp credits over the two-day hackathon. For the amount of code generated, configs reviewed, and debugging assistance—that's remarkably affordable.
+
+## Other changes along the way
+
+Looking at the git history, I made 25 commits during the hackathon. 
Beyond the main observability features, there were several smaller but useful additions: + +`OBSERVABILITY_ENABLED flag`: Added an environment variable to completely disable the monitoring stack. Set `OBSERVABILITY_ENABLED=false` in `.env` and the cluster starts without Prometheus, Grafana, Tempo, Loki, or Alloy. Useful when you just want to work on application code without the overhead. + +`Load generator`: Added a `make load-gen` target that fires concurrent requests at the search API. Useful for generating enough trace data to see patterns in Tempo, and for stress-testing the observability pipeline itself. + +`Verification scripts`: Created scripts to test that OTLP is actually reaching Alloy and that traces appear in Tempo. Debugging "why aren't my traces showing up?" is frustrating without a systematic way to verify each hop in the pipeline. + +`Moving monitoring to dedicated namespace`: Refactored from having observability components scattered across namespaces to a clean `monitoring` namespace. Makes `kubectl get pods -n monitoring` show exactly what's running for observability. + +## Lessons learned + +* Start with metrics, but don't stop there—they tell you *what*, not *why* +* Trace context propagation is the key to distributed debugging +* Grafana Alloy as a unified collector simplifies the pipeline +* Infrastructure metrics matter—your app is only as fast as your data layer +* The three pillars work together; none is sufficient alone + +All manifests and observability code live in Florian's repository: + +=> https://github.com/florianbuetow/x-rag X-RAG on GitHub (source code, K8s manifests, observability configs) + +The best part? 
Everything I learned during this hackathon—OpenTelemetry instrumentation, Grafana Alloy configuration, trace context propagation, PromQL queries—I can immediately apply at work, as we are shifting to that same observability stack, and I will soon be meeting with developers to discuss how and what they need to implement for application instrumentation. Observability patterns are universal, and hands-on experience with a real distributed system beats reading documentation any day. + +E-Mail your comments to paul@nospam.buetow.org + +=> ../ Back to the main site diff --git a/gemfeed/DRAFT-x-rag-observability.gmi.tpl b/gemfeed/DRAFT-x-rag-observability-hackathon.gmi.tpl index 7c8b3516..f1ff517e 100644 --- a/gemfeed/DRAFT-x-rag-observability.gmi.tpl +++ b/gemfeed/DRAFT-x-rag-observability-hackathon.gmi.tpl @@ -1,8 +1,6 @@ -# X-RAG: A Journey from Blind to Enlightened +# Adding Observability to X-RAG -> Published at DRAFT - -This blog post describes my journey adding observability to X-RAG, a distributed Retrieval-Augmented Generation (RAG) platform built by my brother Florian. I joined a 3-day hackathon (attending 2 days) with the goal of instrumenting his existing distributed system with proper observability. What started as "let's add some metrics" turned into a comprehensive implementation of the three pillars of observability: tracing, metrics, and logs. +This blog post describes my hackathon efforts adding observability to X-RAG, a distributed Retrieval-Augmented Generation (RAG) platform built by my brother Florian. I made time over the weekend to join his 3-day hackathon (attending 2 days) with the goal of instrumenting his existing distributed system with observability. What started as "let's add some metrics" turned into a comprehensive implementation of the three pillars of observability: tracing, metrics, and logs.
=> https://github.com/florianbuetow/x-rag X-RAG source code on GitHub @@ -10,9 +8,11 @@ This blog post describes my journey adding observability to X-RAG, a distributed ## What is X-RAG? -X-RAG is a production-grade distributed RAG (Retrieval-Augmented Generation) platform running on Kubernetes. The idea behind RAG is simple: instead of asking an LLM to answer questions from its training data alone, you first retrieve relevant documents from your own knowledge base, then feed those documents to the LLM as context. The LLM synthesises an answer grounded in your actual content—reducing hallucinations and enabling answers about private or recent information the model was never trained on. +X-RAG is a distributed RAG (Retrieval-Augmented Generation) platform running on Kubernetes. The idea behind RAG is simple: instead of asking an LLM to answer questions from its training data alone, you first retrieve relevant documents from your own knowledge base, then feed those documents to the LLM as context. The LLM synthesises an answer grounded in your actual content—reducing hallucinations and enabling answers about private or recent information the model was never trained on. + +X-RAG handles the full pipeline: ingest documents, chunk them into searchable pieces, generate vector embeddings, store them in a vector database, and at query time, retrieve relevant chunks and pass them to an LLM for answer generation. The system supports both local LLMs (Florian runs his on a beefy desktop) and cloud APIs like OpenAI. I configured an OpenAI API key since my laptop's CPU and GPU aren't fast enough for decent local inference. -X-RAG handles the full pipeline: ingest documents, chunk them into searchable pieces, generate vector embeddings, store them in a vector database, and at query time, retrieve relevant chunks and pass them to an LLM for answer generation. The system supports both local LLMs (Florian runs his on a beefy desktop) and cloud APIs like OpenAI. 
I configured an OpenAI API key since my laptop's CPU isn't fast enough for decent local inference. +All services are implemented in Python. I'm more used to Ruby, Go, and Bash these days, but for this project it didn't matter—Python's OpenTelemetry integration is straightforward, I wasn't planning to write or rewrite tons of application code, and with GenAI assistance the language barrier was a non-issue. The OpenTelemetry concepts and patterns should translate to other languages too—the SDK APIs are intentionally similar across Python, Go, Java, and others. X-RAG consists of several independently scalable microservices: @@ -22,6 +22,10 @@ X-RAG consists of several independently scalable microservices: * Indexer: Kafka consumer that processes documents * Search Service: gRPC service orchestrating the RAG pipeline +The Embedding Service deserves extra explanation because in the beginning I didn't really know what it was. Text isn't directly searchable in a vector database—you need to convert it to numerical vectors (embeddings) that capture semantic meaning. The Embedding Service takes text chunks and calls an embedding model (OpenAI's `text-embedding-3-small` in my case, or a local model on Florian's setup) to produce these vectors. For the LLM-generated search answer, I used `gpt-4o-mini`. + +Similar concepts end up with similar vectors, so "What is machine learning?" and "Explain ML" produce vectors close together in the embedding space. At query time, your question gets embedded too, and the vector database finds chunks with nearby vectors—that's semantic search. + The data layer includes Weaviate (vector database with hybrid search), Kafka (message queue), MinIO (object storage), and Redis (cache). All of this runs in a Kind Kubernetes cluster for local development, with the same manifests deployable to production. ``` @@ -49,6 +53,18 @@ X-RAG runs on Kubernetes, but you don't need a cloud account to develop it.
Kind spins up a full Kubernetes cluster using Docker containers as nodes. The control plane (API server, etcd, scheduler, controller-manager) runs in one container, and worker nodes run in separate containers. Inside these "node containers," pods run just like they would on real servers—using containerd as the container runtime. It's containers all the way down. +Technically, each Kind node is a Docker container running a minimal Linux image with kubelet and containerd installed. When you deploy a pod, kubelet inside the node container instructs containerd to pull and run the container image. So you have Docker running node containers, and inside those, containerd running application containers. Network-wise, Kind sets up a Docker bridge network and uses CNI plugins (kindnet by default) for pod networking within the cluster. + +``` +$ docker ps --format "table {{.Names}}\t{{.Image}}" +NAMES IMAGE +xrag-k8-control-plane kindest/node:v1.32.0 +xrag-k8-worker kindest/node:v1.32.0 +xrag-k8-worker2 kindest/node:v1.32.0 +``` + +The `kindest/node` image contains everything needed: kubelet, containerd, CNI plugins, and pre-pulled pause containers. Port mappings in the Kind config expose services to the host—that's how http://localhost:8080 reaches the search-ui running inside a pod, inside a worker container, inside Docker. + ``` ┌─────────────────────────────────────────────────────────────────────────┐ │ Docker Host │ @@ -66,17 +82,17 @@ Kind spins up a full Kubernetes cluster using Docker containers as nodes. The co └─────────────────────────────────────────────────────────────────────────┘ ``` -Why Kind? It gives you a real Kubernetes environment—the same manifests deploy to production clouds unchanged. No minikube quirks, no Docker Compose translation layer. Just Kubernetes. +Why Kind? It gives you a real Kubernetes environment—the same manifests deploy to production clouds unchanged. No minikube quirks, no Docker Compose translation layer. Just Kubernetes.
I already have a k3s cluster running at home, but Kind made collaboration easier—everyone working on X-RAG gets the exact same setup by cloning the repo and running `make cluster-start`. -Florian developed X-RAG on macOS, but it worked seamlessly on my Linux laptop. The only difference was Docker's resource allocation: on macOS you configure limits in Docker Desktop, on Linux it uses host resources directly. +Florian developed X-RAG on macOS, but it worked seamlessly on my Linux laptop. The only difference was Docker's resource allocation: on macOS you configure limits in Docker Desktop, on Linux it uses host resources directly. That's because on macOS, Linux containers run inside a virtual machine, since macOS isn't Linux. -My hardware: a ThinkPad X1 Carbon Gen 9 with an 11th Gen Intel Core i7-1185G7 (4 cores, 8 threads at 3.00GHz) and 32GB RAM. During the hackathon, memory usage peaked around 15GB—comfortable headroom. CPU was the bottleneck; with ~38 pods running across all namespaces (rag-system, monitoring, kube-system, etc.), plus Discord for the remote video call and Tidal streaming hi-res music, things got tight. When rebuilding Docker images or restarting the cluster, Discord video and audio would stutter—my fellow hackers probably wondered why I kept freezing mid-sentence. A beefier CPU would have meant less waiting and smoother calls, but it was manageable. +My hardware: a ThinkPad X1 Carbon Gen 9 with an 11th Gen Intel Core i7-1185G7 (4 cores, 8 threads at 3.00GHz) and 32GB RAM (running Fedora Linux). During the hackathon, memory usage peaked around 15GB—comfortable headroom. CPU was the bottleneck; with ~38 pods running across all namespaces (rag-system, monitoring, kube-system, etc.), plus Discord for the remote video call and Tidal streaming hi-res music, things got tight. When rebuilding Docker images or restarting the cluster, Discord video and audio would stutter—my fellow hackers probably wondered why I kept freezing mid-sentence.
A beefier CPU would have meant less waiting and smoother calls, but it was manageable. -## The problem: flying blind +## Motivation -When I joined the hackathon, Florian's X-RAG was functional but opaque. With five services communicating via gRPC, Kafka, and HTTP, debugging was cumbersome. When a search request took 5 seconds instead of the expected 500 milliseconds, there was no visibility into where the time was being spent. Was it the embedding generation? The vector search? The LLM synthesis? Nobody knew. +When I joined the hackathon, Florian's X-RAG was functional but opaque. With five services communicating via gRPC, Kafka, and HTTP, debugging was cumbersome. When a search request took 5 seconds, there was no visibility into where the time was being spent. Was it the embedding generation? The vector search? The LLM synthesis? Nobody could figure it out quickly. -Distributed systems are inherently opaque. Each service logs its own view of the world, but correlating events across service boundaries is archaeology. Grepping through logs on five different pods, trying to mentally reconstruct what happened—not fun. This was the perfect hackathon project: a real problem with tangible results. +Distributed systems are inherently opaque. Each service logs its own view of the world, but correlating events across service boundaries is archaeology. Grepping through logs on many pods, trying to mentally reconstruct what happened—not fun. This was the perfect hackathon project: exploring this observability stack in greater depth. ## The observability stack @@ -115,7 +131,7 @@ Everything is accessible via port-forwards: ## Grafana Alloy: the unified collector -Before diving into individual signals, I want to highlight Grafana Alloy—the component that ties everything together. Alloy is Grafana's vendor-neutral OpenTelemetry Collector distribution, and it became the backbone of the observability stack.
+Before diving into the individual signals, I want to highlight Grafana Alloy—the component that ties everything together. Alloy is Grafana's vendor-neutral OpenTelemetry Collector distribution, and it became the backbone of the observability stack. => https://grafana.com/docs/alloy/latest/ Grafana Alloy documentation @@ -126,100 +142,13 @@ Why use a centralised collector instead of having each service push directly to * `Processing pipeline`: Batch data before sending, filter noisy metrics, enrich with labels—all in one place. * `Backend flexibility`: Switch from Tempo to Jaeger without changing application code. -Alloy uses a configuration language called River, which feels similar to Terraform's HCL—declarative blocks with attributes. If you've written Terraform, River will look familiar. Here's what we configured for X-RAG: - -`Receiving telemetry (OTLP)`: -``` -otelcol.receiver.otlp "default" { - grpc { endpoint = "0.0.0.0:4317" } - http { endpoint = "0.0.0.0:4318" } - output { - metrics = [otelcol.processor.batch.metrics.input] - traces = [otelcol.processor.batch.traces.input] - } -} -``` - -Applications push metrics and traces to Alloy on ports 4317 (gRPC) or 4318 (HTTP). Alloy routes them to batch processors. - -`Batching for efficiency`: -``` -otelcol.processor.batch "metrics" { - timeout = "5s" - send_batch_size = 1000 - output { metrics = [otelcol.exporter.prometheus.default.input] } -} - -otelcol.processor.batch "traces" { - timeout = "5s" - send_batch_size = 500 - output { traces = [otelcol.exporter.otlp.tempo.input] } -} -``` - -Instead of sending each metric individually, Alloy accumulates up to 1000 metrics (or waits 5 seconds) before flushing. This reduces network overhead and protects backends from being overwhelmed. 
- -`Exporting to storage backends`: -``` -otelcol.exporter.prometheus "default" { - forward_to = [prometheus.remote_write.prom.receiver] -} - -otelcol.exporter.otlp "tempo" { - client { - endpoint = "tempo.monitoring.svc.cluster.local:4317" - tls { insecure = true } - } -} -``` - -Metrics get converted to Prometheus format and pushed via remote_write. Traces go to Tempo via OTLP. - -`Scraping Kubernetes metrics`: - -Alloy also pulls metrics from Kubernetes itself—kubelet resource metrics, cAdvisor container metrics, and kube-state-metrics for cluster state: - -``` -prometheus.scrape "kubelet_resource" { - targets = discovery.relabel.kubelet.output - metrics_path = "/metrics/resource" - scrape_interval = "30s" - forward_to = [prometheus.relabel.kubelet_resource_filter.receiver] -} -``` - -`Collecting logs`: - -For logs, Alloy discovers pods via the Kubernetes API, tails their log files from /var/log/pods/, and ships to Loki: - -``` -loki.source.kubernetes "pod_logs" { - targets = discovery.relabel.pod_logs.output - forward_to = [loki.process.pod_logs.receiver] -} - -loki.write "default" { - endpoint { - url = "http://loki.monitoring.svc.cluster.local:3100/loki/api/v1/push" - } -} -``` - -The full Alloy configuration runs to over 1400 lines with comments explaining each section. It handles: - -* OTLP receiver for application metrics and traces -* Batch processors for efficiency -* Prometheus exporter with remote_write -* Tempo exporter for traces -* Kubelet, cAdvisor, and kube-state-metrics scraping -* Infrastructure metrics (Redis, Kafka, MinIO exporters) -* Pod log collection and shipping to Loki +Alloy uses a configuration language called River, which feels similar to Terraform's HCL—declarative blocks with attributes. If you've written Terraform, River will look familiar. The full Alloy configuration runs to over 1400 lines with comments explaining each section. 
It handles OTLP receiving, batch processing, Prometheus export, Tempo export, Kubernetes metrics scraping, infrastructure metrics, and pod log collection. All three signals—metrics, traces, logs—flow through this single component, making Alloy the central nervous system of the observability stack. -All three signals—metrics, traces, logs—flow through this single component, making Alloy the central nervous system of the observability stack. +In the following sections, I'll cover each observability pillar and show the relevant Alloy configuration for each. -## Step 1: centralised logging with Loki +## Centralised logging with Loki -The first step was getting all logs in one place. I deployed Grafana Loki in the monitoring namespace, with Grafana Alloy running as a DaemonSet on each node to collect logs. +Getting all logs in one place was the foundation. I deployed Grafana Loki in the monitoring namespace, with Grafana Alloy running as a DaemonSet on each node to collect logs. ``` ┌──────────────────────────────────────────────────────────────────────┐ @@ -239,12 +168,14 @@ The first step was getting all logs in one place. I deployed Grafana Loki in the └──────────────────────────────────────────────────────────────────────┘ ``` -Alloy's configuration uses River language to discover Kubernetes pods and add labels: +### Alloy configuration for logs + +Alloy discovers pods via the Kubernetes API, tails their log files from /var/log/pods/, and ships to Loki. Importantly, Alloy runs as a DaemonSet on each worker node—it doesn't run inside the application pods. 
Since containerd writes all container stdout/stderr to /var/log/pods/ on the node's filesystem, Alloy can tail logs for every pod on that node from a single location without any sidecar injection: ``` -loki.source.kubernetes "pods" { - targets = discovery.relabel.pods.output - forward_to = [loki.write.default.receiver] +loki.source.kubernetes "pod_logs" { + targets = discovery.relabel.pod_logs.output + forward_to = [loki.process.pod_logs.receiver] } loki.write "default" { @@ -254,17 +185,17 @@ loki.write "default" { } ``` -Now I could query logs with LogQL: +### Querying logs with LogQL + +Now I could query logs in Loki (e.g. via Grafana UI) with LogQL: ``` {namespace="rag-system", container="search-ui"} |= "ERROR" ``` -But there was a problem: logs lacked correlation. I could see that an error occurred in the indexer, but I couldn't trace it back to the specific ingestion request that triggered it. - -## Step 2: metrics with Prometheus +## Metrics with Prometheus -Next, I added Prometheus metrics to every service. Following the Four Golden Signals (latency, traffic, errors, saturation), I instrumented the codebase with histograms, counters, and gauges: +I added Prometheus metrics to every service. Following the Four Golden Signals (latency, traffic, errors, saturation), I instrumented the codebase with histograms, counters, and gauges: ```python from prometheus_client import Histogram, Counter, Gauge @@ -285,6 +216,8 @@ errors_total = Counter( Initially, I used Prometheus scraping—each service exposed a /metrics endpoint, and Prometheus pulled metrics every 15 seconds. This worked, but I wanted a unified pipeline. +### Alloy configuration for application metrics + The breakthrough came with Grafana Alloy as an OpenTelemetry collector. Services now push metrics via OTLP (OpenTelemetry Protocol), and Alloy converts them to Prometheus format: ``` @@ -308,9 +241,145 @@ The breakthrough came with Grafana Alloy as an OpenTelemetry collector. 
Services └─────────────────────┘ ``` -With Grafana dashboards, I could now see latency percentiles, throughput, and error rates. But metrics told me *that* something was wrong—they didn't tell me *where* in the request path the problem occurred. +Alloy receives OTLP on ports 4317 (gRPC) or 4318 (HTTP), batches the data for efficiency, and exports to Prometheus: + +``` +otelcol.receiver.otlp "default" { + grpc { endpoint = "0.0.0.0:4317" } + http { endpoint = "0.0.0.0:4318" } + output { + metrics = [otelcol.processor.batch.metrics.input] + traces = [otelcol.processor.batch.traces.input] + } +} + +otelcol.processor.batch "metrics" { + timeout = "5s" + send_batch_size = 1000 + output { metrics = [otelcol.exporter.prometheus.default.input] } +} + +otelcol.exporter.prometheus "default" { + forward_to = [prometheus.remote_write.prom.receiver] +} +``` + +Instead of sending each metric individually, Alloy accumulates up to 1000 metrics (or waits 5 seconds) before flushing. This reduces network overhead and protects backends from being overwhelmed. + +### Kubernetes metrics: kubelet, cAdvisor, and kube-state-metrics + +Alloy also pulls metrics from Kubernetes itself—kubelet resource metrics, cAdvisor container metrics, and kube-state-metrics for cluster state. + +Why three separate sources? It does feel fragmented, but each serves a distinct purpose. `kubelet` exposes resource metrics about pod CPU and memory usage from its own bookkeeping—lightweight summaries of what's running on each node. `cAdvisor` (Container Advisor) runs inside kubelet and provides detailed container-level metrics: CPU throttling, memory working sets, filesystem I/O, network bytes. These are the raw runtime stats from containerd. `kube-state-metrics` is different—it doesn't measure resource usage at all. Instead, it queries the Kubernetes API and exposes the *desired state*: how many replicas a Deployment wants, whether a Pod is pending or running, what resource requests and limits are configured. 
You need all three because "container used 500MB" (cAdvisor), "pod requested 1GB" (kube-state-metrics), and "node has 4GB available" (kubelet) are complementary views. The fragmentation is a consequence of Kubernetes' architecture—no single component has the complete picture. + +None of these components speak OpenTelemetry—they all expose Prometheus-format metrics via HTTP endpoints. That's why Alloy uses `prometheus.scrape` instead of receiving OTLP pushes. Alloy handles both worlds: OTLP from our applications, Prometheus scraping for infrastructure. + +``` +prometheus.scrape "kubelet_resource" { + targets = discovery.relabel.kubelet.output + job_name = "kubelet-resource" + scheme = "https" + scrape_interval = "30s" + bearer_token_file = "/var/run/secrets/kubernetes.io/serviceaccount/token" + tls_config { insecure_skip_verify = true } + forward_to = [prometheus.remote_write.prom.receiver] +} + +prometheus.scrape "cadvisor" { + targets = discovery.relabel.cadvisor.output + job_name = "cadvisor" + scheme = "https" + scrape_interval = "60s" + bearer_token_file = "/var/run/secrets/kubernetes.io/serviceaccount/token" + tls_config { insecure_skip_verify = true } + forward_to = [prometheus.relabel.cadvisor_filter.receiver] +} + +prometheus.scrape "kube_state_metrics" { + targets = [ + {"__address__" = "kube-state-metrics.monitoring.svc.cluster.local:8080"}, + ] + job_name = "kube-state-metrics" + scrape_interval = "30s" + forward_to = [prometheus.relabel.kube_state_filter.receiver] +} +``` + +Note that `kubelet` and `cAdvisor` require HTTPS with bearer token authentication (using the service account token mounted by Kubernetes), while `kube-state-metrics` is a simple HTTP target. `cAdvisor` is scraped less frequently (60s) because it returns many more metrics with higher cardinality. + +### Infrastructure metrics: Kafka, Redis, MinIO + +Application metrics weren't enough. I also needed visibility into the data layer. 
Each infrastructure component has a specific role in X-RAG and got its own exporter: + +`Redis` is the caching layer. It stores search results and embeddings to avoid redundant API calls to OpenAI. We collect 25 metrics via oliver006/redis_exporter running as a sidecar, including cache hit/miss rates, memory usage, connected clients, and command latencies. The key metric? `redis_keyspace_hits_total / (redis_keyspace_hits_total + redis_keyspace_misses_total)` tells you if caching is actually helping. + +`Kafka` is the message queue connecting the ingestion API to the indexer. Documents are published to a topic, and the indexer consumes them asynchronously. We collect 12 metrics via danielqsj/kafka-exporter, with consumer lag being the most critical—it shows how far behind the indexer is. High lag means documents aren't being indexed fast enough. + +`MinIO` is the S3-compatible object storage where raw documents are stored before processing. We collect 16 metrics from its native /minio/v2/metrics/cluster endpoint, covering request rates, error counts, storage usage, and cluster health. 
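The Redis hit-rate expression above is a plain ratio, so it's easy to sanity-check offline. Here's a tiny hypothetical helper (not part of the X-RAG repo) mirroring the PromQL formula:

```python
def cache_hit_rate(hits: float, misses: float) -> float:
    # Mirrors redis_keyspace_hits_total / (hits + misses),
    # guarding against division by zero when there is no traffic yet.
    total = hits + misses
    return hits / total if total > 0 else 0.0

# 900 hits and 100 misses give a 90% hit rate:
print(cache_hit_rate(900, 100))  # 0.9
```

A persistently low ratio means most lookups miss the cache and fall through to OpenAI, so the cache isn't earning its keep.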
+ +You can verify these counts by querying Prometheus directly: + +``` +$ curl -s 'http://localhost:9090/api/v1/label/__name__/values' \ + | jq -r '.data[]' | grep -c '^redis_' +25 +$ curl -s 'http://localhost:9090/api/v1/label/__name__/values' \ + | jq -r '.data[]' | grep -c '^kafka_' +12 +$ curl -s 'http://localhost:9090/api/v1/label/__name__/values' \ + | jq -r '.data[]' | grep -c '^minio_' +16 +``` + +=> https://github.com/florianbuetow/x-rag/blob/main/infra/k8s/monitoring/alloy-config.yaml Full Alloy configuration with detailed metric filtering + +Alloy scrapes all of these and remote-writes to Prometheus: + +``` +prometheus.scrape "redis_exporter" { + targets = [ + {"__address__" = "xrag-redis.rag-system.svc.cluster.local:9121"}, + ] + job_name = "redis" + scrape_interval = "30s" + forward_to = [prometheus.relabel.redis_filter.receiver] +} + +prometheus.scrape "kafka_exporter" { + targets = [ + {"__address__" = "kafka-exporter.rag-system.svc.cluster.local:9308"}, + ] + job_name = "kafka" + scrape_interval = "30s" + forward_to = [prometheus.relabel.kafka_filter.receiver] +} + +prometheus.scrape "minio" { + targets = [ + {"__address__" = "xrag-minio.rag-system.svc.cluster.local:9000"}, + ] + job_name = "minio" + metrics_path = "/minio/v2/metrics/cluster" + scrape_interval = "30s" + forward_to = [prometheus.relabel.minio_filter.receiver] +} +``` + +Note that MinIO exposes metrics at a custom path (`/minio/v2/metrics/cluster`) rather than the default `/metrics`. Each exporter forwards to a relabel component that filters down to essential metrics before sending to Prometheus. + +With all metrics in Prometheus, I can use PromQL queries in Grafana dashboards. 
For example, to check Kafka consumer lag and see if the indexer is falling behind: + +```promql +sum by (consumergroup, topic) (kafka_consumergroup_lag) +``` + +Or check Redis cache effectiveness: -## Step 3: the breakthrough—distributed tracing +```promql +redis_keyspace_hits_total / (redis_keyspace_hits_total + redis_keyspace_misses_total) +``` + +## Distributed tracing with Tempo ### Understanding traces, spans, and the trace tree @@ -378,7 +447,7 @@ This is why all my spans link together—OpenTelemetry's auto-instrumentation ha ### Implementation -The real enlightenment came with OpenTelemetry tracing. I integrated auto-instrumentation for FastAPI, gRPC, and HTTP clients, plus manual spans for RAG-specific operations: +This is where distributed tracing made the difference. I integrated OpenTelemetry auto-instrumentation for FastAPI, gRPC, and HTTP clients, plus manual spans for RAG-specific operations: ```python from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor @@ -394,6 +463,10 @@ with tracer.start_as_current_span("llm.rag_completion") as span: result = await generate_answer(query, context) ``` +`Auto-instrumentation` is the quick win: one line of code and you get spans for every HTTP request, gRPC call, or database query. The instrumentor patches the framework at runtime, so existing code works without modification. The downside? You only get what the library authors decided to capture—generic HTTP attributes like `http.method` and `http.status_code`, but nothing domain-specific. Auto-instrumented spans also can't know your business logic, so a slow request shows up as "POST /api/search took 5 seconds" without revealing which internal operation caused the delay. + +`Manual spans` fill that gap. By wrapping specific operations (like `llm.rag_completion` or `vector_search.query`), you get visibility into your application's unique behaviour. 
You can add custom attributes (`llm.model`, `query.top_k`, `cache.hit`) that make traces actually useful for debugging. The downside is maintenance: manual spans are code you write and maintain, and you need to decide where instrumentation adds value versus where it just adds noise. In practice, I found the right balance was auto-instrumentation for framework boundaries (HTTP, gRPC) plus manual spans for the 5-10 operations that actually matter for understanding performance. + The magic is trace context propagation. When the Search UI calls the Search Service via gRPC, the trace ID travels in metadata headers: ``` @@ -422,54 +495,26 @@ Trace ID: 0af7651916cd43dd8448eb211c80319c │ └─ openai.chat (115ms) ``` -Traces are collected by Alloy and stored in Grafana Tempo. In Tempo's UI, I can finally see exactly where time is spent. That 5-second query? Turns out the vector search was waiting on a cold Weaviate connection. Now I knew what to fix. - -## Infrastructure metrics: Kafka, Redis, MinIO - -Application metrics weren't enough. I also needed visibility into the data layer. Each infrastructure component has a specific role in X-RAG and got its own exporter: - -`Redis` is the caching layer. It stores search results and embeddings to avoid redundant API calls to OpenAI. We collect 25 metrics via oliver006/redis_exporter running as a sidecar, including cache hit/miss rates, memory usage, connected clients, and command latencies. The key metric? `redis_keyspace_hits_total / (redis_keyspace_hits_total + redis_keyspace_misses_total)` tells you if caching is actually helping. - -`Kafka` is the message queue connecting the ingestion API to the indexer. Documents are published to a topic, and the indexer consumes them asynchronously. We collect 12 metrics via danielqsj/kafka-exporter, with consumer lag being the most critical—it shows how far behind the indexer is. High lag means documents aren't being indexed fast enough. 
- -`MinIO` is the S3-compatible object storage where raw documents are stored before processing. We collect 16 metrics from its native /minio/v2/metrics/cluster endpoint, covering request rates, error counts, storage usage, and cluster health. - -You can verify these counts by querying Prometheus directly: - -``` -$ curl -s 'http://localhost:9090/api/v1/label/__name__/values' \ - | jq -r '.data[]' | grep -c '^redis_' -25 -$ curl -s 'http://localhost:9090/api/v1/label/__name__/values' \ - | jq -r '.data[]' | grep -c '^kafka_' -12 -$ curl -s 'http://localhost:9090/api/v1/label/__name__/values' \ - | jq -r '.data[]' | grep -c '^minio_' -16 -``` - -=> https://github.com/florianbuetow/x-rag/blob/main/infra/k8s/monitoring/alloy-config.yaml Full Alloy configuration with detailed metric filtering +### Alloy configuration for traces -Alloy scrapes all of these and remote-writes to Prometheus: +Traces are collected by Alloy and stored in Grafana Tempo. Alloy batches traces for efficiency before exporting via OTLP: ``` -prometheus.scrape "redis_exporter" { - targets = [{"__address__" = "xrag-redis:9121"}] - forward_to = [prometheus.remote_write.prom.receiver] +otelcol.processor.batch "traces" { + timeout = "5s" + send_batch_size = 500 + output { traces = [otelcol.exporter.otlp.tempo.input] } } -``` -Now I can query Kafka consumer lag to see if the indexer is falling behind: - -```promql -sum by (consumergroup, topic) (kafka_consumergroup_lag) +otelcol.exporter.otlp "tempo" { + client { + endpoint = "tempo.monitoring.svc.cluster.local:4317" + tls { insecure = true } + } +} ``` -Or check Redis cache effectiveness: - -```promql -redis_keyspace_hits_total / (redis_keyspace_hits_total + redis_keyspace_misses_total) -``` +In Tempo's UI, I can finally see exactly where time is spent. That 5-second query? Turns out the vector search was waiting on a cold Weaviate connection. Now I knew what to fix. 
## Async ingestion trace walkthrough @@ -502,6 +547,8 @@ The ingestion API immediately returns—it doesn't wait for indexing. The docume ### Step 2: Find the ingestion trace +Using Tempo's HTTP API (port 3200), we can search for traces by span name using TraceQL: + ``` $ curl -s -G "http://localhost:3200/api/search" \ --data-urlencode 'q={name="POST /ingest"}' \ @@ -564,9 +611,9 @@ To view a trace in Grafana's UI: The trace viewer shows a Gantt chart with all spans, their timing, and parent-child relationships. Click any span to see its attributes. -=> ./x-rag-observability/index-trace.png Async ingestion trace in Grafana Tempo +=> ./x-rag-observability-hackathon/index-trace.png Async ingestion trace in Grafana Tempo -=> ./x-rag-observability/index-node-graph.png Ingestion trace node graph showing service dependencies +=> ./x-rag-observability-hackathon/index-node-graph.png Ingestion trace node graph showing service dependencies ## End-to-end search trace walkthrough @@ -574,6 +621,8 @@ To demonstrate the observability stack in action, here's a complete trace from a ### Step 1: Make a search request +Normally you'd use the Search UI web interface at http://localhost:8080, but for demonstration purposes curl makes it easier to show the raw request and response: + ``` $ curl -s -X POST http://localhost:8080/api/search \ -H "Content-Type: application/json" \ @@ -675,9 +724,9 @@ Other useful TraceQL queries: Follow the same steps as above, but use the search trace ID: `9df981cac91857b228eca42b501c98c6` -=> ./x-rag-observability/search-trace.png Search trace in Grafana Tempo +=> ./x-rag-observability-hackathon/search-trace.png Search trace in Grafana Tempo -=> ./x-rag-observability/search-node-graph.png Search trace node graph showing service flow +=> ./x-rag-observability-hackathon/search-node-graph.png Search trace node graph showing service flow ## Correlating the three signals @@ -693,6 +742,24 @@ The real power comes from correlating traces, metrics, and logs. 
When an alert f Prometheus exemplars link specific metric samples to trace IDs, so I can click directly from a latency spike to the responsible trace. +## Grafana dashboards + +During the hackathon, I also created six pre-built Grafana dashboards that are automatically provisioned when the monitoring stack starts: + +| Dashboard | Description | +|-----------|-------------| +| **X-RAG Overview** | The main dashboard with 22 panels covering request rates, latencies, error rates, and service health across all X-RAG components | +| **OpenTelemetry HTTP Metrics** | HTTP request/response metrics from OpenTelemetry-instrumented services—request rates, latency percentiles, and status code breakdowns | +| **Pod System Metrics** | Kubernetes pod resource utilisation: CPU usage, memory consumption, network I/O, disk I/O, and pod state from kube-state-metrics | +| **Redis** | Cache performance: memory usage, hit/miss rates, commands per second, connected clients, and memory fragmentation | +| **Kafka** | Message queue health: consumer lag (critical for indexer monitoring), broker status, topic partitions, and throughput | +| **MinIO** | Object storage metrics: S3 request rates, error counts, traffic volume, bucket sizes, and disk usage | + +All dashboards are stored as JSON files in `infra/k8s/monitoring/grafana-dashboards/` and deployed via ConfigMaps, so they survive pod restarts and cluster recreations. + +=> ./x-rag-observability-hackathon/dashboard-xrag-overview.png X-RAG Overview dashboard +=> ./x-rag-observability-hackathon/dashboard-pod-system-metrics.png Pod System Metrics dashboard + ## Results: two days well spent What did two days of hackathon work achieve? The system went from flying blind to fully instrumented: @@ -700,20 +767,30 @@ What did two days of hackathon work achieve? 
The system went from flying blind t * All three pillars implemented: logs (Loki), metrics (Prometheus), traces (Tempo) * Unified collection via Grafana Alloy * Infrastructure metrics for Kafka, Redis, and MinIO -* Grafana dashboards with PromQL queries +* Six pre-built Grafana dashboards covering application metrics, pod resources, and infrastructure * Trace context propagation across all gRPC calls The biggest insight from testing? The embedding service wasn't the bottleneck I assumed. Traces revealed that LLM synthesis dominated latency, not embedding generation. Without tracing, optimisation efforts would have targeted the wrong component. -Beyond the technical wins, I had a lot of fun. The hackathon brought together people working on completely different projects, and I got to know some really nice folks during the breaks. There's something energising about being in a room full of people all heads-down on their own challenges—even if you're not collaborating directly, the shared focus is motivating. +Beyond the technical wins, I had a lot of fun. The hackathon brought together people working on different projects, and I got to know some really nice folks during the sessions themselves. There's something energising about being in a (virtual) room with other people all heads-down on their own challenges—even if you're not collaborating directly, the shared focus is motivating. + +## SLIs, SLOs and SLAs + +The system now has full observability, but there's always more. And to be clear: this is not production-grade yet. It works well for development and could scale to production, but that would need to be validated with proper load testing and chaos testing first. We haven't stress-tested the observability pipeline under heavy load, nor have we tested failure scenarios like Tempo going down or Alloy running out of memory. The Alloy config includes comments on sampling strategies and rate limiting that would be essential for high-traffic environments. 
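The sampling strategies mentioned above can be made concrete. This is a simplified sketch of how a trace-ID ratio sampler decides—roughly what OpenTelemetry's TraceIdRatioBased sampler does—with an illustrative 10% keep rate:

```python
# Sketch of deterministic head sampling: keep a fixed fraction of traces,
# decided from the trace ID itself so every service in a trace makes the
# same keep/drop choice without coordination. The 10% ratio is illustrative.

def keep_trace(trace_id_hex: str, ratio: float = 0.10) -> bool:
    """Compare the lower 8 bytes of the 16-byte trace ID against a bound."""
    value = int(trace_id_hex[16:], 16)          # lower 64 bits of the trace ID
    return value < int(ratio * (1 << 64))       # keep iff below ratio * 2^64

# The same trace ID always yields the same decision on every hop.
decision = keep_trace("0af7651916cd43dd8448eb211c80319c")
```

Because the decision is a pure function of the trace ID, traces are never half-sampled: either all spans survive or none do, which keeps Tempo's storage bill proportional to the ratio.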
-## What's next +One thing we didn't cover: monitoring and alerting. These are related but distinct from observability. Observability is about collecting and exploring data to understand system behaviour. Monitoring is about defining thresholds and alerting when they're breached. We have Prometheus with all the metrics, but no alerting rules yet—no PagerDuty integration, no Slack notifications when latency spikes or error rates climb. -The system is now "enlightened," but there's always more: +We also didn't define any SLIs (Service Level Indicators) or SLOs (Service Level Objectives). An SLI is a quantitative measure of service quality—for example, "99th percentile search latency" or "percentage of requests returning successfully." An SLO is a target for that indicator—"99th percentile latency should be under 2 seconds" or "99.9% of requests should succeed." Without SLOs, you don't know what "good" looks like, and alerting becomes arbitrary. -* Semantic monitoring: Using LLM evaluation as a metric (relevance scores, hallucination detection) -* Alerting rules: Prometheus alerts for SLO violations -* Sampling strategies: For high-traffic production, sample traces to reduce storage costs +For X-RAG specifically, potential SLOs might include: + +* `Search latency`: 99th percentile search response time under 3 seconds +* `Uptime`: 99.9% availability of the search API endpoint +* `Response quality`: Percentage of searches returning relevant results (though this is harder to measure automatically and might require user feedback or evaluation frameworks) + +SLAs (Service Level Agreements) are often confused with SLOs, but they're different. An SLA is a contractual commitment to customers—a legally binding promise with consequences (refunds, credits, penalties) if you fail to meet it. SLOs are internal engineering targets; SLAs are external business promises. 
Typically, SLAs are less strict than SLOs: if your internal target is 99.9% availability (SLO), your customer contract might promise 99.5% (SLA), giving you a buffer before you owe anyone money. + +But then again, X-RAG is a proof-of-concept, a prototype, a learning system—there are no real customers to disappoint. SLOs would become essential if this ever served actual users, and SLAs would follow once there's a business relationship to protect. ## Using Amp for AI-assisted development @@ -733,9 +810,9 @@ Chaining many small, focused tasks worked better than one massive plan. Each tas I only ran out of the 200k token context window once, during a debugging session that involved restarting the Kubernetes cluster multiple times. The fix required correlating error messages across several services, and the conversation history grew too long. Starting a fresh context and summarising the problem solved it. -Amp automatically selects the best model for the task at hand. Based on the response speed and Sourcegraph's recent announcements, I believe it was using Claude Opus 4.5 for most of my infrastructure work. The quality was excellent—it understood Kubernetes, OpenTelemetry, and Grafana tooling without much hand-holding. +Amp automatically selects the best model for the task at hand. Based on the response speed and Sourcegraph's recent announcements, I believe it was using Claude Opus 4.5 for most of my coding and infrastructure work. The quality was excellent—it understood Python, Kubernetes, OpenTelemetry, and Grafana tooling without much hand-holding. -Let me be clear: without the LLM, I'd never have managed to write all these configuration files by hand in two days. The Alloy config alone is 1400+ lines. But I also reviewed every change manually, verified it made sense, and understood what was being deployed. This wasn't vibe-coding—the whole point of the hackathon was to learn. 
I already knew Grafana and Prometheus from previous work, but OpenTelemetry, Alloy, Tempo, and Loki were all pretty new to me. By reviewing each generated config and understanding why it was structured that way, I actually learned the tools rather than just deploying magic incantations.
+Let me be clear: without the LLM, I'd never have managed to write all these configuration files by hand in two days. The Alloy config alone is 1400+ lines. But I also reviewed every change manually, verified it made sense, and understood what was being deployed. This wasn't vibe-coding—the whole point of the hackathon was to learn. I already knew Grafana and Prometheus from previous work, but OpenTelemetry, Alloy, Tempo, Loki, and the X-RAG system overall were all pretty new to me. By reviewing each generated config and understanding why it was structured that way, I actually learned the tools rather than just deploying magic incantations.

Cost-wise, I spent around 20 USD on Amp credits over the two-day hackathon. For the amount of code generated, configs reviewed, and debugging assistance—that's remarkably affordable.

@@ -745,10 +822,6 @@ Looking at the git history, I made 25 commits during the hackathon. Beyond the m

`OBSERVABILITY_ENABLED flag`: Added an environment variable to completely disable the monitoring stack. Set `OBSERVABILITY_ENABLED=false` in `.env` and the cluster starts without Prometheus, Grafana, Tempo, Loki, or Alloy. Useful when you just want to work on application code without the overhead.

-`Metrics migration to OpenTelemetry SDK`: The original codebase used prometheus_client for metrics. I migrated everything to OpenTelemetry's metrics SDK so all telemetry (metrics, traces, logs) flows through the same OTLP pipeline to Alloy. One protocol, one collector.
-
-`Removing duplicate spans`: Auto-instrumentation is great until it creates spans that overlap with your manual instrumentation.
I had to audit the traces and remove manual spans where FastAPI or gRPC instrumentors already covered the operation.
-
`Load generator`: Added a `make load-gen` target that fires concurrent requests at the search API. Useful for generating enough trace data to see patterns in Tempo, and for stress-testing the observability pipeline itself.

`Verification scripts`: Created scripts to test that OTLP is actually reaching Alloy and that traces appear in Tempo. Debugging "why aren't my traces showing up?" is frustrating without a systematic way to verify each hop in the pipeline.

@@ -767,16 +840,7 @@ All manifests and observability code live in Florian's repository:

=> https://github.com/florianbuetow/x-rag X-RAG on GitHub (source code, K8s manifests, observability configs)

-The observability-specific files I added during the hackathon:
-
-* `infra/k8s/monitoring/` — Kubernetes manifests for Prometheus, Grafana, Tempo, Loki, Alloy
-* `src/common/tracing.py` — OpenTelemetry TracerProvider initialisation
-* `src/common/tracing_utils.py` — Manual span context managers
-* `src/common/metrics.py` — Prometheus metric utilities
-* `src/*/metrics.py` — Per-service metric definitions
-* `docs/OBSERVABILITY.md` — Comprehensive observability guide
-
-The best part? Everything I learned during this hackathon—OpenTelemetry instrumentation, Grafana Alloy configuration, trace context propagation, PromQL queries—I can immediately apply at work. Observability patterns are universal, and hands-on experience with a real distributed system beats reading documentation any day.
+The best part? Everything I learned during this hackathon—OpenTelemetry instrumentation, Grafana Alloy configuration, trace context propagation, PromQL queries—I can immediately apply at work: we are shifting to this new observability stack, and I'll be meeting with developers to discuss what they need to instrument and how.
Observability patterns are universal, and hands-on experience with a real distributed system beats reading documentation any day. E-Mail your comments to paul@nospam.buetow.org diff --git a/gemfeed/x-rag-observability-hackathon/dashboard-pod-system-metrics.png b/gemfeed/x-rag-observability-hackathon/dashboard-pod-system-metrics.png Binary files differnew file mode 100644 index 00000000..c633a7f3 --- /dev/null +++ b/gemfeed/x-rag-observability-hackathon/dashboard-pod-system-metrics.png diff --git a/gemfeed/x-rag-observability-hackathon/dashboard-xrag-overview.png b/gemfeed/x-rag-observability-hackathon/dashboard-xrag-overview.png Binary files differnew file mode 100644 index 00000000..8e898e0e --- /dev/null +++ b/gemfeed/x-rag-observability-hackathon/dashboard-xrag-overview.png diff --git a/gemfeed/x-rag-observability/index-node-graph.png b/gemfeed/x-rag-observability-hackathon/index-node-graph.png Binary files differindex 24cb4ba4..24cb4ba4 100644 --- a/gemfeed/x-rag-observability/index-node-graph.png +++ b/gemfeed/x-rag-observability-hackathon/index-node-graph.png diff --git a/gemfeed/x-rag-observability/index-trace.png b/gemfeed/x-rag-observability-hackathon/index-trace.png Binary files differindex 410492b6..410492b6 100644 --- a/gemfeed/x-rag-observability/index-trace.png +++ b/gemfeed/x-rag-observability-hackathon/index-trace.png diff --git a/gemfeed/x-rag-observability/search-node-graph.png b/gemfeed/x-rag-observability-hackathon/search-node-graph.png Binary files differindex 0a2eb2d3..0a2eb2d3 100644 --- a/gemfeed/x-rag-observability/search-node-graph.png +++ b/gemfeed/x-rag-observability-hackathon/search-node-graph.png diff --git a/gemfeed/x-rag-observability/search-trace.png b/gemfeed/x-rag-observability-hackathon/search-trace.png Binary files differindex d9cf7973..d9cf7973 100644 --- a/gemfeed/x-rag-observability/search-trace.png +++ b/gemfeed/x-rag-observability-hackathon/search-trace.png @@ -1,6 +1,6 @@ # Hello! 
-> This site was generated at 2025-12-07T10:16:25+02:00 by `Gemtexter` +> This site was generated at 2025-12-24T00:42:08+02:00 by `Gemtexter` Welcome to the foo.zone! diff --git a/notes/search-inside-yourself.gmi.tpl.lock b/notes/search-inside-yourself.gmi.tpl.lock new file mode 100644 index 00000000..e69de29b --- /dev/null +++ b/notes/search-inside-yourself.gmi.tpl.lock diff --git a/uptime-stats.gmi b/uptime-stats.gmi index 84b5d9b6..b6a6df35 100644 --- a/uptime-stats.gmi +++ b/uptime-stats.gmi @@ -1,6 +1,6 @@ # My machine uptime stats -> This site was last updated at 2025-12-07T10:16:25+02:00 +> This site was last updated at 2025-12-24T00:42:08+02:00 The following stats were collected via `uptimed` on all of my personal computers over many years and the output was generated by `guprecords`, the global uptime records stats analyser of mine. @@ -18,30 +18,30 @@ Also check out my blog post: Boots is the total number of host boots over the entire lifespan. ``` -+-----+----------------+-------+------------------------------+ -| Pos | Host | Boots | Last Kernel | -+-----+----------------+-------+------------------------------+ -| 1. | alphacentauri | 671 | FreeBSD 11.4-RELEASE-p7 | -| 2. | *earth | 213 | Linux 6.17.8-300.fc43.x86_64 | -| 3. | mars | 207 | Linux 3.2.0-4-amd64 | -| 4. | callisto | 153 | Linux 4.0.4-303.fc22.x86_64 | -| 5. | dionysus | 136 | FreeBSD 13.0-RELEASE-p11 | -| 6. | tauceti-e | 120 | Linux 3.2.0-4-amd64 | -| 7. | *makemake | 81 | Linux 6.9.9-200.fc40.x86_64 | -| 8. | *f2 | 70 | FreeBSD 14.3-RELEASE | -| 9. | *f1 | 65 | FreeBSD 14.3-RELEASE | -| 10. | *f0 | 62 | FreeBSD 14.3-RELEASE | -| 11. | uranus | 59 | NetBSD 10.1 | -| 12. | pluto | 51 | Linux 3.2.0-4-amd64 | -| 13. | mega15289 | 50 | Darwin 23.4.0 | -| 14. | *mega-m3-pro | 50 | Darwin 24.6.0 | -| 15. | *fishfinger | 46 | OpenBSD 7.7 | -| 16. | t450 | 44 | FreeBSD 14.2-RELEASE | -| 17. | *blowfish | 43 | OpenBSD 7.7 | -| 18. | phobos | 40 | Linux 3.4.0-CM-g1dd7cdf | -| 19. 
| mega8477 | 40 | Darwin 13.4.0 | -| 20. | sun | 33 | FreeBSD 10.3-RELEASE-p24 | -+-----+----------------+-------+------------------------------+ ++-----+----------------+-------+-------------------------------+ +| Pos | Host | Boots | Last Kernel | ++-----+----------------+-------+-------------------------------+ +| 1. | alphacentauri | 671 | FreeBSD 11.4-RELEASE-p7 | +| 2. | *earth | 217 | Linux 6.17.12-300.fc43.x86_64 | +| 3. | mars | 207 | Linux 3.2.0-4-amd64 | +| 4. | callisto | 153 | Linux 4.0.4-303.fc22.x86_64 | +| 5. | dionysus | 136 | FreeBSD 13.0-RELEASE-p11 | +| 6. | tauceti-e | 120 | Linux 3.2.0-4-amd64 | +| 7. | *makemake | 81 | Linux 6.9.9-200.fc40.x86_64 | +| 8. | *f2 | 70 | FreeBSD 14.3-RELEASE | +| 9. | *f1 | 65 | FreeBSD 14.3-RELEASE | +| 10. | *f0 | 62 | FreeBSD 14.3-RELEASE | +| 11. | uranus | 59 | NetBSD 10.1 | +| 12. | pluto | 51 | Linux 3.2.0-4-amd64 | +| 13. | *mega-m3-pro | 50 | Darwin 24.6.0 | +| 14. | mega15289 | 50 | Darwin 23.4.0 | +| 15. | *fishfinger | 46 | OpenBSD 7.7 | +| 16. | t450 | 44 | FreeBSD 14.2-RELEASE | +| 17. | *blowfish | 43 | OpenBSD 7.7 | +| 18. | phobos | 40 | Linux 3.4.0-CM-g1dd7cdf | +| 19. | mega8477 | 40 | Darwin 13.4.0 | +| 20. | sun | 33 | FreeBSD 10.3-RELEASE-p24 | ++-----+----------------+-------+-------------------------------+ ``` ## Top 20 Uptime's by Host @@ -53,7 +53,7 @@ Uptime is the total uptime of a host over the entire lifespan. | Pos | Host | Uptime | Last Kernel | +-----+----------------+-----------------------------+-----------------------------------+ | 1. | vulcan | 4 years, 5 months, 6 days | Linux 3.10.0-1160.81.1.el7.x86_64 | -| 2. | *earth | 3 years, 11 months, 15 days | Linux 6.17.8-300.fc43.x86_64 | +| 2. | *earth | 3 years, 11 months, 30 days | Linux 6.17.12-300.fc43.x86_64 | | 3. | *blowfish | 3 years, 10 months, 2 days | OpenBSD 7.7 | | 4. | sun | 3 years, 9 months, 26 days | FreeBSD 10.3-RELEASE-p24 | | 5. 
| uranus | 3 years, 9 months, 5 days | NetBSD 10.1 | @@ -65,7 +65,7 @@ Uptime is the total uptime of a host over the entire lifespan. | 11. | mega15289 | 1 years, 12 months, 17 days | Darwin 23.4.0 | | 12. | tauceti-f | 1 years, 9 months, 18 days | Linux 3.2.0-3-amd64 | | 13. | t450 | 1 years, 7 months, 26 days | FreeBSD 14.2-RELEASE | -| 14. | *mega-m3-pro | 1 years, 6 months, 20 days | Darwin 24.6.0 | +| 14. | *mega-m3-pro | 1 years, 7 months, 3 days | Darwin 24.6.0 | | 15. | mega8477 | 1 years, 3 months, 25 days | Darwin 13.4.0 | | 16. | host0 | 1 years, 3 months, 9 days | FreeBSD 6.2-RELEASE-p5 | | 17. | *makemake | 1 years, 3 months, 7 days | Linux 6.9.9-200.fc40.x86_64 | @@ -84,8 +84,8 @@ Score is calculated by combining all other metrics. | Pos | Host | Score | Last Kernel | +-----+----------------+-------+-----------------------------------+ | 1. | uranus | 340 | NetBSD 10.1 | -| 2. | vulcan | 275 | Linux 3.10.0-1160.81.1.el7.x86_64 | -| 3. | *earth | 272 | Linux 6.17.8-300.fc43.x86_64 | +| 2. | *earth | 275 | Linux 6.17.12-300.fc43.x86_64 | +| 3. | vulcan | 275 | Linux 3.10.0-1160.81.1.el7.x86_64 | | 4. | *blowfish | 243 | OpenBSD 7.7 | | 5. | sun | 238 | FreeBSD 10.3-RELEASE-p24 | | 6. | uugrn | 211 | FreeBSD 11.2-RELEASE-p4 | @@ -99,7 +99,7 @@ Score is calculated by combining all other metrics. | 14. | *makemake | 139 | Linux 6.9.9-200.fc40.x86_64 | | 15. | t450 | 119 | FreeBSD 14.2-RELEASE | | 16. | tauceti-f | 108 | Linux 3.2.0-3-amd64 | -| 17. | *mega-m3-pro | 99 | Darwin 24.6.0 | +| 17. | *mega-m3-pro | 101 | Darwin 24.6.0 | | 18. | tauceti-e | 96 | Linux 3.2.0-4-amd64 | | 19. | callisto | 86 | Linux 4.0.4-303.fc22.x86_64 | | 20. | mega8477 | 80 | Darwin 13.4.0 | @@ -111,30 +111,30 @@ Score is calculated by combining all other metrics. Downtime is the total downtime of a host over the entire lifespan. 
``` -+-----+----------------+-----------------------------+------------------------------+ -| Pos | Host | Downtime | Last Kernel | -+-----+----------------+-----------------------------+------------------------------+ -| 1. | dionysus | 8 years, 3 months, 16 days | FreeBSD 13.0-RELEASE-p11 | -| 2. | uranus | 6 years, 7 months, 31 days | NetBSD 10.1 | -| 3. | alphacentauri | 5 years, 11 months, 18 days | FreeBSD 11.4-RELEASE-p7 | -| 4. | *makemake | 3 years, 8 months, 11 days | Linux 6.9.9-200.fc40.x86_64 | -| 5. | moon | 2 years, 1 months, 1 days | FreeBSD 14.0-RELEASE-p3 | -| 6. | callisto | 1 years, 5 months, 15 days | Linux 4.0.4-303.fc22.x86_64 | -| 7. | mega15289 | 1 years, 4 months, 24 days | Darwin 23.4.0 | -| 8. | t450 | 1 years, 2 months, 13 days | FreeBSD 14.2-RELEASE | -| 9. | mars | 1 years, 2 months, 10 days | Linux 3.2.0-4-amd64 | -| 10. | tauceti-e | 0 years, 12 months, 9 days | Linux 3.2.0-4-amd64 | -| 11. | sirius | 0 years, 8 months, 20 days | Linux 2.6.32-042stab111.12 | -| 12. | *f0 | 0 years, 8 months, 3 days | FreeBSD 14.3-RELEASE | -| 13. | *f2 | 0 years, 8 months, 2 days | FreeBSD 14.3-RELEASE | -| 14. | *f1 | 0 years, 8 months, 1 days | FreeBSD 14.3-RELEASE | -| 15. | *earth | 0 years, 6 months, 29 days | Linux 6.17.8-300.fc43.x86_64 | -| 16. | deimos | 0 years, 5 months, 15 days | Linux 4.4.5-300.fc23.x86_64 | -| 17. | joghurt | 0 years, 2 months, 9 days | FreeBSD 7.0-PRERELEASE | -| 18. | host0 | 0 years, 2 months, 1 days | FreeBSD 6.2-RELEASE-p5 | -| 19. | *mega-m3-pro | 0 years, 1 months, 20 days | Darwin 24.6.0 | -| 20. | fibonacci | 0 years, 1 months, 11 days | FreeBSD 5.3-RELEASE-p15 | -+-----+----------------+-----------------------------+------------------------------+ ++-----+----------------+-----------------------------+-------------------------------+ +| Pos | Host | Downtime | Last Kernel | ++-----+----------------+-----------------------------+-------------------------------+ +| 1. 
| dionysus | 8 years, 3 months, 16 days | FreeBSD 13.0-RELEASE-p11 | +| 2. | uranus | 6 years, 7 months, 31 days | NetBSD 10.1 | +| 3. | alphacentauri | 5 years, 11 months, 18 days | FreeBSD 11.4-RELEASE-p7 | +| 4. | *makemake | 3 years, 8 months, 11 days | Linux 6.9.9-200.fc40.x86_64 | +| 5. | moon | 2 years, 1 months, 1 days | FreeBSD 14.0-RELEASE-p3 | +| 6. | callisto | 1 years, 5 months, 15 days | Linux 4.0.4-303.fc22.x86_64 | +| 7. | mega15289 | 1 years, 4 months, 24 days | Darwin 23.4.0 | +| 8. | t450 | 1 years, 2 months, 13 days | FreeBSD 14.2-RELEASE | +| 9. | mars | 1 years, 2 months, 10 days | Linux 3.2.0-4-amd64 | +| 10. | tauceti-e | 0 years, 12 months, 9 days | Linux 3.2.0-4-amd64 | +| 11. | sirius | 0 years, 8 months, 20 days | Linux 2.6.32-042stab111.12 | +| 12. | *f0 | 0 years, 8 months, 3 days | FreeBSD 14.3-RELEASE | +| 13. | *f2 | 0 years, 8 months, 2 days | FreeBSD 14.3-RELEASE | +| 14. | *f1 | 0 years, 8 months, 1 days | FreeBSD 14.3-RELEASE | +| 15. | *earth | 0 years, 6 months, 29 days | Linux 6.17.12-300.fc43.x86_64 | +| 16. | deimos | 0 years, 5 months, 15 days | Linux 4.4.5-300.fc23.x86_64 | +| 17. | joghurt | 0 years, 2 months, 9 days | FreeBSD 7.0-PRERELEASE | +| 18. | host0 | 0 years, 2 months, 1 days | FreeBSD 6.2-RELEASE-p5 | +| 19. | *mega-m3-pro | 0 years, 1 months, 20 days | Darwin 24.6.0 | +| 20. | fibonacci | 0 years, 1 months, 11 days | FreeBSD 5.3-RELEASE-p15 | ++-----+----------------+-----------------------------+-------------------------------+ ``` ## Top 20 Lifespan's by Host @@ -149,7 +149,7 @@ Lifespan is the total uptime + the total downtime of a host. | 2. | dionysus | 8 years, 6 months, 17 days | FreeBSD 13.0-RELEASE-p11 | | 3. | alphacentauri | 6 years, 9 months, 13 days | FreeBSD 11.4-RELEASE-p7 | | 4. | *makemake | 4 years, 10 months, 16 days | Linux 6.9.9-200.fc40.x86_64 | -| 5. | *earth | 4 years, 5 months, 14 days | Linux 6.17.8-300.fc43.x86_64 | +| 5. 
| *earth | 4 years, 5 months, 29 days | Linux 6.17.12-300.fc43.x86_64 | | 6. | vulcan | 4 years, 5 months, 6 days | Linux 3.10.0-1160.81.1.el7.x86_64 | | 7. | *blowfish | 3 years, 10 months, 3 days | OpenBSD 7.7 | | 8. | sun | 3 years, 10 months, 2 days | FreeBSD 10.3-RELEASE-p24 | @@ -179,7 +179,7 @@ Boots is the total number of host boots over the entire lifespan. | 1. | FreeBSD 10... | 551 | | 2. | Linux 3... | 550 | | 3. | *FreeBSD 14... | 215 | -| 4. | *Linux 6... | 198 | +| 4. | *Linux 6... | 202 | | 5. | Linux 5... | 162 | | 6. | Linux 4... | 161 | | 7. | FreeBSD 11... | 153 | @@ -194,8 +194,8 @@ Boots is the total number of host boots over the entire lifespan. | 16. | Darwin 15... | 15 | | 17. | Darwin 22... | 12 | | 18. | Darwin 18... | 11 | -| 19. | OpenBSD 4... | 10 | -| 20. | FreeBSD 6... | 10 | +| 19. | FreeBSD 6... | 10 | +| 20. | OpenBSD 4... | 10 | +-----+----------------+-------+ ``` @@ -211,7 +211,7 @@ Uptime is the total uptime of a host over the entire lifespan. | 2. | *OpenBSD 7... | 7 years, 6 months, 29 days | | 3. | FreeBSD 10... | 5 years, 9 months, 9 days | | 4. | Linux 5... | 4 years, 10 months, 21 days | -| 5. | *Linux 6... | 3 years, 2 months, 13 days | +| 5. | *Linux 6... | 3 years, 2 months, 28 days | | 6. | Linux 4... | 2 years, 7 months, 22 days | | 7. | FreeBSD 11... | 2 years, 4 months, 28 days | | 8. | *FreeBSD 14... | 2 years, 3 months, 24 days | @@ -219,7 +219,7 @@ Uptime is the total uptime of a host over the entire lifespan. | 10. | Darwin 13... | 1 years, 3 months, 25 days | | 11. | FreeBSD 6... | 1 years, 3 months, 9 days | | 12. | Darwin 23... | 0 years, 11 months, 7 days | -| 13. | *Darwin 24... | 0 years, 10 months, 23 days | +| 13. | *Darwin 24... | 0 years, 11 months, 6 days | | 14. | OpenBSD 4... | 0 years, 8 months, 12 days | | 15. | Darwin 21... | 0 years, 8 months, 2 days | | 16. | Darwin 18... | 0 years, 7 months, 5 days | @@ -242,22 +242,22 @@ Score is calculated by combining all other metrics. | 2. 
| *OpenBSD 7... | 484 | | 3. | FreeBSD 10... | 406 | | 4. | Linux 5... | 317 | -| 5. | *Linux 6... | 216 | +| 5. | *Linux 6... | 219 | | 6. | Linux 4... | 175 | | 7. | *FreeBSD 14... | 161 | | 8. | FreeBSD 11... | 159 | | 9. | Linux 2... | 121 | | 10. | Darwin 13... | 80 | | 11. | FreeBSD 6... | 75 | -| 12. | Darwin 23... | 56 | -| 13. | *Darwin 24... | 55 | +| 12. | *Darwin 24... | 58 | +| 13. | Darwin 23... | 56 | | 14. | OpenBSD 4... | 39 | | 15. | Darwin 21... | 38 | | 16. | Darwin 18... | 32 | | 17. | Darwin 22... | 30 | | 18. | Darwin 15... | 29 | -| 19. | FreeBSD 13... | 25 | -| 20. | FreeBSD 5... | 25 | +| 19. | FreeBSD 5... | 25 | +| 20. | FreeBSD 13... | 25 | +-----+----------------+-------+ ``` @@ -269,7 +269,7 @@ Boots is the total number of host boots over the entire lifespan. +-----+------------+-------+ | Pos | KernelName | Boots | +-----+------------+-------+ -| 1. | *Linux | 1093 | +| 1. | *Linux | 1097 | | 2. | *FreeBSD | 1080 | | 3. | *Darwin | 155 | | 4. | *OpenBSD | 109 | @@ -285,10 +285,10 @@ Uptime is the total uptime of a host over the entire lifespan. +-----+------------+-----------------------------+ | Pos | KernelName | Uptime | +-----+------------+-----------------------------+ -| 1. | *Linux | 28 years, 3 months, 8 days | +| 1. | *Linux | 28 years, 3 months, 23 days | | 2. | *FreeBSD | 12 years, 2 months, 24 days | | 3. | *OpenBSD | 8 years, 2 months, 7 days | -| 4. | *Darwin | 5 years, 2 months, 8 days | +| 4. | *Darwin | 5 years, 2 months, 22 days | | 5. | NetBSD | 0 years, 1 months, 1 days | +-----+------------+-----------------------------+ ``` @@ -301,10 +301,10 @@ Score is calculated by combining all other metrics. +-----+------------+-------+ | Pos | KernelName | Score | +-----+------------+-------+ -| 1. | *Linux | 1875 | +| 1. | *Linux | 1878 | | 2. | *FreeBSD | 862 | | 3. | *OpenBSD | 523 | -| 4. | *Darwin | 338 | +| 4. | *Darwin | 340 | | 5. | NetBSD | 0 | +-----+------------+-------+ ``` |
