author     Paul Buetow <paul@buetow.org>    2026-03-09 09:06:47 +0200
committer  Paul Buetow <paul@buetow.org>    2026-03-09 09:06:47 +0200
commit     b67012e55e52f69897559a084b4588a5649b3c5c (patch)
tree       cc4aa9b4142cbf5c04e647f43e75577352113743
parent     1f189ccdb1d9dfb4f3517f64e766870de0c1d00c (diff)
Update content for html
-rw-r--r--  gemfeed/2025-12-07-f3s-kubernetes-with-freebsd-part-8.html | 1037
-rw-r--r--  gemfeed/DRAFT-f3s-kubernetes-with-freebsd-part-8b.html | 1164
-rw-r--r--  gemfeed/atom.xml | 1313
-rw-r--r--  gemfeed/f3s-kubernetes-with-freebsd-part-8/grafana-etcd-dashboard.png | bin 0 -> 201310 bytes
-rw-r--r--  gemfeed/f3s-kubernetes-with-freebsd-part-8/grafana-zfs-arc-stats.png | bin 0 -> 168537 bytes
-rw-r--r--  gemfeed/f3s-kubernetes-with-freebsd-part-8/grafana-zfs-dashboard.png | bin 0 -> 210342 bytes
-rw-r--r--  gemfeed/f3s-kubernetes-with-freebsd-part-8/grafana-zfs-datasets.png | bin 0 -> 149339 bytes
7 files changed, 2192 insertions, 1322 deletions
diff --git a/gemfeed/2025-12-07-f3s-kubernetes-with-freebsd-part-8.html b/gemfeed/2025-12-07-f3s-kubernetes-with-freebsd-part-8.html
index 53bc16d9..0cda1b53 100644
--- a/gemfeed/2025-12-07-f3s-kubernetes-with-freebsd-part-8.html
+++ b/gemfeed/2025-12-07-f3s-kubernetes-with-freebsd-part-8.html
@@ -18,7 +18,7 @@
</p>
<h1 style='display: inline' id='f3s-kubernetes-with-freebsd---part-8-observability'>f3s: Kubernetes with FreeBSD - Part 8: Observability</h1><br />
<br />
-<span class='quote'>Published at 2025-12-06T23:58:24+02:00</span><br />
+<span class='quote'>Published at 2025-12-06T23:58:24+02:00, last updated Mon 09 Mar 09:33:08 EET 2026</span><br />
<br />
<span>This is the 8th blog post about the f3s series for my self-hosting demands in a home lab. f3s? The "f" stands for FreeBSD, and the "3s" stands for k3s, the Kubernetes distribution I use on FreeBSD-based physical machines.</span><br />
<br />
@@ -60,23 +60,56 @@
<li>⇢ ⇢ <a href='#adding-freebsd-hosts-to-prometheus'>Adding FreeBSD hosts to Prometheus</a></li>
<li>⇢ ⇢ <a href='#freebsd-memory-metrics-compatibility'>FreeBSD memory metrics compatibility</a></li>
<li>⇢ ⇢ <a href='#disk-io-metrics-limitation'>Disk I/O metrics limitation</a></li>
+<li>⇢ <a href='#zfs-monitoring-for-freebsd-servers'>ZFS Monitoring for FreeBSD Servers</a></li>
+<li>⇢ ⇢ <a href='#node-exporter-zfs-collector'>Node Exporter ZFS Collector</a></li>
+<li>⇢ ⇢ <a href='#verifying-zfs-metrics'>Verifying ZFS Metrics</a></li>
+<li>⇢ ⇢ <a href='#zfs-recording-rules'>ZFS Recording Rules</a></li>
+<li>⇢ ⇢ <a href='#grafana-dashboards'>Grafana Dashboards</a></li>
+<li>⇢ ⇢ <a href='#deployment'>Deployment</a></li>
+<li>⇢ ⇢ <a href='#verifying-zfs-metrics-in-prometheus'>Verifying ZFS Metrics in Prometheus</a></li>
+<li>⇢ ⇢ <a href='#key-metrics-to-monitor'>Key Metrics to Monitor</a></li>
+<li>⇢ ⇢ <a href='#zfs-pool-and-dataset-metrics-via-textfile-collector'>ZFS Pool and Dataset Metrics via Textfile Collector</a></li>
<li>⇢ <a href='#monitoring-external-openbsd-hosts'>Monitoring external OpenBSD hosts</a></li>
<li>⇢ ⇢ <a href='#installing-node-exporter-on-openbsd'>Installing Node Exporter on OpenBSD</a></li>
<li>⇢ ⇢ <a href='#adding-openbsd-hosts-to-prometheus'>Adding OpenBSD hosts to Prometheus</a></li>
<li>⇢ ⇢ <a href='#openbsd-memory-metrics-compatibility'>OpenBSD memory metrics compatibility</a></li>
+<li>⇢ <a href='#distributed-tracing-with-grafana-tempo'>Distributed Tracing with Grafana Tempo</a></li>
+<li>⇢ ⇢ <a href='#why-distributed-tracing'>Why Distributed Tracing?</a></li>
+<li>⇢ ⇢ <a href='#deploying-grafana-tempo'>Deploying Grafana Tempo</a></li>
+<li>⇢ ⇢ ⇢ <a href='#-configuration-strategy'>Configuration Strategy</a></li>
+<li>⇢ ⇢ ⇢ <a href='#-tempo-deployment-files'>Tempo Deployment Files</a></li>
+<li>⇢ ⇢ ⇢ <a href='#-installation'>Installation</a></li>
+<li>⇢ ⇢ <a href='#configuring-grafana-alloy-for-trace-collection'>Configuring Grafana Alloy for Trace Collection</a></li>
+<li>⇢ ⇢ ⇢ <a href='#-otlp-receiver-configuration'>OTLP Receiver Configuration</a></li>
+<li>⇢ ⇢ ⇢ <a href='#-upgrade-alloy'>Upgrade Alloy</a></li>
+<li>⇢ ⇢ <a href='#demo-tracing-application'>Demo Tracing Application</a></li>
+<li>⇢ ⇢ ⇢ <a href='#-application-architecture'>Application Architecture</a></li>
+<li>⇢ ⇢ <a href='#visualizing-traces-in-grafana'>Visualizing Traces in Grafana</a></li>
+<li>⇢ ⇢ ⇢ <a href='#-accessing-traces'>Accessing Traces</a></li>
+<li>⇢ ⇢ ⇢ <a href='#-service-graph-visualization'>Service Graph Visualization</a></li>
+<li>⇢ ⇢ <a href='#correlation-between-observability-signals'>Correlation Between Observability Signals</a></li>
+<li>⇢ ⇢ ⇢ <a href='#-traces-to-logs'>Traces-to-Logs</a></li>
+<li>⇢ ⇢ ⇢ <a href='#-traces-to-metrics'>Traces-to-Metrics</a></li>
+<li>⇢ ⇢ ⇢ <a href='#-logs-to-traces'>Logs-to-Traces</a></li>
+<li>⇢ ⇢ <a href='#generating-traces-for-testing'>Generating Traces for Testing</a></li>
+<li>⇢ ⇢ <a href='#verifying-the-complete-pipeline'>Verifying the Complete Pipeline</a></li>
+<li>⇢ ⇢ <a href='#practical-example-viewing-a-distributed-trace'>Practical Example: Viewing a Distributed Trace</a></li>
+<li>⇢ ⇢ <a href='#storage-and-retention'>Storage and Retention</a></li>
+<li>⇢ ⇢ <a href='#configuration-files'>Configuration Files</a></li>
<li>⇢ <a href='#summary'>Summary</a></li>
</ul><br />
<h2 style='display: inline' id='introduction'>Introduction</h2><br />
<br />
-<span>In this blog post, I set up a complete observability stack for the k3s cluster. Observability is crucial for understanding what&#39;s happening inside the cluster—whether its tracking resource usage, debugging issues, or analysing application behaviour. The stack consists of four main components, all deployed into the <span class='inlinecode'>monitoring</span> namespace:</span><br />
+<span>In this blog post, I set up a complete observability stack for the k3s cluster. Observability is crucial for understanding what&#39;s happening inside the cluster—whether it&#39;s tracking resource usage, debugging issues, or analysing application behaviour. The stack consists of five main components, all deployed into the <span class='inlinecode'>monitoring</span> namespace:</span><br />
<br />
<ul>
<li>Prometheus: time-series database for metrics collection and alerting</li>
<li>Grafana: visualisation and dashboarding frontend</li>
<li>Loki: log aggregation system (like Prometheus, but for logs)</li>
-<li>Alloy: telemetry collector that ships logs from all pods to Loki</li>
+<li>Alloy: telemetry collector that ships logs and traces from all pods to Loki and Tempo</li>
+<li>Tempo: distributed tracing backend for request flow analysis across microservices</li>
</ul><br />
-<span>Together, these form the "PLG" stack (Prometheus, Loki, Grafana), which is a popular open-source alternative to commercial observability platforms.</span><br />
+<span>Together, these form the "PLG" stack (Prometheus, Loki, Grafana), extended with Tempo for distributed tracing: a popular open-source alternative to commercial observability platforms.</span><br />
<br />
<span>All manifests for the f3s stack live in my configuration repository:</span><br />
<br />
@@ -120,6 +153,7 @@ http://www.gnu.org/software/src-highlite -->
<li><span class='inlinecode'>/data/nfs/k3svolumes/prometheus/data</span> — Prometheus time-series database</li>
<li><span class='inlinecode'>/data/nfs/k3svolumes/grafana/data</span> — Grafana configuration, dashboards, and plugins</li>
<li><span class='inlinecode'>/data/nfs/k3svolumes/loki/data</span> — Loki log chunks and index</li>
+<li><span class='inlinecode'>/data/nfs/k3svolumes/tempo/data</span> — Tempo trace data and WAL</li>
</ul><br />
<span>Each path gets a corresponding <span class='inlinecode'>PersistentVolume</span> and <span class='inlinecode'>PersistentVolumeClaim</span> in Kubernetes, allowing pods to mount them as regular volumes. Because the underlying storage is ZFS with replication, we get snapshots and redundancy for free.</span><br />
<br />
@@ -218,7 +252,7 @@ kubeControllerManager:
insecureSkipVerify: true
</pre>
<br />
-<span>By default, k3s binds the controller-manager to localhost only, so the "Kubernetes / Controller Manager" dashboard in Grafana will show no data. To expose the metrics endpoint, add the following to <span class='inlinecode'>/etc/rancher/k3s/config.yaml</span> on each k3s server node:</span><br />
+<span>By default, k3s binds the controller-manager to localhost only and doesn&#39;t expose etcd metrics, so the "Kubernetes / Controller Manager" and "etcd" dashboards in Grafana will show no data. To fix both, add the following to <span class='inlinecode'>/etc/rancher/k3s/config.yaml</span> on each k3s server node:</span><br />
<br />
<!-- Generator: GNU source-highlight 3.1.9
by Lorenzo Bettini
@@ -227,11 +261,26 @@ http://www.gnu.org/software/src-highlite -->
<pre><font color="#F3E651">[</font><font color="#ff0000">root@r0 </font><font color="#F3E651">~]</font><i><font color="#ababab"># cat &gt;&gt; /etc/rancher/k3s/config.yaml &lt;&lt; 'EOF'</font></i>
<font color="#ff0000">kube-controller-manager-arg</font><font color="#F3E651">:</font>
<font color="#ff0000"> - bind-address</font><font color="#F3E651">=</font><font color="#bb00ff">0.0</font><font color="#F3E651">.</font><font color="#bb00ff">0.0</font>
+<font color="#ff0000">etcd-expose-metrics</font><font color="#F3E651">:</font><font color="#ff0000"> </font><b><font color="#ffffff">true</font></b>
<font color="#ff0000">EOF</font>
<font color="#F3E651">[</font><font color="#ff0000">root@r0 </font><font color="#F3E651">~]</font><i><font color="#ababab"># systemctl restart k3s</font></i>
</pre>
<br />
-<span>Repeat for <span class='inlinecode'>r1</span> and <span class='inlinecode'>r2</span>. After restarting all nodes, the controller-manager metrics endpoint will be accessible and Prometheus can scrape it.</span><br />
+<span>Repeat for <span class='inlinecode'>r1</span> and <span class='inlinecode'>r2</span>. After restarting all nodes, the controller-manager metrics endpoint will be accessible and etcd metrics will be available on port 2381. Prometheus can then scrape both.</span><br />
+<br />
+<span>Verify etcd metrics are exposed:</span><br />
+<br />
+<!-- Generator: GNU source-highlight 3.1.9
+by Lorenzo Bettini
+http://www.lorenzobettini.it
+http://www.gnu.org/software/src-highlite -->
+<pre><font color="#F3E651">[</font><font color="#ff0000">root@r0 </font><font color="#F3E651">~]</font><i><font color="#ababab"># curl -s http://127.0.0.1:2381/metrics | grep etcd_server_has_leader</font></i>
+<font color="#ff0000">etcd_server_has_leader </font><font color="#bb00ff">1</font>
+</pre>
+<br />
+<span>The full <span class='inlinecode'>persistence-values.yaml</span> and all other Prometheus configuration files are available on Codeberg:</span><br />
+<br />
+<a class='textlink' href='https://codeberg.org/snonux/conf/src/branch/master/f3s/prometheus'>codeberg.org/snonux/conf/f3s/prometheus</a><br />
<br />
<span>The persistent volume definitions bind to specific paths on the NFS share using <span class='inlinecode'>hostPath</span> volumes—the same pattern used for other services in Part 7:</span><br />
<br />
@@ -258,6 +307,8 @@ http://www.gnu.org/software/src-highlite -->
<br />
<a href='./f3s-kubernetes-with-freebsd-part-8/grafana-dashboard.png'><img alt='Grafana dashboard showing cluster metrics' title='Grafana dashboard showing cluster metrics' src='./f3s-kubernetes-with-freebsd-part-8/grafana-dashboard.png' /></a><br />
<br />
+<a href='./f3s-kubernetes-with-freebsd-part-8/grafana-etcd-dashboard.png'><img alt='Grafana etcd dashboard showing cluster health, RPC rate, disk sync duration, and peer round trip times' title='Grafana etcd dashboard showing cluster health, RPC rate, disk sync duration, and peer round trip times' src='./f3s-kubernetes-with-freebsd-part-8/grafana-etcd-dashboard.png' /></a><br />
+<br />
<h2 style='display: inline' id='installing-loki-and-alloy'>Installing Loki and Alloy</h2><br />
<br />
<span>While Prometheus handles metrics, Loki handles logs. It&#39;s designed to be cost-effective and easy to operate—it doesn&#39;t index the contents of logs, only the metadata (labels), making it very efficient for storage.</span><br />
@@ -409,8 +460,11 @@ http://www.gnu.org/software/src-highlite -->
<font color="#ff0000">prometheus-prometheus-node-exporter-2nsg9 </font><font color="#bb00ff">1</font><font color="#F3E651">/</font><font color="#bb00ff">1</font><font color="#ff0000"> Running </font><font color="#bb00ff">0</font><font color="#ff0000"> 42d</font>
<font color="#ff0000">prometheus-prometheus-node-exporter-mqr</font><font color="#bb00ff">25</font><font color="#ff0000"> </font><font color="#bb00ff">1</font><font color="#F3E651">/</font><font color="#bb00ff">1</font><font color="#ff0000"> Running </font><font color="#bb00ff">0</font><font color="#ff0000"> 42d</font>
<font color="#ff0000">prometheus-prometheus-node-exporter-wp4ds </font><font color="#bb00ff">1</font><font color="#F3E651">/</font><font color="#bb00ff">1</font><font color="#ff0000"> Running </font><font color="#bb00ff">0</font><font color="#ff0000"> 42d</font>
+<font color="#ff0000">tempo-</font><font color="#bb00ff">0</font><font color="#ff0000"> </font><font color="#bb00ff">1</font><font color="#F3E651">/</font><font color="#bb00ff">1</font><font color="#ff0000"> Running </font><font color="#bb00ff">0</font><font color="#ff0000"> 1d</font>
</pre>
<br />
+<span>Note: Tempo (<span class='inlinecode'>tempo-0</span>) is deployed later in this post in the "Distributed Tracing with Grafana Tempo" section. It is included in the pod listing here for completeness.</span><br />
+<br />
<span>And the services:</span><br />
<br />
<!-- Generator: GNU source-highlight 3.1.9
@@ -429,6 +483,7 @@ http://www.gnu.org/software/src-highlite -->
<font color="#ff0000">prometheus-kube-prometheus-prometheus ClusterIP </font><font color="#bb00ff">10.43</font><font color="#F3E651">.</font><font color="#bb00ff">152.163</font><font color="#ff0000"> </font><font color="#bb00ff">9090</font><font color="#ff0000">/TCP</font><font color="#F3E651">,</font><font color="#bb00ff">8080</font><font color="#ff0000">/TCP</font>
<font color="#ff0000">prometheus-kube-state-metrics ClusterIP </font><font color="#bb00ff">10.43</font><font color="#F3E651">.</font><font color="#bb00ff">64.26</font><font color="#ff0000"> </font><font color="#bb00ff">8080</font><font color="#ff0000">/TCP</font>
<font color="#ff0000">prometheus-prometheus-node-exporter ClusterIP </font><font color="#bb00ff">10.43</font><font color="#F3E651">.</font><font color="#bb00ff">127.242</font><font color="#ff0000"> </font><font color="#bb00ff">9100</font><font color="#ff0000">/TCP</font>
+<font color="#ff0000">tempo ClusterIP </font><font color="#bb00ff">10.43</font><font color="#F3E651">.</font><font color="#bb00ff">91.44</font><font color="#ff0000"> </font><font color="#bb00ff">3200</font><font color="#ff0000">/TCP</font><font color="#F3E651">,</font><font color="#bb00ff">4317</font><font color="#ff0000">/TCP</font><font color="#F3E651">,</font><font color="#bb00ff">4318</font><font color="#ff0000">/TCP</font>
</pre>
<br />
<span>Let me break down what each pod does:</span><br />
@@ -457,6 +512,9 @@ http://www.gnu.org/software/src-highlite -->
<ul>
<li><span class='inlinecode'>prometheus-prometheus-node-exporter-...</span>: three Node Exporter pods running as a DaemonSet, one on each node. They expose hardware and OS-level metrics: CPU usage, memory, disk I/O, filesystem usage, network statistics, and more. These feed the "Node Exporter" dashboards in Grafana.</li>
</ul><br />
+<ul>
+<li><span class='inlinecode'>tempo-0</span>: the Grafana Tempo instance for distributed tracing. It receives trace data from Alloy via OTLP (OpenTelemetry Protocol), stores traces on the NFS-backed persistent volume, and serves queries to Grafana. Tempo is covered in detail in the "Distributed Tracing with Grafana Tempo" section later in this post.</li>
+</ul><br />
<h2 style='display: inline' id='using-the-observability-stack'>Using the observability stack</h2><br />
<br />
<h3 style='display: inline' id='viewing-metrics-in-grafana'>Viewing metrics in Grafana</h3><br />
@@ -642,7 +700,313 @@ spec:
<br />
<span>Unlike memory metrics, disk I/O metrics (<span class='inlinecode'>node_disk_read_bytes_total</span>, <span class='inlinecode'>node_disk_written_bytes_total</span>, etc.) are not available on FreeBSD. The Linux diskstats collector that provides these metrics doesn&#39;t have a FreeBSD equivalent in the node_exporter.</span><br />
<br />
-<span>The disk I/O panels in the Node Exporter dashboards will show "No data" for FreeBSD hosts. FreeBSD does expose ZFS-specific metrics (<span class='inlinecode'>node_zfs_arcstats_*</span>) for ARC cache performance, and per-dataset I/O stats are available via <span class='inlinecode'>sysctl kstat.zfs</span>, but mapping these to the Linux-style metrics the dashboards expect is non-trivial. Creating custom ZFS-specific dashboards is left as an exercise for another day.</span><br />
+<span>The disk I/O panels in the Node Exporter dashboards will show "No data" for FreeBSD hosts. FreeBSD does expose ZFS-specific metrics (<span class='inlinecode'>node_zfs_arcstats_*</span>) for ARC cache performance, and per-dataset I/O stats are available via <span class='inlinecode'>sysctl kstat.zfs</span>, but mapping these to the Linux-style metrics the dashboards expect is non-trivial. To address this, I created custom ZFS-specific dashboards, covered in the next section.</span><br />
+<br />
+<h2 style='display: inline' id='zfs-monitoring-for-freebsd-servers'>ZFS Monitoring for FreeBSD Servers</h2><br />
+<br />
+<span>The FreeBSD servers (f0, f1, f2) that provide NFS storage to the k3s cluster use ZFS filesystems. Monitoring ZFS is crucial for understanding storage performance and cache efficiency.</span><br />
+<br />
+<h3 style='display: inline' id='node-exporter-zfs-collector'>Node Exporter ZFS Collector</h3><br />
+<br />
+<span>The node_exporter running on each FreeBSD server (v1.9.1) includes a built-in ZFS collector that reads statistics from sysctl and exposes them as Prometheus metrics. The collector is enabled by default and provides:</span><br />
+<br />
+<ul>
+<li>ARC (Adaptive Replacement Cache) statistics</li>
+<li>Cache hit/miss rates</li>
+<li>Memory usage and allocation</li>
+<li>MRU/MFU cache breakdown</li>
+<li>Data vs metadata distribution</li>
+</ul><br />
+<h3 style='display: inline' id='verifying-zfs-metrics'>Verifying ZFS Metrics</h3><br />
+<br />
+<span>On any FreeBSD server, check that ZFS metrics are being exposed:</span><br />
+<br />
+<pre>
+paul@f0:~ % curl -s http://localhost:9100/metrics | grep node_zfs_arcstats | wc -l
+ 69
+</pre>
+<br />
+<span>The metrics are automatically scraped by Prometheus through the existing static configuration in <span class='inlinecode'>additional-scrape-configs.yaml</span>, which targets all FreeBSD servers on port 9100 with the <span class='inlinecode'>os: freebsd</span> label.</span><br />
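+<br />
+<span>For reference, the scrape job has roughly this shape (a sketch only; the job name is illustrative, and the authoritative file is in the Codeberg repository linked earlier):</span><br />
+<br />
+<pre>
+- job_name: 'freebsd-nodes'
+  static_configs:
+    - targets:
+        - 192.168.2.130:9100
+        - 192.168.2.131:9100
+        - 192.168.2.132:9100
+      labels:
+        os: freebsd
+</pre>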
+<br />
+<h3 style='display: inline' id='zfs-recording-rules'>ZFS Recording Rules</h3><br />
+<br />
+<span>I created recording rules in <span class='inlinecode'>zfs-recording-rules.yaml</span> for easier dashboard consumption:</span><br />
+<br />
+<pre>
+apiVersion: monitoring.coreos.com/v1
+kind: PrometheusRule
+metadata:
+ name: freebsd-zfs-rules
+ namespace: monitoring
+ labels:
+ release: prometheus
+spec:
+ groups:
+ - name: freebsd-zfs-arc
+ interval: 30s
+ rules:
+ - record: node_zfs_arc_hit_rate_percent
+ expr: |
+ 100 * (
+ rate(node_zfs_arcstats_hits_total{os="freebsd"}[5m]) /
+ (rate(node_zfs_arcstats_hits_total{os="freebsd"}[5m]) +
+ rate(node_zfs_arcstats_misses_total{os="freebsd"}[5m]))
+ )
+ labels:
+ os: freebsd
+ - record: node_zfs_arc_memory_usage_percent
+ expr: |
+ 100 * (
+ node_zfs_arcstats_size_bytes{os="freebsd"} /
+ node_zfs_arcstats_c_max_bytes{os="freebsd"}
+ )
+ labels:
+ os: freebsd
+ # Additional rules for metadata %, target %, MRU/MFU %, etc.
+</pre>
+<br />
+<span>These recording rules calculate:</span><br />
+<br />
+<ul>
+<li>ARC hit rate percentage</li>
+<li>ARC memory usage percentage (current vs maximum)</li>
+<li>ARC target percentage (target vs maximum)</li>
+<li>Metadata vs data percentages</li>
+<li>MRU vs MFU cache percentages</li>
+<li>Demand data and metadata hit rates</li>
+</ul><br />
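+<span>As an illustration, the ARC target percentage rule mentioned above could look like the following sketch. The metric name <span class='inlinecode'>node_zfs_arcstats_c_bytes</span> is an assumption based on the collector&#39;s naming scheme; verify it against your <span class='inlinecode'>/metrics</span> output:</span><br />
+<br />
+<pre>
+  - record: node_zfs_arc_target_percent
+    expr: |
+      100 * (
+        node_zfs_arcstats_c_bytes{os="freebsd"} /
+        node_zfs_arcstats_c_max_bytes{os="freebsd"}
+      )
+    labels:
+      os: freebsd
+</pre>
+<br />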
+<h3 style='display: inline' id='grafana-dashboards'>Grafana Dashboards</h3><br />
+<br />
+<span>I created two comprehensive ZFS monitoring dashboards (<span class='inlinecode'>zfs-dashboards.yaml</span>):</span><br />
+<br />
+<span><b>Dashboard 1: FreeBSD ZFS (per-host detailed view)</b></span><br />
+<br />
+<span>Includes variables to select:</span><br />
+<br />
+<ul>
+<li>FreeBSD server (f0, f1, or f2)</li>
+<li>ZFS pool (zdata, zroot, or all)</li>
+</ul><br />
+<span>Pool Overview Row:</span><br />
+<br />
+<ul>
+<li>Pool Capacity gauge (with thresholds: green &lt;70%, yellow &lt;85%, red &gt;85%)</li>
+<li>Pool Health status (ONLINE/DEGRADED/FAULTED with color coding)</li>
+<li>Total Pool Size stat</li>
+<li>Free Space stat</li>
+<li>Pool Space Usage Over Time (stacked: used + free)</li>
+<li>Pool Capacity Trend time series</li>
+</ul><br />
+<span>Dataset Statistics Row:</span><br />
+<br />
+<ul>
+<li>Table showing all datasets with columns: Pool, Dataset, Used, Available, Referenced</li>
+<li>Automatically filters by selected pool</li>
+</ul><br />
+<span>ARC Cache Statistics Row:</span><br />
+<br />
+<ul>
+<li>ARC Hit Rate gauge (red &lt;70%, yellow &lt;90%, green &gt;=90%)</li>
+<li>ARC Size time series (current, target, max)</li>
+<li>ARC Memory Usage percentage gauge</li>
+<li>ARC Hits vs Misses rate</li>
+<li>ARC Data vs Metadata stacked time series</li>
+</ul><br />
+<span><b>Dashboard 2: FreeBSD ZFS Summary (cluster-wide overview)</b></span><br />
+<br />
+<span>Cluster-Wide Pool Statistics Row:</span><br />
+<br />
+<ul>
+<li>Total Storage Capacity across all servers</li>
+<li>Total Used space</li>
+<li>Total Free space</li>
+<li>Average Pool Capacity gauge</li>
+<li>Pool Health Status (worst case across cluster)</li>
+<li>Total Pool Space Usage Over Time</li>
+<li>Per-Pool Capacity time series (all pools on all hosts)</li>
+</ul><br />
+<span>Per-Host Pool Breakdown Row:</span><br />
+<br />
+<ul>
+<li>Bar gauge showing capacity by host and pool</li>
+<li>Table with all pools: Host, Pool, Size, Used, Free, Capacity %, Health</li>
+</ul><br />
+<span>Cluster-Wide ARC Statistics Row:</span><br />
+<br />
+<ul>
+<li>Average ARC Hit Rate gauge across all hosts</li>
+<li>ARC Hit Rate by Host time series</li>
+<li>Total ARC Size Across Cluster</li>
+<li>Total ARC Hits vs Misses (cluster-wide sum)</li>
+<li>ARC Size by Host</li>
+</ul><br />
+<span>Dashboard visualisation:</span><br />
+<br />
+<a href='./f3s-kubernetes-with-freebsd-part-8/grafana-zfs-dashboard.png'><img alt='ZFS monitoring dashboard in Grafana showing pool capacity, health, and I/O throughput' title='ZFS monitoring dashboard in Grafana showing pool capacity, health, and I/O throughput' src='./f3s-kubernetes-with-freebsd-part-8/grafana-zfs-dashboard.png' /></a><br />
+<a href='./f3s-kubernetes-with-freebsd-part-8/grafana-zfs-arc-stats.png'><img alt='ZFS ARC cache statistics showing hit rate, memory usage, and size trends' title='ZFS ARC cache statistics showing hit rate, memory usage, and size trends' src='./f3s-kubernetes-with-freebsd-part-8/grafana-zfs-arc-stats.png' /></a><br />
+<a href='./f3s-kubernetes-with-freebsd-part-8/grafana-zfs-datasets.png'><img alt='ZFS datasets table and ARC data vs metadata breakdown' title='ZFS datasets table and ARC data vs metadata breakdown' src='./f3s-kubernetes-with-freebsd-part-8/grafana-zfs-datasets.png' /></a><br />
+<br />
+<h3 style='display: inline' id='deployment'>Deployment</h3><br />
+<br />
+<span>I applied the resources to the cluster:</span><br />
+<br />
+<pre>
+cd /home/paul/git/conf/f3s/prometheus
+kubectl apply -f zfs-recording-rules.yaml
+kubectl apply -f zfs-dashboards.yaml
+</pre>
+<br />
+<span>I also updated the <span class='inlinecode'>Justfile</span> to include the ZFS recording rules in the install and upgrade targets:</span><br />
+<br />
+<pre>
+install:
+ kubectl apply -f persistent-volumes.yaml
+ kubectl create secret generic additional-scrape-configs --from-file=additional-scrape-configs.yaml -n monitoring --dry-run=client -o yaml | kubectl apply -f -
+ helm install prometheus prometheus-community/kube-prometheus-stack --namespace monitoring -f persistence-values.yaml
+ kubectl apply -f freebsd-recording-rules.yaml
+ kubectl apply -f openbsd-recording-rules.yaml
+ kubectl apply -f zfs-recording-rules.yaml
+ just -f grafana-ingress/Justfile install
+</pre>
+<br />
+<h3 style='display: inline' id='verifying-zfs-metrics-in-prometheus'>Verifying ZFS Metrics in Prometheus</h3><br />
+<br />
+<span>Check that ZFS metrics are being collected:</span><br />
+<br />
+<pre>
+kubectl exec -n monitoring prometheus-prometheus-kube-prometheus-prometheus-0 -c prometheus -- \
+ wget -qO- &#39;http://localhost:9090/api/v1/query?query=node_zfs_arcstats_size_bytes&#39;
+</pre>
+<br />
+<span>Check recording rules are calculating correctly:</span><br />
+<br />
+<pre>
+kubectl exec -n monitoring prometheus-prometheus-kube-prometheus-prometheus-0 -c prometheus -- \
+ wget -qO- &#39;http://localhost:9090/api/v1/query?query=node_zfs_arc_memory_usage_percent&#39;
+</pre>
+<br />
+<span>The example output shows the ARC memory usage percentage for each FreeBSD server:</span><br />
+<br />
+<pre>
+"result":[
+ {"metric":{"instance":"192.168.2.130:9100","os":"freebsd"},"value":[...,"37.58"]},
+ {"metric":{"instance":"192.168.2.131:9100","os":"freebsd"},"value":[...,"12.85"]},
+ {"metric":{"instance":"192.168.2.132:9100","os":"freebsd"},"value":[...,"13.44"]}
+]
+</pre>
+<br />
+<h3 style='display: inline' id='key-metrics-to-monitor'>Key Metrics to Monitor</h3><br />
+<br />
+<ul>
+<li>ARC Hit Rate: Should typically be above 90% for optimal performance. Lower hit rates indicate the ARC cache is too small or workload has poor locality.</li>
+<li>ARC Memory Usage: Shows how much of the maximum ARC size is being used. If consistently at or near the maximum, the ARC is effectively utilising available memory.</li>
+<li>Data vs Metadata: Typically data should dominate, but workloads with many small files will show higher metadata percentages.</li>
+<li>MRU vs MFU: Most Recently Used vs Most Frequently Used cache. The ratio depends on workload characteristics.</li>
+<li>Pool Capacity: Monitor pool usage to ensure adequate free space. ZFS performance degrades when pools exceed 80% capacity.</li>
+<li>Pool Health: Should always show ONLINE (green). DEGRADED (yellow) indicates a disk issue requiring attention. FAULTED (red) requires immediate action.</li>
+<li>Dataset Usage: Track which datasets are consuming the most space to identify growth trends and plan capacity.</li>
+</ul><br />
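+<span>These thresholds translate naturally into alerts. Below is a sketch of a hypothetical <span class='inlinecode'>zfs-alerting-rules.yaml</span> (not part of my actual setup) using the ARC hit rate recording rule defined earlier and the <span class='inlinecode'>zfs_pool_*</span> metrics from the textfile collector described in the next section:</span><br />
+<br />
+<pre>
+apiVersion: monitoring.coreos.com/v1
+kind: PrometheusRule
+metadata:
+  name: freebsd-zfs-alerts
+  namespace: monitoring
+  labels:
+    release: prometheus
+spec:
+  groups:
+    - name: freebsd-zfs-alerts
+      rules:
+        - alert: ZFSPoolCapacityHigh
+          expr: zfs_pool_capacity_percent > 80
+          for: 30m
+          labels:
+            severity: warning
+          annotations:
+            summary: "ZFS pool {{ $labels.pool }} on {{ $labels.instance }} is over 80% full"
+        - alert: ZFSPoolNotHealthy
+          expr: zfs_pool_health > 0
+          for: 5m
+          labels:
+            severity: critical
+          annotations:
+            summary: "ZFS pool {{ $labels.pool }} on {{ $labels.instance }} is not ONLINE"
+        - alert: ZFSArcHitRateLow
+          expr: node_zfs_arc_hit_rate_percent &lt; 70
+          for: 1h
+          labels:
+            severity: warning
+          annotations:
+            summary: "ZFS ARC hit rate on {{ $labels.instance }} is below 70%"
+</pre>
+<br />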
+<h3 style='display: inline' id='zfs-pool-and-dataset-metrics-via-textfile-collector'>ZFS Pool and Dataset Metrics via Textfile Collector</h3><br />
+<br />
+<span>To complement the ARC statistics from node_exporter&#39;s built-in ZFS collector, I added pool capacity and dataset metrics using the textfile collector feature.</span><br />
+<br />
+<span>I created a script at <span class='inlinecode'>/usr/local/bin/zfs_pool_metrics.sh</span> on each FreeBSD server:</span><br />
+<br />
+<pre>
+#!/bin/sh
+# ZFS Pool and Dataset Metrics Collector for Prometheus
+
+OUTPUT_FILE="/var/tmp/node_exporter/zfs_pools.prom.$$"
+FINAL_FILE="/var/tmp/node_exporter/zfs_pools.prom"
+
+mkdir -p /var/tmp/node_exporter
+
+{
+ # Pool metrics
+ echo "# HELP zfs_pool_size_bytes Total size of ZFS pool"
+ echo "# TYPE zfs_pool_size_bytes gauge"
+ echo "# HELP zfs_pool_allocated_bytes Allocated space in ZFS pool"
+ echo "# TYPE zfs_pool_allocated_bytes gauge"
+ echo "# HELP zfs_pool_free_bytes Free space in ZFS pool"
+ echo "# TYPE zfs_pool_free_bytes gauge"
+ echo "# HELP zfs_pool_capacity_percent Capacity percentage"
+ echo "# TYPE zfs_pool_capacity_percent gauge"
+ echo "# HELP zfs_pool_health Pool health (0=ONLINE, 1=DEGRADED, 2=FAULTED)"
+ echo "# TYPE zfs_pool_health gauge"
+
+ zpool list -Hp -o name,size,allocated,free,capacity,health | \
+ while IFS=$&#39;\t&#39; read name size alloc free cap health; do
+ case "$health" in
+ ONLINE) health_val=0 ;;
+ DEGRADED) health_val=1 ;;
+ FAULTED) health_val=2 ;;
+ *) health_val=6 ;;
+ esac
+ cap_num=$(echo "$cap" | sed &#39;s/%//&#39;)
+
+ echo "zfs_pool_size_bytes{pool=\"$name\"} $size"
+ echo "zfs_pool_allocated_bytes{pool=\"$name\"} $alloc"
+ echo "zfs_pool_free_bytes{pool=\"$name\"} $free"
+ echo "zfs_pool_capacity_percent{pool=\"$name\"} $cap_num"
+ echo "zfs_pool_health{pool=\"$name\"} $health_val"
+ done
+
+ # Dataset metrics
+ echo "# HELP zfs_dataset_used_bytes Used space in dataset"
+ echo "# TYPE zfs_dataset_used_bytes gauge"
+ echo "# HELP zfs_dataset_available_bytes Available space"
+ echo "# TYPE zfs_dataset_available_bytes gauge"
+ echo "# HELP zfs_dataset_referenced_bytes Referenced space"
+ echo "# TYPE zfs_dataset_referenced_bytes gauge"
+
+ zfs list -Hp -t filesystem -o name,used,available,referenced | \
+ while IFS=$&#39;\t&#39; read name used avail ref; do
+ pool=$(echo "$name" | cut -d/ -f1)
+ echo "zfs_dataset_used_bytes{pool=\"$pool\",dataset=\"$name\"} $used"
+ echo "zfs_dataset_available_bytes{pool=\"$pool\",dataset=\"$name\"} $avail"
+ echo "zfs_dataset_referenced_bytes{pool=\"$pool\",dataset=\"$name\"} $ref"
+ done
+} &gt; "$OUTPUT_FILE"
+
+mv "$OUTPUT_FILE" "$FINAL_FILE"
+</pre>
+<br />
+<span>I deployed it to all FreeBSD servers:</span><br />
+<br />
+<pre>
+for host in f0 f1 f2; do
+ scp /tmp/zfs_pool_metrics.sh paul@$host:/tmp/
+ ssh paul@$host &#39;doas mv /tmp/zfs_pool_metrics.sh /usr/local/bin/ &amp;&amp; \
+ doas chmod +x /usr/local/bin/zfs_pool_metrics.sh&#39;
+done
+</pre>
+<br />
+<span>Then I set up cron jobs to run the script every minute:</span><br />
+<br />
+<pre>
+for host in f0 f1 f2; do
+ ssh paul@$host &#39;echo "* * * * * /usr/local/bin/zfs_pool_metrics.sh &gt;/dev/null 2&gt;&amp;1" | \
+ doas crontab -&#39;
+done
+</pre>
+<br />
+<span>The textfile collector (already configured with <span class='inlinecode'>--collector.textfile.directory=/var/tmp/node_exporter</span>) automatically picks up the metrics.</span><br />
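+<br />
+<span>For reference, on FreeBSD this flag is passed through the node_exporter rc script. Assuming the port&#39;s standard <span class='inlinecode'>node_exporter_args</span> rc variable (verify against your installed rc script), the relevant <span class='inlinecode'>/etc/rc.conf</span> lines look like this:</span><br />
+<br />
+<pre>
+node_exporter_enable="YES"
+node_exporter_args="--collector.textfile.directory=/var/tmp/node_exporter"
+</pre>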
+<br />
+<span>Verify metrics are being exposed:</span><br />
+<br />
+<pre>
+paul@f0:~ % curl -s http://localhost:9100/metrics | grep "^zfs_pool" | head -5
+zfs_pool_allocated_bytes{pool="zdata"} 6.47622733824e+11
+zfs_pool_allocated_bytes{pool="zroot"} 5.3338578944e+10
+zfs_pool_capacity_percent{pool="zdata"} 64
+zfs_pool_capacity_percent{pool="zroot"} 10
+zfs_pool_free_bytes{pool="zdata"} 3.48809678848e+11
+</pre>
+<br />
+<span>All ZFS-related configuration files are available on Codeberg:</span><br />
+<br />
+<a class='textlink' href='https://codeberg.org/snonux/conf/src/branch/master/f3s/prometheus/zfs-recording-rules.yaml'>zfs-recording-rules.yaml on Codeberg</a><br />
+<a class='textlink' href='https://codeberg.org/snonux/conf/src/branch/master/f3s/prometheus/zfs-dashboards.yaml'>zfs-dashboards.yaml on Codeberg</a><br />
<br />
<h2 style='display: inline' id='monitoring-external-openbsd-hosts'>Monitoring external OpenBSD hosts</h2><br />
<br />
@@ -769,18 +1133,671 @@ spec:
<br />
<span>After running <span class='inlinecode'>just upgrade</span>, the OpenBSD hosts appear in Prometheus targets and the Node Exporter dashboards.</span><br />
<br />
+<h2 style='display: inline' id='distributed-tracing-with-grafana-tempo'>Distributed Tracing with Grafana Tempo</h2><br />
+<br />
+<span>After implementing logs (Loki) and metrics (Prometheus), the final pillar of observability is distributed tracing. Grafana Tempo provides distributed tracing capabilities that help understand request flows across microservices.</span><br />
+<br />
+<span>For a preview of what distributed tracing with Tempo looks like in Grafana, see the X-RAG blog post:</span><br />
+<br />
+<a class='textlink' href='./2025-12-24-x-rag-observability-hackathon.html'>X-RAG Observability Hackathon</a><br />
+<br />
+<h3 style='display: inline' id='why-distributed-tracing'>Why Distributed Tracing?</h3><br />
+<br />
+<span>In a microservices architecture, a single user request may traverse multiple services. Distributed tracing:</span><br />
+<br />
+<ul>
+<li>Tracks requests across service boundaries</li>
+<li>Identifies performance bottlenecks</li>
+<li>Visualizes service dependencies</li>
+<li>Correlates with logs and metrics</li>
+<li>Helps debug complex distributed systems</li>
+</ul><br />
+<h3 style='display: inline' id='deploying-grafana-tempo'>Deploying Grafana Tempo</h3><br />
+<br />
+<span>Tempo is deployed in monolithic mode, following the same pattern as Loki&#39;s SingleBinary deployment.</span><br />
+<br />
+<h4 style='display: inline' id='configuration-strategy'>Configuration Strategy</h4><br />
+<br />
+<span><strong>Deployment Mode:</strong> Monolithic (all components in one process)</span><br />
+<ul>
+<li>Simpler operation than microservices mode</li>
+<li>Suitable for the cluster scale</li>
+<li>Consistent with Loki deployment pattern</li>
+</ul><br />
+<span><strong>Storage:</strong> Filesystem backend using hostPath</span><br />
+<ul>
+<li>10Gi storage at /data/nfs/k3svolumes/tempo/data</li>
+<li>7-day retention (168h)</li>
+<li>Local filesystem storage is the simplest option for monolithic mode</li>
+</ul><br />
+<span><strong>OTLP Receivers:</strong> Standard OpenTelemetry Protocol ports</span><br />
+<ul>
+<li>gRPC: 4317</li>
+<li>HTTP: 4318</li>
+<li>Bind to 0.0.0.0 to work around the localhost-only default binding introduced in Tempo 2.7</li>
+</ul><br />
+<h4 style='display: inline' id='tempo-deployment-files'>Tempo Deployment Files</h4><br />
+<br />
+<span>Created in /home/paul/git/conf/f3s/tempo/:</span><br />
+<br />
+<span><strong>values.yaml</strong> - Helm chart configuration:</span><br />
+<br />
+<pre>
+tempo:
+ retention: 168h
+ storage:
+ trace:
+ backend: local
+ local:
+ path: /var/tempo/traces
+ wal:
+ path: /var/tempo/wal
+ receivers:
+ otlp:
+ protocols:
+ grpc:
+ endpoint: 0.0.0.0:4317
+ http:
+ endpoint: 0.0.0.0:4318
+
+persistence:
+ enabled: true
+ size: 10Gi
+ storageClassName: ""
+
+resources:
+ limits:
+ cpu: 1000m
+ memory: 2Gi
+ requests:
+ cpu: 500m
+ memory: 1Gi
+</pre>
+<br />
+<span><strong>persistent-volumes.yaml</strong> - Storage configuration:</span><br />
+<br />
+<pre>
+apiVersion: v1
+kind: PersistentVolume
+metadata:
+ name: tempo-data-pv
+spec:
+ capacity:
+ storage: 10Gi
+ accessModes:
+ - ReadWriteOnce
+ persistentVolumeReclaimPolicy: Retain
+ hostPath:
+ path: /data/nfs/k3svolumes/tempo/data
+---
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+ name: tempo-data-pvc
+ namespace: monitoring
+spec:
+ storageClassName: ""
+ accessModes:
+ - ReadWriteOnce
+ resources:
+ requests:
+ storage: 10Gi
+</pre>
+<br />
+<span><strong>Grafana Datasource Provisioning</strong></span><br />
+<br />
+<span>All Grafana datasources (Prometheus, Alertmanager, Loki, Tempo) are provisioned via a unified ConfigMap that is directly mounted to the Grafana pod. This approach ensures datasources are loaded on startup without requiring sidecar-based discovery.</span><br />
+<br />
+<span>In /home/paul/git/conf/f3s/prometheus/grafana-datasources-all.yaml:</span><br />
+<br />
+<pre>
+apiVersion: v1
+kind: ConfigMap
+metadata:
+ name: grafana-datasources-all
+ namespace: monitoring
+data:
+ datasources.yaml: |
+ apiVersion: 1
+ datasources:
+ - name: Prometheus
+ type: prometheus
+ uid: prometheus
+ url: http://prometheus-kube-prometheus-prometheus.monitoring:9090/
+ access: proxy
+ isDefault: true
+ - name: Alertmanager
+ type: alertmanager
+ uid: alertmanager
+ url: http://prometheus-kube-prometheus-alertmanager.monitoring:9093/
+ - name: Loki
+ type: loki
+ uid: loki
+ url: http://loki.monitoring.svc.cluster.local:3100
+ - name: Tempo
+ type: tempo
+ uid: tempo
+ url: http://tempo.monitoring.svc.cluster.local:3200
+ jsonData:
+ tracesToLogsV2:
+ datasourceUid: loki
+ spanStartTimeShift: -1h
+ spanEndTimeShift: 1h
+ tracesToMetrics:
+ datasourceUid: prometheus
+ serviceMap:
+ datasourceUid: prometheus
+ nodeGraph:
+ enabled: true
+</pre>
+<br />
+<span>The kube-prometheus-stack Helm values (persistence-values.yaml) are configured to:</span><br />
+<ul>
+<li>Disable sidecar-based datasource provisioning</li>
+<li>Mount grafana-datasources-all ConfigMap directly to /etc/grafana/provisioning/datasources/</li>
+</ul><br />
+<span>This direct mounting approach is simpler and more reliable than sidecar-based discovery.</span><br />
+<br />
+<h4 style='display: inline' id='installation'>Installation</h4><br />
+<br />
+<pre>
+cd /home/paul/git/conf/f3s/tempo
+just install
+</pre>
+<br />
+<span>Verify Tempo is running:</span><br />
+<br />
+<pre>
+kubectl get pods -n monitoring -l app.kubernetes.io/name=tempo
+kubectl exec -n monitoring &lt;tempo-pod&gt; -- wget -qO- http://localhost:3200/ready
+</pre>
+<br />
+<h3 style='display: inline' id='configuring-grafana-alloy-for-trace-collection'>Configuring Grafana Alloy for Trace Collection</h3><br />
+<br />
+<span>Updated /home/paul/git/conf/f3s/loki/alloy-values.yaml to add OTLP receivers for traces while maintaining existing log collection.</span><br />
+<br />
+<h4 style='display: inline' id='otlp-receiver-configuration'>OTLP Receiver Configuration</h4><br />
+<br />
+<span>Added to Alloy configuration after the log collection pipeline:</span><br />
+<br />
+<pre>
+// OTLP receiver for traces via gRPC and HTTP
+otelcol.receiver.otlp "default" {
+ grpc {
+ endpoint = "0.0.0.0:4317"
+ }
+ http {
+ endpoint = "0.0.0.0:4318"
+ }
+ output {
+ traces = [otelcol.processor.batch.default.input]
+ }
+}
+
+// Batch processor for efficient trace forwarding
+otelcol.processor.batch "default" {
+ timeout = "5s"
+ send_batch_size = 100
+ send_batch_max_size = 200
+ output {
+ traces = [otelcol.exporter.otlp.tempo.input]
+ }
+}
+
+// OTLP exporter to send traces to Tempo
+otelcol.exporter.otlp "tempo" {
+ client {
+ endpoint = "tempo.monitoring.svc.cluster.local:4317"
+ tls {
+ insecure = true
+ }
+ compression = "gzip"
+ }
+}
+</pre>
+<br />
+<span>The batch processor reduces network overhead by accumulating spans before forwarding to Tempo.</span><br />
+<br />
+<span>#### Upgrade Alloy</span><br />
+<br />
+<pre>
+cd /home/paul/git/conf/f3s/loki
+just upgrade
+</pre>
+<br />
+<span>Verify OTLP receivers are listening:</span><br />
+<br />
+<pre>
+kubectl logs -n monitoring -l app.kubernetes.io/name=alloy | grep -i "otlp.*receiver"
+kubectl exec -n monitoring &lt;alloy-pod&gt; -- netstat -ln | grep -E &#39;:(4317|4318)&#39;
+</pre>
+<br />
+<h3 style='display: inline' id='demo-tracing-application'>Demo Tracing Application</h3><br />
+<br />
+<span>Created a three-tier Python application to demonstrate distributed tracing in action.</span><br />
+<br />
+<h4 style='display: inline' id='application-architecture'>Application Architecture</h4><br />
+<br />
+<pre>
+User → Frontend (Flask:5000) → Middleware (Flask:5001) → Backend (Flask:5002)
+ ↓ ↓ ↓
+ Alloy (OTLP:4317) → Tempo → Grafana
+</pre>
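The call chain above can be sketched (ignoring HTTP and the OpenTelemetry plumbing) as nested function calls, where each tier's duration includes everything below it - exactly the shape the trace waterfall shows later. A toy stand-in, not the demo app's actual code:

```python
import time

# Each function stands in for one service tier and reports its own
# duration in milliseconds; the sleep is a scaled-down stand-in for
# the backend's simulated 100ms database query.

def backend():
    start = time.perf_counter()
    time.sleep(0.01)
    return {"service": "backend", "ms": (time.perf_counter() - start) * 1000}

def middleware():
    start = time.perf_counter()
    child = backend()
    return {"service": "middleware",
            "ms": (time.perf_counter() - start) * 1000,
            "child": child}

def frontend():
    start = time.perf_counter()
    child = middleware()
    return {"service": "frontend",
            "ms": (time.perf_counter() - start) * 1000,
            "child": child}

result = frontend()
# Parent durations always include their children, just like parent spans
# enclose child spans in the trace waterfall.
assert result["ms"] >= result["child"]["ms"] >= result["child"]["child"]["ms"]
```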
+<br />
+<span>Frontend Service:</span><br />
+<br />
+<ul>
+<li>Receives HTTP requests at /api/process</li>
+<li>Forwards to middleware service</li>
+<li>Creates parent span for the entire request</li>
+</ul><br />
+<span>Middleware Service:</span><br />
+<br />
+<ul>
+<li>Transforms data at /api/transform</li>
+<li>Calls backend service</li>
+<li>Creates child span linked to frontend</li>
+</ul><br />
+<span>Backend Service:</span><br />
+<br />
+<ul>
+<li>Returns data at /api/data</li>
+<li>Simulates database query (100ms sleep)</li>
+<li>Creates leaf span in the trace</li>
+</ul><br />
+<span>OpenTelemetry Instrumentation:</span><br />
+<br />
+<span>All services use Python OpenTelemetry libraries:</span><br />
+<br />
+<span><strong>Dependencies:</strong></span><br />
+<pre>
+flask==3.0.0
+requests==2.31.0
+opentelemetry-distro==0.49b0
+opentelemetry-exporter-otlp==1.28.0
+opentelemetry-instrumentation-flask==0.49b0
+opentelemetry-instrumentation-requests==0.49b0
+</pre>
+<br />
+<span><strong>Auto-instrumentation pattern</strong> (used in all services):</span><br />
+<br />
+<!-- Generator: GNU source-highlight 3.1.9
+by Lorenzo Bettini
+http://www.lorenzobettini.it
+http://www.gnu.org/software/src-highlite -->
+<pre><font color="#ababab">from</font><font color="#ff0000"> opentelemetry </font><font color="#ababab">import</font><font color="#ff0000"> trace</font>
+<font color="#ababab">from</font><font color="#ff0000"> opentelemetry</font><font color="#F3E651">.</font><font color="#ff0000">sdk</font><font color="#F3E651">.</font><font color="#ff0000">trace </font><font color="#ababab">import</font><font color="#ff0000"> TracerProvider</font>
+<font color="#ababab">from</font><font color="#ff0000"> opentelemetry</font><font color="#F3E651">.</font><font color="#ff0000">exporter</font><font color="#F3E651">.</font><font color="#ff0000">otlp</font><font color="#F3E651">.</font><font color="#ff0000">proto</font><font color="#F3E651">.</font><font color="#ff0000">grpc</font><font color="#F3E651">.</font><font color="#ff0000">trace_exporter </font><font color="#ababab">import</font><font color="#ff0000"> OTLPSpanExporter</font>
+<font color="#ababab">from</font><font color="#ff0000"> opentelemetry</font><font color="#F3E651">.</font><font color="#ff0000">instrumentation</font><font color="#F3E651">.</font><font color="#ff0000">flask </font><font color="#ababab">import</font><font color="#ff0000"> FlaskInstrumentor</font>
+<font color="#ababab">from</font><font color="#ff0000"> opentelemetry</font><font color="#F3E651">.</font><font color="#ff0000">instrumentation</font><font color="#F3E651">.</font><font color="#ff0000">requests </font><font color="#ababab">import</font><font color="#ff0000"> RequestsInstrumentor</font>
+<font color="#ababab">from</font><font color="#ff0000"> opentelemetry</font><font color="#F3E651">.</font><font color="#ff0000">sdk</font><font color="#F3E651">.</font><font color="#ff0000">resources </font><font color="#ababab">import</font><font color="#ff0000"> Resource</font>
+
+<i><font color="#ababab"># Define service identity</font></i>
+<font color="#ff0000">resource </font><font color="#F3E651">=</font><font color="#ff0000"> </font><font color="#7bc710">Resource</font><font color="#F3E651">(</font><font color="#ff0000">attributes</font><font color="#F3E651">={</font>
+<font color="#ff0000"> </font><font color="#bb00ff">"service.name"</font><font color="#F3E651">:</font><font color="#ff0000"> </font><font color="#bb00ff">"frontend"</font><font color="#F3E651">,</font>
+<font color="#ff0000"> </font><font color="#bb00ff">"service.namespace"</font><font color="#F3E651">:</font><font color="#ff0000"> </font><font color="#bb00ff">"tracing-demo"</font><font color="#F3E651">,</font>
+<font color="#ff0000"> </font><font color="#bb00ff">"service.version"</font><font color="#F3E651">:</font><font color="#ff0000"> </font><font color="#bb00ff">"1.0.0"</font>
+<font color="#F3E651">})</font>
+
+<font color="#ff0000">provider </font><font color="#F3E651">=</font><font color="#ff0000"> </font><font color="#7bc710">TracerProvider</font><font color="#F3E651">(</font><font color="#ff0000">resource</font><font color="#F3E651">=</font><font color="#ff0000">resource</font><font color="#F3E651">)</font>
+
+<i><font color="#ababab"># Export to Alloy</font></i>
+<font color="#ff0000">otlp_exporter </font><font color="#F3E651">=</font><font color="#ff0000"> </font><font color="#7bc710">OTLPSpanExporter</font><font color="#F3E651">(</font>
+<font color="#ff0000"> endpoint</font><font color="#F3E651">=</font><font color="#bb00ff">"http://alloy.monitoring.svc.cluster.local:4317"</font><font color="#F3E651">,</font>
+<font color="#ff0000"> insecure</font><font color="#F3E651">=</font><font color="#ff0000">True</font>
+<font color="#F3E651">)</font>
+
+<font color="#ff0000">processor </font><font color="#F3E651">=</font><font color="#ff0000"> </font><font color="#7bc710">BatchSpanProcessor</font><font color="#F3E651">(</font><font color="#ff0000">otlp_exporter</font><font color="#F3E651">)</font>
+<font color="#ff0000">provider</font><font color="#F3E651">.</font><font color="#7bc710">add_span_processor</font><font color="#F3E651">(</font><font color="#ff0000">processor</font><font color="#F3E651">)</font>
+<font color="#ff0000">trace</font><font color="#F3E651">.</font><font color="#7bc710">set_tracer_provider</font><font color="#F3E651">(</font><font color="#ff0000">provider</font><font color="#F3E651">)</font>
+
+<i><font color="#ababab"># Auto-instrument Flask and requests</font></i>
+<font color="#7bc710">FlaskInstrumentor</font><font color="#F3E651">().</font><font color="#7bc710">instrument_app</font><font color="#F3E651">(</font><font color="#ff0000">app</font><font color="#F3E651">)</font>
+<font color="#7bc710">RequestsInstrumentor</font><font color="#F3E651">().</font><font color="#7bc710">instrument</font><font color="#F3E651">()</font>
+</pre>
+<br />
+<span>The auto-instrumentation:</span><br />
+<ul>
+<li>Creates spans for HTTP requests</li>
+<li>Propagates trace context via W3C Trace Context headers</li>
+<li>Links parent and child spans across service boundaries</li>
+</ul><br />
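Context propagation works by injecting a W3C traceparent header into every outgoing request; its format is version-traceid-spanid-flags. A small illustrative parser (this helper is hypothetical, not part of the demo app - the instrumentation libraries handle this internally):

```python
def parse_traceparent(header: str) -> dict:
    """Parse a W3C Trace Context traceparent header into its four fields."""
    version, trace_id, span_id, flags = header.split("-")
    # Trace IDs are 16 bytes (32 hex chars), span IDs 8 bytes (16 hex chars)
    if len(trace_id) != 32 or len(span_id) != 16:
        raise ValueError("malformed traceparent header")
    return {
        "version": version,
        "trace_id": trace_id,        # shared by all spans in the trace
        "parent_span_id": span_id,   # the caller's span
        # The lowest flag bit marks the trace as sampled
        "sampled": int(flags, 16) & 0x01 == 1,
    }

# The kind of header the frontend would send to the middleware:
ctx = parse_traceparent("00-4be1151c0bdcd5625ac7e02b98d95bd5-00f067aa0ba902b7-01")
print(ctx["trace_id"])  # the same ID that is later searchable in Tempo
```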
+<span>Deployment:</span><br />
+<br />
+<span>Created Helm chart in /home/paul/git/conf/f3s/tracing-demo/ with three separate deployments, services, and an ingress.</span><br />
+<br />
+<span>Build and deploy:</span><br />
+<br />
+<pre>
+cd /home/paul/git/conf/f3s/tracing-demo
+just build
+just import
+just install
+</pre>
+<br />
+<span>Verify deployment:</span><br />
+<br />
+<pre>
+kubectl get pods -n services | grep tracing-demo
+kubectl get ingress -n services tracing-demo-ingress
+</pre>
+<br />
+<span>Access the application at:</span><br />
+<br />
+<a class='textlink' href='http://tracing-demo.f3s.buetow.org'>http://tracing-demo.f3s.buetow.org</a><br />
+<br />
+<h3 style='display: inline' id='visualizing-traces-in-grafana'>Visualizing Traces in Grafana</h3><br />
+<br />
+<span>The Tempo datasource is provisioned through the unified datasources ConfigMap mounted directly into the Grafana pod, as set up earlier.</span><br />
+<br />
+<h4 style='display: inline' id='accessing-traces'>Accessing Traces</h4><br />
+<br />
+<span>Navigate to Grafana → Explore → Select "Tempo" datasource</span><br />
+<br />
+<span><strong>Search Interface:</strong></span><br />
+<ul>
+<li>Search by Trace ID</li>
+<li>Search by service name</li>
+<li>Search by tags</li>
+</ul><br />
+<span><strong>TraceQL Queries:</strong></span><br />
+<br />
+<span>Find all traces from demo app:</span><br />
+<pre>
+{ resource.service.namespace = "tracing-demo" }
+</pre>
+<br />
+<span>Find slow requests (&gt;200ms):</span><br />
+<pre>
+{ duration &gt; 200ms }
+</pre>
+<br />
+<span>Find traces from specific service:</span><br />
+<pre>
+{ resource.service.name = "frontend" }
+</pre>
+<br />
+<span>Find errors:</span><br />
+<pre>
+{ status = error }
+</pre>
+<br />
+<span>Complex query - traces in the demo namespace containing spans with an HTTP 5xx status code:</span><br />
+<pre>
+{ resource.service.namespace = "tracing-demo" } &amp;&amp; { span.http.status_code &gt;= 500 }
+</pre>
+<br />
+<h4 style='display: inline' id='service-graph-visualization'>Service Graph Visualization</h4><br />
+<br />
+<span>The service graph shows visual connections between services:</span><br />
+<br />
+<span>1. Navigate to Explore → Tempo</span><br />
+<span>2. Enable "Service Graph" view</span><br />
+<span>3. Shows: Frontend → Middleware → Backend with request rates</span><br />
+<br />
+<span>The service graph uses Prometheus metrics generated from trace data.</span><br />
+<br />
+<h3 style='display: inline' id='correlation-between-observability-signals'>Correlation Between Observability Signals</h3><br />
+<br />
+<span>Tempo integrates with Loki and Prometheus to provide unified observability.</span><br />
+<br />
+<h4 style='display: inline' id='traces-to-logs'>Traces-to-Logs</h4><br />
+<br />
+<span>Click on any span in a trace to see related logs:</span><br />
+<br />
+<span>1. View trace in Grafana</span><br />
+<span>2. Click on a span</span><br />
+<span>3. Select "Logs for this span"</span><br />
+<span>4. Loki shows logs filtered by:</span><br />
+<span> * Time range (span duration ± 1 hour)</span><br />
+<span> * Service name</span><br />
+<span> * Namespace</span><br />
+<span> * Pod</span><br />
+<br />
+<span>This helps correlate what the service was doing when the span was created.</span><br />
+<br />
+<h4 style='display: inline' id='traces-to-metrics'>Traces-to-Metrics</h4><br />
+<br />
+<span>View Prometheus metrics for services in the trace:</span><br />
+<br />
+<span>1. View trace in Grafana</span><br />
+<span>2. Select "Metrics" tab</span><br />
+<span>3. Shows metrics like:</span><br />
+<span> * Request rate</span><br />
+<span> * Error rate</span><br />
+<span> * Duration percentiles</span><br />
+<br />
+<h4 style='display: inline' id='logs-to-traces'>Logs-to-Traces</h4><br />
+<br />
+<span>From logs, you can jump to related traces:</span><br />
+<br />
+<span>1. In Loki, logs that contain trace IDs are automatically linked</span><br />
+<span>2. Click the trace ID to view the full trace</span><br />
+<span>3. See the complete request flow</span><br />
+<br />
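In Grafana, such log-to-trace links are typically configured via derived fields on the Loki datasource. A hedged sketch of what that could look like in the provisioning ConfigMap (the regex and field name are assumptions and must match how trace IDs actually appear in the log lines):

```
- name: Loki
  type: loki
  uid: loki
  url: http://loki.monitoring.svc.cluster.local:3100
  jsonData:
    derivedFields:
      - name: TraceID
        # Illustrative pattern; adjust to the real log format
        matcherRegex: 'trace_id=(\w+)'
        url: '${__value.raw}'
        datasourceUid: tempo
```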
+<h3 style='display: inline' id='generating-traces-for-testing'>Generating Traces for Testing</h3><br />
+<br />
+<span>Test the demo application:</span><br />
+<br />
+<pre>
+curl http://tracing-demo.f3s.buetow.org/api/process
+</pre>
+<br />
+<span>Load test (generates 50 traces):</span><br />
+<br />
+<pre>
+cd /home/paul/git/conf/f3s/tracing-demo
+just load-test
+</pre>
+<br />
+<span>Each request creates a distributed trace spanning all three services.</span><br />
+<br />
+<h3 style='display: inline' id='verifying-the-complete-pipeline'>Verifying the Complete Pipeline</h3><br />
+<br />
+<span>Check the trace flow end-to-end:</span><br />
+<br />
+<span><strong>1. Application generates traces:</strong></span><br />
+<pre>
+kubectl logs -n services -l app=tracing-demo-frontend | grep -i trace
+</pre>
+<br />
+<span><strong>2. Alloy receives traces:</strong></span><br />
+<pre>
+kubectl logs -n monitoring -l app.kubernetes.io/name=alloy | grep -i otlp
+</pre>
+<br />
+<span><strong>3. Tempo stores traces:</strong></span><br />
+<pre>
+kubectl logs -n monitoring -l app.kubernetes.io/name=tempo | grep -i trace
+</pre>
+<br />
+<span><strong>4. Grafana displays traces:</strong></span><br />
+<span>Navigate to Explore → Tempo → Search for traces</span><br />
+<br />
+<h3 style='display: inline' id='practical-example-viewing-a-distributed-trace'>Practical Example: Viewing a Distributed Trace</h3><br />
+<br />
+<span>Let&#39;s generate a trace and examine it in Grafana.</span><br />
+<br />
+<span><strong>1. Generate a trace by calling the demo application:</strong></span><br />
+<br />
+<pre>
+curl -H "Host: tracing-demo.f3s.buetow.org" http://r0/api/process
+</pre>
+<br />
+<span><strong>Response (HTTP 200):</strong></span><br />
+<br />
+<!-- Generator: GNU source-highlight 3.1.9
+by Lorenzo Bettini
+http://www.lorenzobettini.it
+http://www.gnu.org/software/src-highlite -->
+<pre><font color="#F3E651">{</font>
+<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">middleware_response</font><font color="#ff0000">"</font><font color="#ff0000">: </font><font color="#F3E651">{</font>
+<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">backend_data</font><font color="#ff0000">"</font><font color="#ff0000">: </font><font color="#F3E651">{</font>
+<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">data</font><font color="#ff0000">"</font><font color="#ff0000">: </font><font color="#F3E651">{</font>
+<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">id</font><font color="#ff0000">"</font><font color="#ff0000">: </font><font color="#bb00ff">12345</font><font color="#F3E651">,</font>
+<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">query_time_ms</font><font color="#ff0000">"</font><font color="#ff0000">: </font><font color="#bb00ff">100.0</font><font color="#F3E651">,</font>
+<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">timestamp</font><font color="#ff0000">"</font><font color="#ff0000">:</font><font color="#ff0000"> "</font><font color="#bb00ff">2025-12-28T18:35:01.064538</font><font color="#ff0000">"</font><font color="#F3E651">,</font>
+<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">value</font><font color="#ff0000">"</font><font color="#ff0000">:</font><font color="#ff0000"> "</font><font color="#bb00ff">Sample data from backend service</font><font color="#ff0000">"</font>
+<font color="#ff0000"> </font><font color="#F3E651">},</font>
+<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">service</font><font color="#ff0000">"</font><font color="#ff0000">:</font><font color="#ff0000"> "</font><font color="#bb00ff">backend</font><font color="#ff0000">"</font>
+<font color="#ff0000"> </font><font color="#F3E651">},</font>
+<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">middleware_processed</font><font color="#ff0000">"</font><font color="#ff0000">: </font><b><font color="#ffffff">true</font></b><font color="#F3E651">,</font>
+<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">original_data</font><font color="#ff0000">"</font><font color="#ff0000">: </font><font color="#F3E651">{</font>
+<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">source</font><font color="#ff0000">"</font><font color="#ff0000">:</font><font color="#ff0000"> "</font><font color="#bb00ff">GET request</font><font color="#ff0000">"</font>
+<font color="#ff0000"> </font><font color="#F3E651">},</font>
+<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">transformation_time_ms</font><font color="#ff0000">"</font><font color="#ff0000">: </font><font color="#bb00ff">50</font>
+<font color="#ff0000"> </font><font color="#F3E651">},</font>
+<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">request_data</font><font color="#ff0000">"</font><font color="#ff0000">: </font><font color="#F3E651">{</font>
+<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">source</font><font color="#ff0000">"</font><font color="#ff0000">:</font><font color="#ff0000"> "</font><font color="#bb00ff">GET request</font><font color="#ff0000">"</font>
+<font color="#ff0000"> </font><font color="#F3E651">},</font>
+<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">service</font><font color="#ff0000">"</font><font color="#ff0000">:</font><font color="#ff0000"> "</font><font color="#bb00ff">frontend</font><font color="#ff0000">"</font><font color="#F3E651">,</font>
+<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">status</font><font color="#ff0000">"</font><font color="#ff0000">:</font><font color="#ff0000"> "</font><font color="#bb00ff">success</font><font color="#ff0000">"</font>
+<font color="#F3E651">}</font>
+</pre>
+<br />
+<span><strong>2. Find the trace in Tempo via API:</strong></span><br />
+<br />
+<span>After a few seconds (for batch export), search for recent traces:</span><br />
+<br />
+<pre>
+kubectl exec -n monitoring tempo-0 -- wget -qO- \
+ &#39;http://localhost:3200/api/search?tags=service.namespace%3Dtracing-demo&amp;limit=5&#39; 2&gt;/dev/null | \
+ python3 -m json.tool
+</pre>
+<br />
+<span>Returns traces including:</span><br />
+<br />
+<!-- Generator: GNU source-highlight 3.1.9
+by Lorenzo Bettini
+http://www.lorenzobettini.it
+http://www.gnu.org/software/src-highlite -->
+<pre><font color="#F3E651">{</font>
+<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">traceID</font><font color="#ff0000">"</font><font color="#ff0000">:</font><font color="#ff0000"> "</font><font color="#bb00ff">4be1151c0bdcd5625ac7e02b98d95bd5</font><font color="#ff0000">"</font><font color="#F3E651">,</font>
+<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">rootServiceName</font><font color="#ff0000">"</font><font color="#ff0000">:</font><font color="#ff0000"> "</font><font color="#bb00ff">frontend</font><font color="#ff0000">"</font><font color="#F3E651">,</font>
+<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">rootTraceName</font><font color="#ff0000">"</font><font color="#ff0000">:</font><font color="#ff0000"> "</font><font color="#bb00ff">GET /api/process</font><font color="#ff0000">"</font><font color="#F3E651">,</font>
+<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">durationMs</font><font color="#ff0000">"</font><font color="#ff0000">: </font><font color="#bb00ff">221</font>
+<font color="#F3E651">}</font>
+</pre>
+<br />
+<span><strong>3. Fetch complete trace details:</strong></span><br />
+<br />
+<pre>
+kubectl exec -n monitoring tempo-0 -- wget -qO- \
+ &#39;http://localhost:3200/api/traces/4be1151c0bdcd5625ac7e02b98d95bd5&#39; 2&gt;/dev/null | \
+ python3 -m json.tool
+</pre>
+<br />
+<span><strong>Trace structure (8 spans across 3 services):</strong></span><br />
+<br />
+<pre>
+Trace ID: 4be1151c0bdcd5625ac7e02b98d95bd5
+Services: 3 (frontend, middleware, backend)
+
+Service: frontend
+ └─ GET /api/process 221.10ms (HTTP server span)
+ └─ frontend-process 216.23ms (custom business logic span)
+ └─ POST 209.97ms (HTTP client span to middleware)
+
+Service: middleware
+ └─ POST /api/transform 186.02ms (HTTP server span)
+ └─ middleware-transform 180.96ms (custom business logic span)
+ └─ GET 127.52ms (HTTP client span to backend)
+
+Service: backend
+ └─ GET /api/data 103.93ms (HTTP server span)
+ └─ backend-get-data 102.11ms (custom business logic span with 100ms sleep)
+</pre>
+<br />
+<span><strong>4. View the trace in Grafana UI:</strong></span><br />
+<br />
+<span>Navigate to: Grafana → Explore → Tempo datasource</span><br />
+<br />
+<span>Search using TraceQL:</span><br />
+<pre>
+{ resource.service.namespace = "tracing-demo" }
+</pre>
+<br />
+<span>Or directly open the trace by pasting the trace ID in the search box:</span><br />
+<pre>
+4be1151c0bdcd5625ac7e02b98d95bd5
+</pre>
+<br />
+<span><strong>5. Trace visualization:</strong></span><br />
+<br />
+<span>The trace waterfall view in Grafana shows the complete request flow with timing:</span><br />
+<br />
+<a href='./f3s-kubernetes-with-freebsd-part-8/grafana-tempo-trace.png'><img alt='Distributed trace visualization in Grafana Tempo showing Frontend → Middleware → Backend spans' title='Distributed trace visualization in Grafana Tempo showing Frontend → Middleware → Backend spans' src='./f3s-kubernetes-with-freebsd-part-8/grafana-tempo-trace.png' /></a><br />
+<br />
+<span>For additional examples of Tempo trace visualization, see also:</span><br />
+<br />
+<a class='textlink' href='https://foo.zone/gemfeed/2025-12-24-x-rag-observability-hackathon.html'>X-RAG Observability Hackathon (more Grafana Tempo screenshots)</a><br />
+<br />
+<span>The trace reveals the distributed request flow:</span><br />
+<br />
+<ul>
+<li>Frontend (221ms): Receives GET /api/process, executes business logic, calls middleware</li>
+<li>Middleware (186ms): Receives POST /api/transform, transforms data, calls backend</li>
+<li>Backend (104ms): Receives GET /api/data, simulates database query with 100ms sleep</li>
+<li>Total request time: 221ms end-to-end</li>
+<li>Span propagation: W3C Trace Context headers automatically link all spans</li>
+</ul><br />
+<span><strong>6. Service graph visualization:</strong></span><br />
+<br />
+<span>The service graph is automatically generated from traces and shows service dependencies. For examples of service graph visualization in Grafana, see the screenshots in the X-RAG Observability Hackathon blog post.</span><br />
+<br />
+<a class='textlink' href='./2025-12-24-x-rag-observability-hackathon.html'>X-RAG Observability Hackathon (includes service graph screenshots)</a><br />
+<br />
+<span>This visualization helps identify:</span><br />
+<br />
+<ul>
+<li>Request rates between services</li>
+<li>Average latency for each hop</li>
+<li>Error rates (if any)</li>
+<li>Service dependencies and communication patterns</li>
+</ul><br />
+<h3 style='display: inline' id='storage-and-retention'>Storage and Retention</h3><br />
+<br />
+<span>Monitor Tempo storage usage:</span><br />
+<br />
+<pre>
+kubectl exec -n monitoring &lt;tempo-pod&gt; -- df -h /var/tempo
+</pre>
+<br />
+<span>With 10Gi storage and 7-day retention, the system handles moderate trace volumes. If storage fills up:</span><br />
+<br />
+<ul>
+<li>Reduce retention to 72h (3 days)</li>
+<li>Implement sampling in Alloy</li>
+<li>Increase PV size</li>
+</ul><br />
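Sampling could be added as another processor stage in the Alloy pipeline shown earlier, with the OTLP receiver's traces output repointed at it. A hedged sketch using Alloy's probabilistic sampler (the percentage is illustrative and should be tuned to the actual trace volume):

```
// Illustrative sampling stage: keep roughly 25% of traces before batching.
otelcol.processor.probabilistic_sampler "default" {
  sampling_percentage = 25
  output {
    traces = [otelcol.processor.batch.default.input]
  }
}
```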
+<h3 style='display: inline' id='configuration-files'>Configuration Files</h3><br />
+<br />
+<span>All configuration files are available on Codeberg:</span><br />
+<br />
+<a class='textlink' href='https://codeberg.org/snonux/conf/src/branch/master/f3s/tempo'>Tempo configuration</a><br />
+<a class='textlink' href='https://codeberg.org/snonux/conf/src/branch/master/f3s/loki'>Alloy configuration (updated for traces)</a><br />
+<a class='textlink' href='https://codeberg.org/snonux/conf/src/branch/master/f3s/tracing-demo'>Demo tracing application</a><br />
+<br />
<h2 style='display: inline' id='summary'>Summary</h2><br />
<br />
-<span>With Prometheus, Grafana, Loki, and Alloy deployed, I now have complete visibility into the k3s cluster, the FreeBSD storage servers, and the OpenBSD edge relays:</span><br />
+<span>With Prometheus, Grafana, Loki, Alloy, and Tempo deployed, I now have complete visibility into the k3s cluster, the FreeBSD storage servers, and the OpenBSD edge relays:</span><br />
<br />
<ul>
-<li>metrics: Prometheus collects and stores time-series data from all components</li>
+<li>Metrics: Prometheus collects and stores time-series data from all components, including etcd and ZFS</li>
<li>Logs: Loki aggregates logs from all containers, searchable via Grafana</li>
-<li>Visualisation: Grafana provides dashboards and exploration tools</li>
+<li>Traces: Tempo provides distributed request tracing with service dependency mapping</li>
+<li>Visualisation: Grafana provides dashboards and exploration tools with correlation between all three signals</li>
<li>Alerting: Alertmanager can notify on conditions defined in Prometheus rules</li>
</ul><br />
<span>This observability stack runs entirely on the home lab infrastructure, with data persisted to the NFS share. It&#39;s lightweight enough for a three-node cluster but provides the same capabilities as production-grade setups.</span><br />
<br />
+<span>All configuration files are available on Codeberg:</span><br />
+<br />
+<a class='textlink' href='https://codeberg.org/snonux/conf/src/branch/master/f3s/prometheus'>Prometheus, Grafana, and recording rules configuration</a><br />
+<a class='textlink' href='https://codeberg.org/snonux/conf/src/branch/master/f3s/loki'>Loki and Alloy configuration</a><br />
+<a class='textlink' href='https://codeberg.org/snonux/conf/src/branch/master/f3s/tempo'>Tempo configuration</a><br />
+<a class='textlink' href='https://codeberg.org/snonux/conf/src/branch/master/f3s/tracing-demo'>Demo tracing application</a><br />
+<br />
<span>Other *BSD-related posts:</span><br />
<br />
<a class='textlink' href='./2025-12-07-f3s-kubernetes-with-freebsd-part-8.html'>2025-12-07 f3s: Kubernetes with FreeBSD - Part 8: Observability (You are currently reading this)</a><br />
diff --git a/gemfeed/DRAFT-f3s-kubernetes-with-freebsd-part-8b.html b/gemfeed/DRAFT-f3s-kubernetes-with-freebsd-part-8b.html
deleted file mode 100644
index 450ae8d4..00000000
--- a/gemfeed/DRAFT-f3s-kubernetes-with-freebsd-part-8b.html
+++ /dev/null
@@ -1,1164 +0,0 @@
-<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
-<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
-<head>
-<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
-<meta name="viewport" content="width=device-width, initial-scale=1.0" />
-<title>f3s: Kubernetes with FreeBSD - Part 9: Enabling etcd Metrics</title>
-<link rel="shortcut icon" type="image/gif" href="/favicon.ico" />
-<link rel="stylesheet" href="../style.css" />
-<link rel="stylesheet" href="style-override.css" />
-</head>
-<body>
-<div class="rfx-overlay-grid"></div>
-<div class="rfx-overlay-scanlines"></div>
-<div id="rfx-stars"></div>
-<div class="rfx-vignette"></div>
-<p class="header">
-<a href="https://foo.zone">Home</a> | <a href="https://codeberg.org/snonux/foo.zone/src/branch/content-md/gemfeed/DRAFT-f3s-kubernetes-with-freebsd-part-8b.md">Markdown</a> | <a href="gemini://foo.zone/gemfeed/DRAFT-f3s-kubernetes-with-freebsd-part-8b.gmi">Gemini</a>
-</p>
-<h1 style='display: inline' id='f3s-kubernetes-with-freebsd---part-9-enabling-etcd-metrics'>f3s: Kubernetes with FreeBSD - Part 9: Enabling etcd Metrics</h1><br />
-<br />
-<h2 style='display: inline' id='introduction'>Introduction</h2><br />
-<br />
-<span>This post covers enabling etcd metrics monitoring for the k3s cluster. The etcd dashboard in Grafana initially showed no data because k3s uses an embedded etcd that doesn&#39;t expose metrics by default.</span><br />
-<br />
-<a class='textlink' href='./2025-12-07-f3s-kubernetes-with-freebsd-part-8.html'>Part 8: Observability</a><br />
-<br />
-<h2 style='display: inline' id='important-note-gitops-migration'>Important Note: GitOps Migration</h2><br />
-<br />
-<span><strong>Note:</strong> After the initial observability setup, the f3s cluster was migrated from imperative Helm deployments to declarative GitOps using ArgoCD. The Prometheus configuration and deployment process described in this post have been updated for ArgoCD.</span><br />
-<br />
-<span><strong>To view the configuration as it existed before the ArgoCD migration</strong>, check out the pre-ArgoCD revision:</span><br />
-<br />
-<!-- Generator: GNU source-highlight 3.1.9
-by Lorenzo Bettini
-http://www.lorenzobettini.it
-http://www.gnu.org/software/src-highlite -->
-<pre><font color="#ff0000">$ git clone https</font><font color="#F3E651">:</font><font color="#ff0000">//codeberg</font><font color="#F3E651">.</font><font color="#ff0000">org/snonux/conf</font><font color="#F3E651">.</font><font color="#ff0000">git</font>
-<font color="#ff0000">$ cd conf</font>
-<font color="#ff0000">$ git checkout 15a86f3 </font><i><font color="#ababab"># Last commit before ArgoCD migration</font></i>
-<font color="#ff0000">$ cd f3s/prometheus</font><font color="#F3E651">/</font>
-</pre>
-<br />
-<span><strong>Current master branch</strong> uses ArgoCD with:</span><br />
-<ul>
-<li>Application manifest: <span class='inlinecode'>argocd-apps/monitoring/prometheus.yaml</span></li>
-<li>Multi-source Application combining upstream chart + custom manifests</li>
-<li>Justfile commands updated to trigger ArgoCD syncs instead of direct Helm commands</li>
-</ul><br />
-<span>The etcd configuration concepts remain the same—only the deployment method changed. Instead of running <span class='inlinecode'>just upgrade</span>, you would:</span><br />
-<span>1. Update the configuration in Git</span><br />
-<span>2. Commit and push</span><br />
-<span>3. ArgoCD automatically syncs (or run <span class='inlinecode'>just sync</span> for immediate sync)</span><br />
-<br />
-<h2 style='display: inline' id='enabling-etcd-metrics-in-k3s'>Enabling etcd metrics in k3s</h2><br />
-<br />
-<span>On each control-plane node (r0, r1, r2), create /etc/rancher/k3s/config.yaml:</span><br />
-<br />
-<pre>
-etcd-expose-metrics: true
-</pre>
-<br />
-<span>Then restart k3s on each node, one node at a time so the embedded etcd keeps quorum:</span><br />
-<br />
-<pre>
-systemctl restart k3s
-</pre>
-<br />
-<span>After restarting, etcd metrics are available on port 2381:</span><br />
-<br />
-<pre>
-curl http://127.0.0.1:2381/metrics | grep etcd
-</pre>
-<br />
-<h2 style='display: inline' id='configuring-prometheus-to-scrape-etcd'>Configuring Prometheus to scrape etcd</h2><br />
-<br />
-<span>In persistence-values.yaml, enable kubeEtcd with the node IP addresses:</span><br />
-<br />
-<pre>
-kubeEtcd:
- enabled: true
- endpoints:
- - 192.168.1.120
- - 192.168.1.121
- - 192.168.1.122
- service:
- enabled: true
- port: 2381
- targetPort: 2381
-</pre>
-<br />
-<span>Apply the changes:</span><br />
-<br />
-<pre>
-just upgrade
-</pre>
-<br />
-<h2 style='display: inline' id='verifying-etcd-metrics'>Verifying etcd metrics</h2><br />
-<br />
-<span>After the changes, all etcd targets are being scraped:</span><br />
-<br />
-<pre>
-kubectl exec -n monitoring prometheus-prometheus-kube-prometheus-prometheus-0 \
- -c prometheus -- wget -qO- &#39;http://localhost:9090/api/v1/query?query=etcd_server_has_leader&#39; | \
- jq -r &#39;.data.result[] | "\(.metric.instance): \(.value[1])"&#39;
-</pre>
-<br />
-<span>Output:</span><br />
-<br />
-<pre>
-192.168.1.120:2381: 1
-192.168.1.121:2381: 1
-192.168.1.122:2381: 1
-</pre>
-<br />
-<span>The etcd dashboard in Grafana now displays metrics including Raft proposals, leader elections, and peer round trip times.</span><br />
-<br />
-<h2 style='display: inline' id='complete-persistence-valuesyaml'>Complete persistence-values.yaml</h2><br />
-<br />
-<span>The complete updated persistence-values.yaml:</span><br />
-<br />
-<pre>
-kubeEtcd:
- enabled: true
- endpoints:
- - 192.168.1.120
- - 192.168.1.121
- - 192.168.1.122
- service:
- enabled: true
- port: 2381
- targetPort: 2381
-
-prometheus:
- prometheusSpec:
- additionalScrapeConfigsSecret:
- enabled: true
- name: additional-scrape-configs
- key: additional-scrape-configs.yaml
- storageSpec:
- volumeClaimTemplate:
- spec:
- storageClassName: ""
- accessModes: ["ReadWriteOnce"]
- resources:
- requests:
- storage: 10Gi
- selector:
- matchLabels:
- type: local
- app: prometheus
-
-grafana:
- persistence:
- enabled: true
- type: pvc
- existingClaim: "grafana-data-pvc"
-
- initChownData:
- enabled: false
-
- podSecurityContext:
- fsGroup: 911
- runAsUser: 911
- runAsGroup: 911
-</pre>
-<br />
-<h2 style='display: inline' id='zfs-monitoring-for-freebsd-servers'>ZFS Monitoring for FreeBSD Servers</h2><br />
-<br />
-<span>The FreeBSD servers (f0, f1, f2) that provide NFS storage to the k3s cluster have ZFS filesystems. Monitoring ZFS is crucial for understanding storage performance and cache efficiency.</span><br />
-<br />
-<h3 style='display: inline' id='node-exporter-zfs-collector'>Node Exporter ZFS Collector</h3><br />
-<br />
-<span>The node_exporter running on each FreeBSD server (v1.9.1) includes a built-in ZFS collector that exposes metrics via sysctls. The ZFS collector is enabled by default and provides:</span><br />
-<br />
-<ul>
-<li>ARC (Adaptive Replacement Cache) statistics</li>
-<li>Cache hit/miss rates</li>
-<li>Memory usage and allocation</li>
-<li>MRU/MFU cache breakdown</li>
-<li>Data vs metadata distribution</li>
-</ul><br />
-<h3 style='display: inline' id='verifying-zfs-metrics'>Verifying ZFS Metrics</h3><br />
-<br />
-<span>On any FreeBSD server, check that ZFS metrics are being exposed:</span><br />
-<br />
-<pre>
-paul@f0:~ % curl -s http://localhost:9100/metrics | grep node_zfs_arcstats | wc -l
- 69
-</pre>
-<br />
-<span>The metrics are automatically scraped by Prometheus through the existing static configuration in additional-scrape-configs.yaml which targets all FreeBSD servers on port 9100 with the os: freebsd label.</span><br />
-<br />
-<h3 style='display: inline' id='zfs-recording-rules'>ZFS Recording Rules</h3><br />
-<br />
-<span>Created recording rules for easier dashboard consumption in zfs-recording-rules.yaml:</span><br />
-<br />
-<pre>
-apiVersion: monitoring.coreos.com/v1
-kind: PrometheusRule
-metadata:
- name: freebsd-zfs-rules
- namespace: monitoring
- labels:
- release: prometheus
-spec:
- groups:
- - name: freebsd-zfs-arc
- interval: 30s
- rules:
- - record: node_zfs_arc_hit_rate_percent
- expr: |
- 100 * (
- rate(node_zfs_arcstats_hits_total{os="freebsd"}[5m]) /
- (rate(node_zfs_arcstats_hits_total{os="freebsd"}[5m]) +
- rate(node_zfs_arcstats_misses_total{os="freebsd"}[5m]))
- )
- labels:
- os: freebsd
- - record: node_zfs_arc_memory_usage_percent
- expr: |
- 100 * (
- node_zfs_arcstats_size_bytes{os="freebsd"} /
- node_zfs_arcstats_c_max_bytes{os="freebsd"}
- )
- labels:
- os: freebsd
- # Additional rules for metadata %, target %, MRU/MFU %, etc.
-</pre>
-<br />
-<span>These recording rules calculate:</span><br />
-<br />
-<ul>
-<li>ARC hit rate percentage</li>
-<li>ARC memory usage percentage (current vs maximum)</li>
-<li>ARC target percentage (target vs maximum)</li>
-<li>Metadata vs data percentages</li>
-<li>MRU vs MFU cache percentages</li>
-<li>Demand data and metadata hit rates</li>
-</ul><br />
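-<span>As a sanity check, the arithmetic behind the hit rate rule can be reproduced in a few lines of Python. This is a hypothetical standalone sketch, not part of the deployed stack:</span><br />
-<br />
```python
def arc_hit_rate_percent(hits_delta: float, misses_delta: float) -> float:
    """Mirror of the PromQL expression
    100 * rate(hits) / (rate(hits) + rate(misses));
    the deltas stand in for the per-second rates over the 5m window."""
    total = hits_delta + misses_delta
    if total == 0:
        return 0.0  # avoid division by zero while the ARC is idle
    return 100.0 * hits_delta / total
```
-<br />
-<span>For example, 900 hits and 100 misses over a window yield a 90% hit rate.</span><br />
-<br />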
-<h3 style='display: inline' id='grafana-dashboards'>Grafana Dashboards</h3><br />
-<br />
-<span>Created two comprehensive ZFS monitoring dashboards (zfs-dashboards.yaml):</span><br />
-<br />
-<span><strong>Dashboard 1: FreeBSD ZFS (per-host detailed view)</strong></span><br />
-<br />
-<span>Includes variables to select:</span><br />
-<ul>
-<li>FreeBSD server (f0, f1, or f2)</li>
-<li>ZFS pool (zdata, zroot, or all)</li>
-</ul><br />
-<span><strong>Pool Overview Row:</strong></span><br />
-<ul>
-<li>Pool Capacity gauge (with thresholds: green &lt;70%, yellow &lt;85%, red &gt;85%)</li>
-<li>Pool Health status (ONLINE/DEGRADED/FAULTED with color coding)</li>
-<li>Total Pool Size stat</li>
-<li>Free Space stat</li>
-<li>Pool Space Usage Over Time (stacked: used + free)</li>
-<li>Pool Capacity Trend time series</li>
-</ul><br />
-<span><strong>Dataset Statistics Row:</strong></span><br />
-<ul>
-<li>Table showing all datasets with columns: Pool, Dataset, Used, Available, Referenced</li>
-<li>Automatically filters by selected pool</li>
-</ul><br />
-<span><strong>ARC Cache Statistics Row:</strong></span><br />
-<ul>
-<li>ARC Hit Rate gauge (red &lt;70%, yellow &lt;90%, green &gt;=90%)</li>
-<li>ARC Size time series (current, target, max)</li>
-<li>ARC Memory Usage percentage gauge</li>
-<li>ARC Hits vs Misses rate</li>
-<li>ARC Data vs Metadata stacked time series</li>
-</ul><br />
-<span><strong>Dashboard 2: FreeBSD ZFS Summary (cluster-wide overview)</strong></span><br />
-<br />
-<span><strong>Cluster-Wide Pool Statistics Row:</strong></span><br />
-<ul>
-<li>Total Storage Capacity across all servers</li>
-<li>Total Used space</li>
-<li>Total Free space</li>
-<li>Average Pool Capacity gauge</li>
-<li>Pool Health Status (worst case across cluster)</li>
-<li>Total Pool Space Usage Over Time</li>
-<li>Per-Pool Capacity time series (all pools on all hosts)</li>
-</ul><br />
-<span><strong>Per-Host Pool Breakdown Row:</strong></span><br />
-<ul>
-<li>Bar gauge showing capacity by host and pool</li>
-<li>Table with all pools: Host, Pool, Size, Used, Free, Capacity %, Health</li>
-</ul><br />
-<span><strong>Cluster-Wide ARC Statistics Row:</strong></span><br />
-<ul>
-<li>Average ARC Hit Rate gauge across all hosts</li>
-<li>ARC Hit Rate by Host time series</li>
-<li>Total ARC Size Across Cluster</li>
-<li>Total ARC Hits vs Misses (cluster-wide sum)</li>
-<li>ARC Size by Host</li>
-</ul><br />
-<span><strong>Dashboard Visualization:</strong></span><br />
-<br />
-<a href='./f3s-kubernetes-with-freebsd-part-8b/grafana-zfs-dashboard.png'><img alt='ZFS monitoring dashboard in Grafana showing pool statistics and ARC cache metrics' title='ZFS monitoring dashboard in Grafana showing pool statistics and ARC cache metrics' src='./f3s-kubernetes-with-freebsd-part-8b/grafana-zfs-dashboard.png' /></a><br />
-<br />
-<h3 style='display: inline' id='deployment'>Deployment</h3><br />
-<br />
-<span>Applied the resources to the cluster:</span><br />
-<br />
-<pre>
-cd /home/paul/git/conf/f3s/prometheus
-kubectl apply -f zfs-recording-rules.yaml
-kubectl apply -f zfs-dashboards.yaml
-</pre>
-<br />
-<span>Updated Justfile to include ZFS recording rules in install and upgrade targets:</span><br />
-<br />
-<pre>
-install:
- kubectl apply -f persistent-volumes.yaml
- kubectl create secret generic additional-scrape-configs --from-file=additional-scrape-configs.yaml -n monitoring --dry-run=client -o yaml | kubectl apply -f -
- helm install prometheus prometheus-community/kube-prometheus-stack --namespace monitoring -f persistence-values.yaml
- kubectl apply -f freebsd-recording-rules.yaml
- kubectl apply -f openbsd-recording-rules.yaml
- kubectl apply -f zfs-recording-rules.yaml
- just -f grafana-ingress/Justfile install
-</pre>
-<br />
-<h3 style='display: inline' id='verifying-zfs-metrics-in-prometheus'>Verifying ZFS Metrics in Prometheus</h3><br />
-<br />
-<span>Check that ZFS metrics are being collected:</span><br />
-<br />
-<pre>
-kubectl exec -n monitoring prometheus-prometheus-kube-prometheus-prometheus-0 -c prometheus -- \
- wget -qO- &#39;http://localhost:9090/api/v1/query?query=node_zfs_arcstats_size_bytes&#39;
-</pre>
-<br />
-<span>Check recording rules are calculating correctly:</span><br />
-<br />
-<pre>
-kubectl exec -n monitoring prometheus-prometheus-kube-prometheus-prometheus-0 -c prometheus -- \
- wget -qO- &#39;http://localhost:9090/api/v1/query?query=node_zfs_arc_memory_usage_percent&#39;
-</pre>
-<br />
-<span>Example output shows memory usage percentage for each FreeBSD server:</span><br />
-<br />
-<pre>
-"result":[
- {"metric":{"instance":"192.168.2.130:9100","os":"freebsd"},"value":[...,"37.58"]},
- {"metric":{"instance":"192.168.2.131:9100","os":"freebsd"},"value":[...,"12.85"]},
- {"metric":{"instance":"192.168.2.132:9100","os":"freebsd"},"value":[...,"13.44"]}
-]
-</pre>
-<br />
-<h3 style='display: inline' id='accessing-the-dashboards'>Accessing the Dashboards</h3><br />
-<br />
-<span>The dashboards are automatically imported by the Grafana sidecar and accessible at:</span><br />
-<br />
-<a class='textlink' href='https://grafana.f3s.buetow.org'>https://grafana.f3s.buetow.org</a><br />
-<br />
-<span>Navigate to Dashboards and search for:</span><br />
-<ul>
-<li>"FreeBSD ZFS" - detailed per-host view with pool and dataset breakdowns</li>
-<li>"FreeBSD ZFS Summary" - cluster-wide overview of all ZFS storage</li>
-</ul><br />
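-<span>For reference, the dashboard sidecar imports any ConfigMap carrying the <span class='inlinecode'>grafana_dashboard</span> label. A minimal sketch of such a ConfigMap follows; the name and the embedded JSON are illustrative and not the actual contents of zfs-dashboards.yaml:</span><br />
-<br />
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: example-dashboard
  namespace: monitoring
  labels:
    grafana_dashboard: "1"   # label the sidecar watches for
data:
  example.json: |
    {"title": "Example Dashboard", "panels": []}
```
-<br />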
-<h3 style='display: inline' id='key-metrics-to-monitor'>Key Metrics to Monitor</h3><br />
-<br />
-<span><strong>ARC Hit Rate:</strong> Should typically be above 90% for optimal performance. Lower hit rates indicate the ARC cache is too small or the workload has poor locality.</span><br />
-<br />
-<span><strong>ARC Memory Usage:</strong> Shows how much of the maximum ARC size is being used. If consistently at or near maximum, the ARC is effectively utilizing available memory.</span><br />
-<br />
-<span><strong>Data vs Metadata:</strong> Typically data should dominate, but workloads with many small files will show higher metadata percentages.</span><br />
-<br />
-<span><strong>MRU vs MFU:</strong> Most Recently Used vs Most Frequently Used cache. The ratio depends on workload characteristics.</span><br />
-<br />
-<span><strong>Pool Capacity:</strong> Monitor pool usage to ensure adequate free space. ZFS performance degrades when pools exceed 80% capacity.</span><br />
-<br />
-<span><strong>Pool Health:</strong> Should always show ONLINE (green). DEGRADED (yellow) indicates a disk issue requiring attention. FAULTED (red) requires immediate action.</span><br />
-<br />
-<span><strong>Dataset Usage:</strong> Track which datasets are consuming the most space to identify growth trends and plan capacity.</span><br />
-<br />
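-<span>These thresholds translate naturally into alerts. Below is a hedged sketch of a PrometheusRule built on the pool metrics from the textfile collector described later in this post; the alert names and the 80% threshold are illustrative, and this file is not part of the repository:</span><br />
-<br />
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: freebsd-zfs-alerts
  namespace: monitoring
  labels:
    release: prometheus
spec:
  groups:
    - name: freebsd-zfs-alerts
      rules:
        - alert: ZfsPoolCapacityHigh
          expr: zfs_pool_capacity_percent > 80
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "ZFS pool {{ $labels.pool }} on {{ $labels.instance }} is above 80% capacity"
        - alert: ZfsPoolNotOnline
          expr: zfs_pool_health > 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "ZFS pool {{ $labels.pool }} on {{ $labels.instance }} is not ONLINE"
```
-<br />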
-<h3 style='display: inline' id='zfs-pool-and-dataset-metrics-via-textfile-collector'>ZFS Pool and Dataset Metrics via Textfile Collector</h3><br />
-<br />
-<span>To complement the ARC statistics from node_exporter&#39;s built-in ZFS collector, I added pool capacity and dataset metrics using the textfile collector feature.</span><br />
-<br />
-<span>Created a script at /usr/local/bin/zfs_pool_metrics.sh on each FreeBSD server:</span><br />
-<br />
-<pre>
-#!/bin/sh
-# ZFS Pool and Dataset Metrics Collector for Prometheus
-
-OUTPUT_FILE="/var/tmp/node_exporter/zfs_pools.prom.$$"
-FINAL_FILE="/var/tmp/node_exporter/zfs_pools.prom"
-
-mkdir -p /var/tmp/node_exporter
-
-{
- # Pool metrics
- echo "# HELP zfs_pool_size_bytes Total size of ZFS pool"
- echo "# TYPE zfs_pool_size_bytes gauge"
- echo "# HELP zfs_pool_allocated_bytes Allocated space in ZFS pool"
- echo "# TYPE zfs_pool_allocated_bytes gauge"
- echo "# HELP zfs_pool_free_bytes Free space in ZFS pool"
- echo "# TYPE zfs_pool_free_bytes gauge"
- echo "# HELP zfs_pool_capacity_percent Capacity percentage"
- echo "# TYPE zfs_pool_capacity_percent gauge"
- echo "# HELP zfs_pool_health Pool health (0=ONLINE, 1=DEGRADED, 2=FAULTED, 6=OTHER)"
- echo "# TYPE zfs_pool_health gauge"
-
- zpool list -Hp -o name,size,allocated,free,capacity,health | \
- while IFS=$&#39;\t&#39; read name size alloc free cap health; do
- case "$health" in
- ONLINE) health_val=0 ;;
- DEGRADED) health_val=1 ;;
- FAULTED) health_val=2 ;;
- *) health_val=6 ;;
- esac
- cap_num=$(echo "$cap" | sed &#39;s/%//&#39;)
-
- echo "zfs_pool_size_bytes{pool=\"$name\"} $size"
- echo "zfs_pool_allocated_bytes{pool=\"$name\"} $alloc"
- echo "zfs_pool_free_bytes{pool=\"$name\"} $free"
- echo "zfs_pool_capacity_percent{pool=\"$name\"} $cap_num"
- echo "zfs_pool_health{pool=\"$name\"} $health_val"
- done
-
- # Dataset metrics
- echo "# HELP zfs_dataset_used_bytes Used space in dataset"
- echo "# TYPE zfs_dataset_used_bytes gauge"
- echo "# HELP zfs_dataset_available_bytes Available space"
- echo "# TYPE zfs_dataset_available_bytes gauge"
- echo "# HELP zfs_dataset_referenced_bytes Referenced space"
- echo "# TYPE zfs_dataset_referenced_bytes gauge"
-
- zfs list -Hp -t filesystem -o name,used,available,referenced | \
- while IFS=$&#39;\t&#39; read name used avail ref; do
- pool=$(echo "$name" | cut -d/ -f1)
- echo "zfs_dataset_used_bytes{pool=\"$pool\",dataset=\"$name\"} $used"
- echo "zfs_dataset_available_bytes{pool=\"$pool\",dataset=\"$name\"} $avail"
- echo "zfs_dataset_referenced_bytes{pool=\"$pool\",dataset=\"$name\"} $ref"
- done
-} &gt; "$OUTPUT_FILE"
-
-mv "$OUTPUT_FILE" "$FINAL_FILE"
-</pre>
-<br />
-<span>Deployed to all FreeBSD servers:</span><br />
-<br />
-<pre>
-for host in f0 f1 f2; do
- scp /tmp/zfs_pool_metrics.sh paul@$host:/tmp/
- ssh paul@$host &#39;doas mv /tmp/zfs_pool_metrics.sh /usr/local/bin/ &amp;&amp; \
- doas chmod +x /usr/local/bin/zfs_pool_metrics.sh&#39;
-done
-</pre>
-<br />
-<span>Set up cron jobs to run every minute:</span><br />
-<br />
-<pre>
-for host in f0 f1 f2; do
- ssh paul@$host &#39;echo "* * * * * /usr/local/bin/zfs_pool_metrics.sh &gt;/dev/null 2&gt;&amp;1" | \
- doas crontab -&#39;
-done
-</pre>
-<br />
-<span>The textfile collector (already configured with --collector.textfile.directory=/var/tmp/node_exporter) automatically picks up the metrics.</span><br />
-<br />
-<span>Verify metrics are being exposed:</span><br />
-<br />
-<pre>
-paul@f0:~ % curl -s http://localhost:9100/metrics | grep "^zfs_pool" | head -5
-zfs_pool_allocated_bytes{pool="zdata"} 6.47622733824e+11
-zfs_pool_allocated_bytes{pool="zroot"} 5.3338578944e+10
-zfs_pool_capacity_percent{pool="zdata"} 64
-zfs_pool_capacity_percent{pool="zroot"} 10
-zfs_pool_free_bytes{pool="zdata"} 3.48809678848e+11
-</pre>
-<br />
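-<span>The exposition format the script emits is simple enough to validate with a small parser. The helper below is hypothetical and for illustration only; nothing like it runs on the servers:</span><br />
-<br />
```python
import re

# Matches lines such as: zfs_pool_capacity_percent{pool="zdata"} 64
PROM_LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'   # metric name
    r'\{(?P<labels>[^}]*)\}\s+'              # label set
    r'(?P<value>[0-9.eE+-]+)$'               # sample value
)

def parse_prom_line(line: str):
    """Parse one labeled metric line into (name, labels dict, float value)."""
    m = PROM_LINE.match(line.strip())
    if m is None:
        raise ValueError(f"not a labeled metric line: {line!r}")
    labels = {}
    for pair in filter(None, m.group("labels").split(",")):
        key, _, val = pair.partition("=")
        labels[key.strip()] = val.strip().strip('"')
    return m.group("name"), labels, float(m.group("value"))
```
-<br />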
-<h2 style='display: inline' id='summary'>Summary</h2><br />
-<br />
-<span>Enhanced the f3s cluster observability by:</span><br />
-<br />
-<ul>
-<li>Enabling etcd metrics monitoring for the k3s embedded etcd</li>
-<li>Implementing comprehensive ZFS monitoring for FreeBSD storage servers</li>
-<li>Creating recording rules for calculated metrics (ARC hit rates, memory usage, etc.)</li>
-<li>Deploying Grafana dashboards for visualization</li>
-<li>Configuring automatic dashboard import via ConfigMap labels</li>
-</ul><br />
-<span>The monitoring stack now provides visibility into both cluster control plane health (etcd) and storage performance (ZFS).</span><br />
-<br />
-<a class='textlink' href='https://codeberg.org/snonux/conf/src/branch/master/f3s/prometheus'>prometheus configuration on Codeberg</a><br />
-<br />
-<h2 style='display: inline' id='distributed-tracing-with-grafana-tempo'>Distributed Tracing with Grafana Tempo</h2><br />
-<br />
-<span>After implementing logs (Loki) and metrics (Prometheus), the final pillar of observability is distributed tracing. Grafana Tempo provides distributed tracing capabilities that help understand request flows across microservices.</span><br />
-<br />
-<h3 style='display: inline' id='why-distributed-tracing'>Why Distributed Tracing?</h3><br />
-<br />
-<span>In a microservices architecture, a single user request may traverse multiple services. Distributed tracing:</span><br />
-<br />
-<ul>
-<li>Tracks requests across service boundaries</li>
-<li>Identifies performance bottlenecks</li>
-<li>Visualizes service dependencies</li>
-<li>Correlates with logs and metrics</li>
-<li>Helps debug complex distributed systems</li>
-</ul><br />
-<h3 style='display: inline' id='deploying-grafana-tempo'>Deploying Grafana Tempo</h3><br />
-<br />
-<span>Tempo is deployed in monolithic mode, following the same pattern as Loki&#39;s SingleBinary deployment.</span><br />
-<br />
-<h4 style='display: inline' id='configuration-strategy'>Configuration Strategy</h4><br />
-<br />
-<span><strong>Deployment Mode:</strong> Monolithic (all components in one process)</span><br />
-<ul>
-<li>Simpler operation than microservices mode</li>
-<li>Suitable for the cluster scale</li>
-<li>Consistent with Loki deployment pattern</li>
-</ul><br />
-<span><strong>Storage:</strong> Filesystem backend using hostPath</span><br />
-<ul>
-<li>10Gi storage at /data/nfs/k3svolumes/tempo/data</li>
-<li>7-day retention (168h)</li>
-<li>Local storage is the only option for monolithic mode</li>
-</ul><br />
-<span><strong>OTLP Receivers:</strong> Standard OpenTelemetry Protocol ports</span><br />
-<ul>
-<li>gRPC: 4317</li>
-<li>HTTP: 4318</li>
-<li>Bind to 0.0.0.0 to avoid Tempo 2.7+ localhost-only binding issue</li>
-</ul><br />
-<h4 style='display: inline' id='tempo-deployment-files'>Tempo Deployment Files</h4><br />
-<br />
-<span>Created in /home/paul/git/conf/f3s/tempo/:</span><br />
-<br />
-<span><strong>values.yaml</strong> - Helm chart configuration:</span><br />
-<br />
-<pre>
-tempo:
- retention: 168h
- storage:
- trace:
- backend: local
- local:
- path: /var/tempo/traces
- wal:
- path: /var/tempo/wal
- receivers:
- otlp:
- protocols:
- grpc:
- endpoint: 0.0.0.0:4317
- http:
- endpoint: 0.0.0.0:4318
-
-persistence:
- enabled: true
- size: 10Gi
- storageClassName: ""
-
-resources:
- limits:
- cpu: 1000m
- memory: 2Gi
- requests:
- cpu: 500m
- memory: 1Gi
-</pre>
-<br />
-<span><strong>persistent-volumes.yaml</strong> - Storage configuration:</span><br />
-<br />
-<pre>
-apiVersion: v1
-kind: PersistentVolume
-metadata:
- name: tempo-data-pv
-spec:
- capacity:
- storage: 10Gi
- accessModes:
- - ReadWriteOnce
- persistentVolumeReclaimPolicy: Retain
- hostPath:
- path: /data/nfs/k3svolumes/tempo/data
----
-apiVersion: v1
-kind: PersistentVolumeClaim
-metadata:
- name: tempo-data-pvc
- namespace: monitoring
-spec:
- storageClassName: ""
- accessModes:
- - ReadWriteOnce
- resources:
- requests:
- storage: 10Gi
-</pre>
-<br />
-<h4 style='display: inline' id='grafana-datasource-provisioning'>Grafana Datasource Provisioning</h4><br />
-<br />
-<span>All Grafana datasources (Prometheus, Alertmanager, Loki, Tempo) are provisioned via a unified ConfigMap that is directly mounted to the Grafana pod. This approach ensures datasources are loaded on startup without requiring sidecar-based discovery.</span><br />
-<br />
-<span>In /home/paul/git/conf/f3s/prometheus/grafana-datasources-all.yaml:</span><br />
-<br />
-<pre>
-apiVersion: v1
-kind: ConfigMap
-metadata:
- name: grafana-datasources-all
- namespace: monitoring
-data:
- datasources.yaml: |
- apiVersion: 1
- datasources:
- - name: Prometheus
- type: prometheus
- uid: prometheus
- url: http://prometheus-kube-prometheus-prometheus.monitoring:9090/
- access: proxy
- isDefault: true
- - name: Alertmanager
- type: alertmanager
- uid: alertmanager
- url: http://prometheus-kube-prometheus-alertmanager.monitoring:9093/
- - name: Loki
- type: loki
- uid: loki
- url: http://loki.monitoring.svc.cluster.local:3100
- - name: Tempo
- type: tempo
- uid: tempo
- url: http://tempo.monitoring.svc.cluster.local:3200
- jsonData:
- tracesToLogsV2:
- datasourceUid: loki
- spanStartTimeShift: -1h
- spanEndTimeShift: 1h
- tracesToMetrics:
- datasourceUid: prometheus
- serviceMap:
- datasourceUid: prometheus
- nodeGraph:
- enabled: true
-</pre>
-<br />
-<span>The kube-prometheus-stack Helm values (persistence-values.yaml) are configured to:</span><br />
-<ul>
-<li>Disable sidecar-based datasource provisioning</li>
-<li>Mount grafana-datasources-all ConfigMap directly to /etc/grafana/provisioning/datasources/</li>
-</ul><br />
-<span>This direct mounting approach is simpler and more reliable than sidecar-based discovery.</span><br />
-<br />
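-<span>A sketch of what the mount looks like in the Helm values; the key names follow the upstream Grafana chart, so treat the exact values as an assumption rather than a copy of persistence-values.yaml:</span><br />
-<br />
```yaml
grafana:
  sidecar:
    datasources:
      enabled: false   # disable sidecar-based datasource discovery
  extraConfigmapMounts:
    - name: grafana-datasources-all
      configMap: grafana-datasources-all
      mountPath: /etc/grafana/provisioning/datasources
      readOnly: true
```
-<br />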
-<h4 style='display: inline' id='installation'>Installation</h4><br />
-<br />
-<pre>
-cd /home/paul/git/conf/f3s/tempo
-just install
-</pre>
-<br />
-<span>Verify Tempo is running:</span><br />
-<br />
-<pre>
-kubectl get pods -n monitoring -l app.kubernetes.io/name=tempo
-kubectl exec -n monitoring &lt;tempo-pod&gt; -- wget -qO- http://localhost:3200/ready
-</pre>
-<br />
-<h3 style='display: inline' id='configuring-grafana-alloy-for-trace-collection'>Configuring Grafana Alloy for Trace Collection</h3><br />
-<br />
-<span>Updated /home/paul/git/conf/f3s/loki/alloy-values.yaml to add OTLP receivers for traces while maintaining existing log collection.</span><br />
-<br />
-<h4 style='display: inline' id='otlp-receiver-configuration'>OTLP Receiver Configuration</h4><br />
-<br />
-<span>Added to Alloy configuration after the log collection pipeline:</span><br />
-<br />
-<pre>
-// OTLP receiver for traces via gRPC and HTTP
-otelcol.receiver.otlp "default" {
- grpc {
- endpoint = "0.0.0.0:4317"
- }
- http {
- endpoint = "0.0.0.0:4318"
- }
- output {
- traces = [otelcol.processor.batch.default.input]
- }
-}
-
-// Batch processor for efficient trace forwarding
-otelcol.processor.batch "default" {
- timeout = "5s"
- send_batch_size = 100
- send_batch_max_size = 200
- output {
- traces = [otelcol.exporter.otlp.tempo.input]
- }
-}
-
-// OTLP exporter to send traces to Tempo
-otelcol.exporter.otlp "tempo" {
- client {
- endpoint = "tempo.monitoring.svc.cluster.local:4317"
- tls {
- insecure = true
- }
- compression = "gzip"
- }
-}
-</pre>
-<br />
-<span>The batch processor reduces network overhead by accumulating spans before forwarding to Tempo.</span><br />
-<br />
-<h4 style='display: inline' id='upgrade-alloy'>Upgrade Alloy</h4><br />
-<br />
-<pre>
-cd /home/paul/git/conf/f3s/loki
-just upgrade
-</pre>
-<br />
-<span>Verify OTLP receivers are listening:</span><br />
-<br />
-<pre>
-kubectl logs -n monitoring -l app.kubernetes.io/name=alloy | grep -i "otlp.*receiver"
-kubectl exec -n monitoring &lt;alloy-pod&gt; -- netstat -ln | grep -E &#39;:(4317|4318)&#39;
-</pre>
-<br />
-<h3 style='display: inline' id='demo-tracing-application'>Demo Tracing Application</h3><br />
-<br />
-<span>Created a three-tier Python application to demonstrate distributed tracing in action.</span><br />
-<br />
-<h4 style='display: inline' id='application-architecture'>Application Architecture</h4><br />
-<br />
-<pre>
-User → Frontend (Flask:5000) → Middleware (Flask:5001) → Backend (Flask:5002)
- ↓ ↓ ↓
- Alloy (OTLP:4317) → Tempo → Grafana
-</pre>
-<br />
-<span><strong>Frontend Service:</strong></span><br />
-<ul>
-<li>Receives HTTP requests at /api/process</li>
-<li>Forwards to middleware service</li>
-<li>Creates parent span for the entire request</li>
-</ul><br />
-<span><strong>Middleware Service:</strong></span><br />
-<ul>
-<li>Transforms data at /api/transform</li>
-<li>Calls backend service</li>
-<li>Creates child span linked to frontend</li>
-</ul><br />
-<span><strong>Backend Service:</strong></span><br />
-<ul>
-<li>Returns data at /api/data</li>
-<li>Simulates database query (100ms sleep)</li>
-<li>Creates leaf span in the trace</li>
-</ul><br />
-<h4 style='display: inline' id='opentelemetry-instrumentation'>OpenTelemetry Instrumentation</h4><br />
-<br />
-<span>All services use Python OpenTelemetry libraries:</span><br />
-<br />
-<span><strong>Dependencies:</strong></span><br />
-<pre>
-flask==3.0.0
-requests==2.31.0
-opentelemetry-distro==0.49b0
-opentelemetry-exporter-otlp==1.28.0
-opentelemetry-instrumentation-flask==0.49b0
-opentelemetry-instrumentation-requests==0.49b0
-</pre>
-<br />
-<span><strong>Auto-instrumentation pattern</strong> (used in all services):</span><br />
-<br />
-<!-- Generator: GNU source-highlight 3.1.9
-by Lorenzo Bettini
-http://www.lorenzobettini.it
-http://www.gnu.org/software/src-highlite -->
-<pre><font color="#ababab">from</font><font color="#ff0000"> opentelemetry </font><font color="#ababab">import</font><font color="#ff0000"> trace</font>
-<font color="#ababab">from</font><font color="#ff0000"> opentelemetry</font><font color="#F3E651">.</font><font color="#ff0000">sdk</font><font color="#F3E651">.</font><font color="#ff0000">trace </font><font color="#ababab">import</font><font color="#ff0000"> TracerProvider</font>
-<font color="#ababab">from</font><font color="#ff0000"> opentelemetry</font><font color="#F3E651">.</font><font color="#ff0000">exporter</font><font color="#F3E651">.</font><font color="#ff0000">otlp</font><font color="#F3E651">.</font><font color="#ff0000">proto</font><font color="#F3E651">.</font><font color="#ff0000">grpc</font><font color="#F3E651">.</font><font color="#ff0000">trace_exporter </font><font color="#ababab">import</font><font color="#ff0000"> OTLPSpanExporter</font>
-<font color="#ababab">from</font><font color="#ff0000"> opentelemetry</font><font color="#F3E651">.</font><font color="#ff0000">instrumentation</font><font color="#F3E651">.</font><font color="#ff0000">flask </font><font color="#ababab">import</font><font color="#ff0000"> FlaskInstrumentor</font>
-<font color="#ababab">from</font><font color="#ff0000"> opentelemetry</font><font color="#F3E651">.</font><font color="#ff0000">instrumentation</font><font color="#F3E651">.</font><font color="#ff0000">requests </font><font color="#ababab">import</font><font color="#ff0000"> RequestsInstrumentor</font>
-<font color="#ababab">from</font><font color="#ff0000"> opentelemetry</font><font color="#F3E651">.</font><font color="#ff0000">sdk</font><font color="#F3E651">.</font><font color="#ff0000">resources </font><font color="#ababab">import</font><font color="#ff0000"> Resource</font>
-<font color="#ababab">from</font><font color="#ff0000"> opentelemetry</font><font color="#F3E651">.</font><font color="#ff0000">sdk</font><font color="#F3E651">.</font><font color="#ff0000">trace</font><font color="#F3E651">.</font><font color="#ff0000">export </font><font color="#ababab">import</font><font color="#ff0000"> BatchSpanProcessor</font>
-
-<i><font color="#ababab"># Define service identity</font></i>
-<font color="#ff0000">resource </font><font color="#F3E651">=</font><font color="#ff0000"> </font><font color="#7bc710">Resource</font><font color="#F3E651">(</font><font color="#ff0000">attributes</font><font color="#F3E651">={</font>
-<font color="#ff0000"> </font><font color="#bb00ff">"service.name"</font><font color="#F3E651">:</font><font color="#ff0000"> </font><font color="#bb00ff">"frontend"</font><font color="#F3E651">,</font>
-<font color="#ff0000"> </font><font color="#bb00ff">"service.namespace"</font><font color="#F3E651">:</font><font color="#ff0000"> </font><font color="#bb00ff">"tracing-demo"</font><font color="#F3E651">,</font>
-<font color="#ff0000"> </font><font color="#bb00ff">"service.version"</font><font color="#F3E651">:</font><font color="#ff0000"> </font><font color="#bb00ff">"1.0.0"</font>
-<font color="#F3E651">})</font>
-
-<font color="#ff0000">provider </font><font color="#F3E651">=</font><font color="#ff0000"> </font><font color="#7bc710">TracerProvider</font><font color="#F3E651">(</font><font color="#ff0000">resource</font><font color="#F3E651">=</font><font color="#ff0000">resource</font><font color="#F3E651">)</font>
-
-<i><font color="#ababab"># Export to Alloy</font></i>
-<font color="#ff0000">otlp_exporter </font><font color="#F3E651">=</font><font color="#ff0000"> </font><font color="#7bc710">OTLPSpanExporter</font><font color="#F3E651">(</font>
-<font color="#ff0000"> endpoint</font><font color="#F3E651">=</font><font color="#bb00ff">"http://alloy.monitoring.svc.cluster.local:4317"</font><font color="#F3E651">,</font>
-<font color="#ff0000"> insecure</font><font color="#F3E651">=</font><font color="#ff0000">True</font>
-<font color="#F3E651">)</font>
-
-<font color="#ff0000">processor </font><font color="#F3E651">=</font><font color="#ff0000"> </font><font color="#7bc710">BatchSpanProcessor</font><font color="#F3E651">(</font><font color="#ff0000">otlp_exporter</font><font color="#F3E651">)</font>
-<font color="#ff0000">provider</font><font color="#F3E651">.</font><font color="#7bc710">add_span_processor</font><font color="#F3E651">(</font><font color="#ff0000">processor</font><font color="#F3E651">)</font>
-<font color="#ff0000">trace</font><font color="#F3E651">.</font><font color="#7bc710">set_tracer_provider</font><font color="#F3E651">(</font><font color="#ff0000">provider</font><font color="#F3E651">)</font>
-
-<i><font color="#ababab"># Auto-instrument Flask and requests</font></i>
-<font color="#7bc710">FlaskInstrumentor</font><font color="#F3E651">().</font><font color="#7bc710">instrument_app</font><font color="#F3E651">(</font><font color="#ff0000">app</font><font color="#F3E651">)</font>
-<font color="#7bc710">RequestsInstrumentor</font><font color="#F3E651">().</font><font color="#7bc710">instrument</font><font color="#F3E651">()</font>
-</pre>
-<br />
-<span>Out of the box, the auto-instrumentation:</span><br />
-<ul>
-<li>Creates spans for HTTP requests</li>
-<li>Propagates trace context via W3C Trace Context headers</li>
-<li>Links parent and child spans across service boundaries</li>
-</ul><br />
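The context propagation mentioned above comes down to a single `traceparent` HTTP header defined by W3C Trace Context. OpenTelemetry generates and parses this header for you; the pure-stdlib sketch below only illustrates the header format (it is not how you would implement propagation in production):

```python
import re
import secrets

def make_traceparent() -> str:
    """Build a W3C Trace Context 'traceparent' header:
    version(2 hex)-trace-id(32 hex)-parent-id(16 hex)-flags(2 hex)."""
    trace_id = secrets.token_hex(16)      # 128-bit trace ID
    span_id = secrets.token_hex(8)        # 64-bit span ID
    return f"00-{trace_id}-{span_id}-01"  # flags 01 = sampled

TRACEPARENT_RE = re.compile(
    r"^(?P<version>[0-9a-f]{2})-(?P<trace_id>[0-9a-f]{32})"
    r"-(?P<span_id>[0-9a-f]{16})-(?P<flags>[0-9a-f]{2})$"
)

def parse_traceparent(header: str) -> dict:
    """Split a traceparent header into its four fields."""
    m = TRACEPARENT_RE.match(header)
    if m is None:
        raise ValueError(f"invalid traceparent: {header!r}")
    return m.groupdict()
```

Each service forwards this header on outbound calls, which is what links parent and child spans across the frontend, middleware, and backend.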
-<span>#### Deployment</span><br />
-<br />
-<span>I created a Helm chart in /home/paul/git/conf/f3s/tracing-demo/ containing three separate deployments, services, and an ingress.</span><br />
-<br />
-<span>Build and deploy:</span><br />
-<br />
-<pre>
-cd /home/paul/git/conf/f3s/tracing-demo
-just build
-just import
-just install
-</pre>
-<br />
-<span>Verify deployment:</span><br />
-<br />
-<pre>
-kubectl get pods -n services | grep tracing-demo
-kubectl get ingress -n services tracing-demo-ingress
-</pre>
-<br />
-<span>Access the application at:</span><br />
-<br />
-<a class='textlink' href='http://tracing-demo.f3s.buetow.org'>http://tracing-demo.f3s.buetow.org</a><br />
-<br />
-<h3 style='display: inline' id='visualizing-traces-in-grafana'>Visualizing Traces in Grafana</h3><br />
-<br />
-<span>The Tempo datasource is automatically discovered by Grafana through the ConfigMap label.</span><br />
-<br />
-<span>#### Accessing Traces</span><br />
-<br />
-<span>Navigate to Grafana → Explore → Select "Tempo" datasource</span><br />
-<br />
-<span>**Search Interface:**</span><br />
-<ul>
-<li>Search by Trace ID</li>
-<li>Search by service name</li>
-<li>Search by tags</li>
-</ul><br />
-<span>**TraceQL Queries:**</span><br />
-<br />
-<span>Find all traces from demo app:</span><br />
-<pre>
-{ resource.service.namespace = "tracing-demo" }
-</pre>
-<br />
-<span>Find slow requests (&gt;200ms):</span><br />
-<pre>
-{ duration &gt; 200ms }
-</pre>
-<br />
-<span>Find traces from specific service:</span><br />
-<pre>
-{ resource.service.name = "frontend" }
-</pre>
-<br />
-<span>Find errors:</span><br />
-<pre>
-{ status = error }
-</pre>
-<br />
-<span>Complex query - traces from the demo namespace that contain spans with HTTP 5xx status codes:</span><br />
-<pre>
-{ resource.service.namespace = "tracing-demo" } &amp;&amp; { span.http.status_code &gt;= 500 }
-</pre>
-<br />
-<span>#### Service Graph Visualization</span><br />
-<br />
-<span>The service graph shows visual connections between services:</span><br />
-<br />
-<span>1. Navigate to Explore → Tempo</span><br />
-<span>2. Enable "Service Graph" view</span><br />
-<span>3. Shows: Frontend → Middleware → Backend with request rates</span><br />
-<br />
-<span>The service graph uses Prometheus metrics generated from trace data.</span><br />
-<br />
-<h3 style='display: inline' id='correlation-between-observability-signals'>Correlation Between Observability Signals</h3><br />
-<br />
-<span>Tempo integrates with Loki and Prometheus to provide unified observability.</span><br />
-<br />
-<span>#### Traces-to-Logs</span><br />
-<br />
-<span>Click on any span in a trace to see related logs:</span><br />
-<br />
-<span>1. View trace in Grafana</span><br />
-<span>2. Click on a span</span><br />
-<span>3. Select "Logs for this span"</span><br />
-<span>4. Loki shows logs filtered by:</span><br />
-<span> * Time range (span duration ± 1 hour)</span><br />
-<span> * Service name</span><br />
-<span> * Namespace</span><br />
-<span> * Pod</span><br />
-<br />
-<span>This helps correlate what the service was doing when the span was created.</span><br />
-<br />
-<span>#### Traces-to-Metrics</span><br />
-<br />
-<span>View Prometheus metrics for services in the trace:</span><br />
-<br />
-<span>1. View trace in Grafana</span><br />
-<span>2. Select "Metrics" tab</span><br />
-<span>3. Shows metrics like:</span><br />
-<span> * Request rate</span><br />
-<span> * Error rate</span><br />
-<span> * Duration percentiles</span><br />
-<br />
-<span>#### Logs-to-Traces</span><br />
-<br />
-<span>From logs, you can jump to related traces:</span><br />
-<br />
-<span>1. In Loki, logs that contain trace IDs are automatically linked</span><br />
-<span>2. Click the trace ID to view the full trace</span><br />
-<span>3. See the complete request flow</span><br />
-<br />
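The log-to-trace link works by extracting the 32-hex-character trace ID from each log line, typically via a derived-field regex in the Loki datasource. A minimal sketch of that extraction (the log line format here is hypothetical; the trace ID is in the format Tempo uses):

```python
import re
from typing import Optional

# Assumed log format; Grafana's Loki "derived fields" apply a regex like
# this to pull a Tempo trace ID out of each log line.
TRACE_ID_RE = re.compile(r'trace_?id[=:"\s]+([0-9a-f]{32})', re.IGNORECASE)

def extract_trace_id(log_line: str) -> Optional[str]:
    """Return the trace ID embedded in a log line, or None if absent."""
    m = TRACE_ID_RE.search(log_line)
    return m.group(1) if m else None

line = 'level=info msg="handled /api/process" traceID=4be1151c0bdcd5625ac7e02b98d95bd5'
print(extract_trace_id(line))  # → 4be1151c0bdcd5625ac7e02b98d95bd5
```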
-<h3 style='display: inline' id='generating-traces-for-testing'>Generating Traces for Testing</h3><br />
-<br />
-<span>Test the demo application:</span><br />
-<br />
-<pre>
-curl http://tracing-demo.f3s.buetow.org/api/process
-</pre>
-<br />
-<span>Load test (generates 50 traces):</span><br />
-<br />
-<pre>
-cd /home/paul/git/conf/f3s/tracing-demo
-just load-test
-</pre>
-<br />
-<span>Each request creates a distributed trace spanning all three services.</span><br />
-<br />
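The contents of the `just load-test` recipe are not shown above; a hypothetical Python equivalent could look like this (the ingress host and `/api/process` path are taken from the article, the rest is a sketch):

```python
import urllib.request

# Ingress host from the article; the helper below builds the request plan.
BASE_URL = "http://tracing-demo.f3s.buetow.org"

def request_urls(n: int, path: str = "/api/process") -> list:
    """Build the request plan: one URL per trace to generate."""
    return [BASE_URL + path for _ in range(n)]

def run_load_test(n: int = 50) -> None:
    """Fire n sequential requests; each produces one distributed trace."""
    for url in request_urls(n):
        with urllib.request.urlopen(url, timeout=10) as resp:
            resp.read()  # drain the body so the request completes cleanly

# run_load_test(50)  # uncomment on a machine that can reach the cluster
```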
-<h3 style='display: inline' id='verifying-the-complete-pipeline'>Verifying the Complete Pipeline</h3><br />
-<br />
-<span>Check the trace flow end-to-end:</span><br />
-<br />
-<span>**1. Application generates traces:**</span><br />
-<pre>
-kubectl logs -n services -l app=tracing-demo-frontend | grep -i trace
-</pre>
-<br />
-<span>**2. Alloy receives traces:**</span><br />
-<pre>
-kubectl logs -n monitoring -l app.kubernetes.io/name=alloy | grep -i otlp
-</pre>
-<br />
-<span>**3. Tempo stores traces:**</span><br />
-<pre>
-kubectl logs -n monitoring -l app.kubernetes.io/name=tempo | grep -i trace
-</pre>
-<br />
-<span>**4. Grafana displays traces:**</span><br />
-<span>Navigate to Explore → Tempo → Search for traces</span><br />
-<br />
-<h3 style='display: inline' id='practical-example-viewing-a-distributed-trace'>Practical Example: Viewing a Distributed Trace</h3><br />
-<br />
-<span>Let&#39;s generate a trace and examine it in Grafana.</span><br />
-<br />
-<span>**1. Generate a trace by calling the demo application:**</span><br />
-<br />
-<pre>
-curl -H "Host: tracing-demo.f3s.buetow.org" http://r0/api/process
-</pre>
-<br />
-<span>**Response (HTTP 200):**</span><br />
-<br />
-<!-- Generator: GNU source-highlight 3.1.9
-by Lorenzo Bettini
-http://www.lorenzobettini.it
-http://www.gnu.org/software/src-highlite -->
-<pre><font color="#F3E651">{</font>
-<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">middleware_response</font><font color="#ff0000">"</font><font color="#ff0000">: </font><font color="#F3E651">{</font>
-<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">backend_data</font><font color="#ff0000">"</font><font color="#ff0000">: </font><font color="#F3E651">{</font>
-<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">data</font><font color="#ff0000">"</font><font color="#ff0000">: </font><font color="#F3E651">{</font>
-<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">id</font><font color="#ff0000">"</font><font color="#ff0000">: </font><font color="#bb00ff">12345</font><font color="#F3E651">,</font>
-<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">query_time_ms</font><font color="#ff0000">"</font><font color="#ff0000">: </font><font color="#bb00ff">100.0</font><font color="#F3E651">,</font>
-<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">timestamp</font><font color="#ff0000">"</font><font color="#ff0000">:</font><font color="#ff0000"> "</font><font color="#bb00ff">2025-12-28T18:35:01.064538</font><font color="#ff0000">"</font><font color="#F3E651">,</font>
-<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">value</font><font color="#ff0000">"</font><font color="#ff0000">:</font><font color="#ff0000"> "</font><font color="#bb00ff">Sample data from backend service</font><font color="#ff0000">"</font>
-<font color="#ff0000"> </font><font color="#F3E651">},</font>
-<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">service</font><font color="#ff0000">"</font><font color="#ff0000">:</font><font color="#ff0000"> "</font><font color="#bb00ff">backend</font><font color="#ff0000">"</font>
-<font color="#ff0000"> </font><font color="#F3E651">},</font>
-<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">middleware_processed</font><font color="#ff0000">"</font><font color="#ff0000">: </font><b><font color="#ffffff">true</font></b><font color="#F3E651">,</font>
-<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">original_data</font><font color="#ff0000">"</font><font color="#ff0000">: </font><font color="#F3E651">{</font>
-<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">source</font><font color="#ff0000">"</font><font color="#ff0000">:</font><font color="#ff0000"> "</font><font color="#bb00ff">GET request</font><font color="#ff0000">"</font>
-<font color="#ff0000"> </font><font color="#F3E651">},</font>
-<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">transformation_time_ms</font><font color="#ff0000">"</font><font color="#ff0000">: </font><font color="#bb00ff">50</font>
-<font color="#ff0000"> </font><font color="#F3E651">},</font>
-<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">request_data</font><font color="#ff0000">"</font><font color="#ff0000">: </font><font color="#F3E651">{</font>
-<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">source</font><font color="#ff0000">"</font><font color="#ff0000">:</font><font color="#ff0000"> "</font><font color="#bb00ff">GET request</font><font color="#ff0000">"</font>
-<font color="#ff0000"> </font><font color="#F3E651">},</font>
-<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">service</font><font color="#ff0000">"</font><font color="#ff0000">:</font><font color="#ff0000"> "</font><font color="#bb00ff">frontend</font><font color="#ff0000">"</font><font color="#F3E651">,</font>
-<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">status</font><font color="#ff0000">"</font><font color="#ff0000">:</font><font color="#ff0000"> "</font><font color="#bb00ff">success</font><font color="#ff0000">"</font>
-<font color="#F3E651">}</font>
-</pre>
-<br />
-<span>**2. Find the trace in Tempo via API:**</span><br />
-<br />
-<span>After a few seconds (for batch export), search for recent traces:</span><br />
-<br />
-<pre>
-kubectl exec -n monitoring tempo-0 -- wget -qO- \
- &#39;http://localhost:3200/api/search?tags=service.namespace%3Dtracing-demo&amp;limit=5&#39; 2&gt;/dev/null | \
- python3 -m json.tool
-</pre>
-<br />
-<span>Returns traces including:</span><br />
-<br />
-<!-- Generator: GNU source-highlight 3.1.9
-by Lorenzo Bettini
-http://www.lorenzobettini.it
-http://www.gnu.org/software/src-highlite -->
-<pre><font color="#F3E651">{</font>
-<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">traceID</font><font color="#ff0000">"</font><font color="#ff0000">:</font><font color="#ff0000"> "</font><font color="#bb00ff">4be1151c0bdcd5625ac7e02b98d95bd5</font><font color="#ff0000">"</font><font color="#F3E651">,</font>
-<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">rootServiceName</font><font color="#ff0000">"</font><font color="#ff0000">:</font><font color="#ff0000"> "</font><font color="#bb00ff">frontend</font><font color="#ff0000">"</font><font color="#F3E651">,</font>
-<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">rootTraceName</font><font color="#ff0000">"</font><font color="#ff0000">:</font><font color="#ff0000"> "</font><font color="#bb00ff">GET /api/process</font><font color="#ff0000">"</font><font color="#F3E651">,</font>
-<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">durationMs</font><font color="#ff0000">"</font><font color="#ff0000">: </font><font color="#bb00ff">221</font>
-<font color="#F3E651">}</font>
-</pre>
-<br />
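Scripting against this endpoint is straightforward. A sketch that picks the slowest trace out of a search response, assuming the results arrive under the standard `traces` wrapper with entries shaped like the sample above (the second entry is made up for illustration):

```python
# Response shaped like Tempo's /api/search output shown above (assumed
# "traces" wrapper; the zero-ID entry is a fabricated second result).
search_response = {
    "traces": [
        {"traceID": "4be1151c0bdcd5625ac7e02b98d95bd5",
         "rootServiceName": "frontend",
         "rootTraceName": "GET /api/process",
         "durationMs": 221},
        {"traceID": "0" * 32,
         "rootServiceName": "frontend",
         "rootTraceName": "GET /api/process",
         "durationMs": 95},
    ]
}

def slowest_trace(resp: dict) -> dict:
    """Return the search result with the largest durationMs."""
    return max(resp.get("traces", []), key=lambda t: t.get("durationMs", 0))

print(slowest_trace(search_response)["traceID"])
```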
-<span>**3. Fetch complete trace details:**</span><br />
-<br />
-<pre>
-kubectl exec -n monitoring tempo-0 -- wget -qO- \
- &#39;http://localhost:3200/api/traces/4be1151c0bdcd5625ac7e02b98d95bd5&#39; 2&gt;/dev/null | \
- python3 -m json.tool
-</pre>
-<br />
-<span>**Trace structure (8 spans across 3 services):**</span><br />
-<br />
-<pre>
-Trace ID: 4be1151c0bdcd5625ac7e02b98d95bd5
-Services: 3 (frontend, middleware, backend)
-
-Service: frontend
- └─ GET /api/process 221.10ms (HTTP server span)
- └─ frontend-process 216.23ms (custom business logic span)
- └─ POST 209.97ms (HTTP client span to middleware)
-
-Service: middleware
- └─ POST /api/transform 186.02ms (HTTP server span)
- └─ middleware-transform 180.96ms (custom business logic span)
- └─ GET 127.52ms (HTTP client span to backend)
-
-Service: backend
- └─ GET /api/data 103.93ms (HTTP server span)
- └─ backend-get-data 102.11ms (custom business logic span with 100ms sleep)
-</pre>
-<br />
-<span>**4. View the trace in Grafana UI:**</span><br />
-<br />
-<span>Navigate to: Grafana → Explore → Tempo datasource</span><br />
-<br />
-<span>Search using TraceQL:</span><br />
-<pre>
-{ resource.service.namespace = "tracing-demo" }
-</pre>
-<br />
-<span>Or directly open the trace by pasting the trace ID in the search box:</span><br />
-<pre>
-4be1151c0bdcd5625ac7e02b98d95bd5
-</pre>
-<br />
-<span>**5. Trace visualization:**</span><br />
-<br />
-<span>The trace waterfall view in Grafana shows the complete request flow with timing:</span><br />
-<br />
-<a href='./f3s-kubernetes-with-freebsd-part-8b/grafana-tempo-trace.png'><img alt='Distributed trace visualization in Grafana Tempo showing Frontend → Middleware → Backend spans' title='Distributed trace visualization in Grafana Tempo showing Frontend → Middleware → Backend spans' src='./f3s-kubernetes-with-freebsd-part-8b/grafana-tempo-trace.png' /></a><br />
-<br />
-<span>For additional examples of Tempo trace visualization, see also:</span><br />
-<br />
-<a class='textlink' href='https://foo.zone/gemfeed/2025-12-24-x-rag-observability-hackathon.html'>X-RAG Observability Hackathon (more Grafana Tempo screenshots)</a><br />
-<br />
-<span>The trace reveals the distributed request flow:</span><br />
-<ul>
-<li>**Frontend (221ms)**: Receives GET /api/process, executes business logic, calls middleware</li>
-<li>**Middleware (186ms)**: Receives POST /api/transform, transforms data, calls backend</li>
-<li>**Backend (104ms)**: Receives GET /api/data, simulates database query with 100ms sleep</li>
-<li>**Total request time**: 221ms end-to-end</li>
-<li>**Span propagation**: W3C Trace Context headers automatically link all spans</li>
-</ul><br />
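The per-hop cost can be sanity-checked with simple subtraction on the server-span durations from the trace above: each service's exclusive time is its own span minus the downstream server span.

```python
# Server-span durations (ms) copied from the example trace above.
frontend = 221.10    # GET /api/process
middleware = 186.02  # POST /api/transform
backend = 103.93     # GET /api/data

# Exclusive time per service = own span minus the downstream server span.
frontend_own = frontend - middleware    # client span + network + frontend logic
middleware_own = middleware - backend   # client span + network + transform logic
backend_own = backend                   # leaf span: ~100 ms simulated DB query

print(f"frontend-only:   {frontend_own:6.2f} ms")
print(f"middleware-only: {middleware_own:6.2f} ms")
print(f"backend:         {backend_own:6.2f} ms")
```

The three exclusive times sum back to the 221 ms end-to-end duration, which confirms nothing is unaccounted for in the waterfall.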
-<span>**6. Service graph visualization:**</span><br />
-<br />
-<span>The service graph is automatically generated from traces and shows service dependencies. For examples of service graph visualization in Grafana, see the screenshots in the X-RAG Observability Hackathon blog post.</span><br />
-<br />
-<a class='textlink' href='https://foo.zone/gemfeed/2025-12-24-x-rag-observability-hackathon.html'>X-RAG Observability Hackathon (includes service graph screenshots)</a><br />
-<br />
-<span>This visualization helps identify:</span><br />
-<ul>
-<li>Request rates between services</li>
-<li>Average latency for each hop</li>
-<li>Error rates (if any)</li>
-<li>Service dependencies and communication patterns</li>
-</ul><br />
-<h3 style='display: inline' id='storage-and-retention'>Storage and Retention</h3><br />
-<br />
-<span>Monitor Tempo storage usage:</span><br />
-<br />
-<pre>
-kubectl exec -n monitoring &lt;tempo-pod&gt; -- df -h /var/tempo
-</pre>
-<br />
-<span>With 10Gi storage and 7-day retention, the system handles moderate trace volumes. If storage fills up:</span><br />
-<br />
-<ul>
-<li>Reduce retention to 72h (3 days)</li>
-<li>Implement sampling in Alloy</li>
-<li>Increase PV size</li>
-</ul><br />
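Whether 10Gi comfortably covers 7 days can be estimated with back-of-envelope arithmetic. The trace volume and average compressed trace size below are assumptions for illustration, not measurements from the cluster:

```python
def retention_days(storage_gib: float, traces_per_day: int,
                   avg_trace_kib: float) -> float:
    """Days of traces that fit into the given storage (rough estimate)."""
    storage_kib = storage_gib * 1024 * 1024
    return storage_kib / (traces_per_day * avg_trace_kib)

# Assumed workload: ~100k traces/day at ~5 KiB compressed each.
print(f"{retention_days(10, 100_000, 5.0):.1f} days")  # ≈ 21 days
```

At that assumed volume, 10Gi holds roughly three weeks of traces, so the 7-day retention leaves ample headroom; scaling the inputs shows when sampling or a larger PV becomes necessary.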
-<h3 style='display: inline' id='complete-observability-stack'>Complete Observability Stack</h3><br />
-<br />
-<span>The f3s cluster now has complete observability:</span><br />
-<br />
-<span>**Metrics** (Prometheus):</span><br />
-<ul>
-<li>Cluster resource usage</li>
-<li>Application metrics</li>
-<li>Node metrics (FreeBSD ZFS, OpenBSD edge)</li>
-<li>etcd health</li>
-</ul><br />
-<span>**Logs** (Loki):</span><br />
-<ul>
-<li>All pod logs</li>
-<li>Structured log collection</li>
-<li>Log aggregation and search</li>
-</ul><br />
-<span>**Traces** (Tempo):</span><br />
-<ul>
-<li>Distributed request tracing</li>
-<li>Service dependency mapping</li>
-<li>Performance profiling</li>
-<li>Error tracking</li>
-</ul><br />
-<span>**Visualization** (Grafana):</span><br />
-<ul>
-<li>Unified dashboards</li>
-<li>Correlation between metrics, logs, and traces</li>
-<li>Service graphs</li>
-<li>Alerts</li>
-</ul><br />
-<h3 style='display: inline' id='configuration-files'>Configuration Files</h3><br />
-<br />
-<span>All configuration files are available on Codeberg:</span><br />
-<br />
-<a class='textlink' href='https://codeberg.org/snonux/conf/src/branch/master/f3s/tempo'>Tempo configuration</a><br />
-<a class='textlink' href='https://codeberg.org/snonux/conf/src/branch/master/f3s/loki'>Alloy configuration (updated for traces)</a><br />
-<a class='textlink' href='https://codeberg.org/snonux/conf/src/branch/master/f3s/tracing-demo'>Demo tracing application</a><br />
-<p class="footer">
- Generated with <a href="https://codeberg.org/snonux/gemtexter">Gemtexter 3.0.1-develop</a> |
- served by <a href="https://www.OpenBSD.org">OpenBSD</a>/<a href="https://man.openbsd.org/relayd.8">relayd(8)</a>+<a href="https://man.openbsd.org/httpd.8">httpd(8)</a> |
- <a href="https://foo.zone/site-mirrors.html">Site Mirrors</a>
- <br />
- Webring: <a href="https://shring.sh/foo.zone/previous">previous</a> | <a href="https://shring.sh">shring</a> | <a href="https://shring.sh/foo.zone/next">next</a>
-</p>
-</body>
-</html>
diff --git a/gemfeed/atom.xml b/gemfeed/atom.xml
index d1d5cca5..74a73066 100644
--- a/gemfeed/atom.xml
+++ b/gemfeed/atom.xml
@@ -1,6 +1,6 @@
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
- <updated>2026-03-03T09:08:49+02:00</updated>
+ <updated>2026-03-09T09:06:40+02:00</updated>
<title>foo.zone feed</title>
<subtitle>To be in the .zone!</subtitle>
<link href="https://foo.zone/gemfeed/atom.xml" rel="self" />
@@ -3521,7 +3521,7 @@ $ curl -s -G "http://localhost:3200/api/search" \
<title>f3s: Kubernetes with FreeBSD - Part 8: Observability</title>
<link href="https://foo.zone/gemfeed/2025-12-07-f3s-kubernetes-with-freebsd-part-8.html" />
<id>https://foo.zone/gemfeed/2025-12-07-f3s-kubernetes-with-freebsd-part-8.html</id>
- <updated>2025-12-06T23:58:24+02:00</updated>
+ <updated>2026-03-09T09:33:08+02:00</updated>
<author>
<name>Paul Buetow aka snonux</name>
<email>paul@dev.buetow.org</email>
@@ -3531,7 +3531,7 @@ $ curl -s -G "http://localhost:3200/api/search" \
<div xmlns="http://www.w3.org/1999/xhtml">
<h1 style='display: inline' id='f3s-kubernetes-with-freebsd---part-8-observability'>f3s: Kubernetes with FreeBSD - Part 8: Observability</h1><br />
<br />
-<span class='quote'>Published at 2025-12-06T23:58:24+02:00</span><br />
+<span class='quote'>Published at 2025-12-06T23:58:24+02:00, last updated Mon 09 Mar 09:33:08 EET 2026</span><br />
<br />
<span>This is the 8th blog post about the f3s series for my self-hosting demands in a home lab. f3s? The "f" stands for FreeBSD, and the "3s" stands for k3s, the Kubernetes distribution I use on FreeBSD-based physical machines.</span><br />
<br />
@@ -3573,23 +3573,56 @@ $ curl -s -G "http://localhost:3200/api/search" \
<li>⇢ ⇢ <a href='#adding-freebsd-hosts-to-prometheus'>Adding FreeBSD hosts to Prometheus</a></li>
<li>⇢ ⇢ <a href='#freebsd-memory-metrics-compatibility'>FreeBSD memory metrics compatibility</a></li>
<li>⇢ ⇢ <a href='#disk-io-metrics-limitation'>Disk I/O metrics limitation</a></li>
+<li>⇢ <a href='#zfs-monitoring-for-freebsd-servers'>ZFS Monitoring for FreeBSD Servers</a></li>
+<li>⇢ ⇢ <a href='#node-exporter-zfs-collector'>Node Exporter ZFS Collector</a></li>
+<li>⇢ ⇢ <a href='#verifying-zfs-metrics'>Verifying ZFS Metrics</a></li>
+<li>⇢ ⇢ <a href='#zfs-recording-rules'>ZFS Recording Rules</a></li>
+<li>⇢ ⇢ <a href='#grafana-dashboards'>Grafana Dashboards</a></li>
+<li>⇢ ⇢ <a href='#deployment'>Deployment</a></li>
+<li>⇢ ⇢ <a href='#verifying-zfs-metrics-in-prometheus'>Verifying ZFS Metrics in Prometheus</a></li>
+<li>⇢ ⇢ <a href='#key-metrics-to-monitor'>Key Metrics to Monitor</a></li>
+<li>⇢ ⇢ <a href='#zfs-pool-and-dataset-metrics-via-textfile-collector'>ZFS Pool and Dataset Metrics via Textfile Collector</a></li>
<li>⇢ <a href='#monitoring-external-openbsd-hosts'>Monitoring external OpenBSD hosts</a></li>
<li>⇢ ⇢ <a href='#installing-node-exporter-on-openbsd'>Installing Node Exporter on OpenBSD</a></li>
<li>⇢ ⇢ <a href='#adding-openbsd-hosts-to-prometheus'>Adding OpenBSD hosts to Prometheus</a></li>
<li>⇢ ⇢ <a href='#openbsd-memory-metrics-compatibility'>OpenBSD memory metrics compatibility</a></li>
+<li>⇢ <a href='#distributed-tracing-with-grafana-tempo'>Distributed Tracing with Grafana Tempo</a></li>
+<li>⇢ ⇢ <a href='#why-distributed-tracing'>Why Distributed Tracing?</a></li>
+<li>⇢ ⇢ <a href='#deploying-grafana-tempo'>Deploying Grafana Tempo</a></li>
+<li>⇢ ⇢ ⇢ <a href='#-configuration-strategy'>Configuration Strategy</a></li>
+<li>⇢ ⇢ ⇢ <a href='#-tempo-deployment-files'>Tempo Deployment Files</a></li>
+<li>⇢ ⇢ ⇢ <a href='#-installation'>Installation</a></li>
+<li>⇢ ⇢ <a href='#configuring-grafana-alloy-for-trace-collection'>Configuring Grafana Alloy for Trace Collection</a></li>
+<li>⇢ ⇢ ⇢ <a href='#-otlp-receiver-configuration'>OTLP Receiver Configuration</a></li>
+<li>⇢ ⇢ ⇢ <a href='#-upgrade-alloy'>Upgrade Alloy</a></li>
+<li>⇢ ⇢ <a href='#demo-tracing-application'>Demo Tracing Application</a></li>
+<li>⇢ ⇢ ⇢ <a href='#-application-architecture'>Application Architecture</a></li>
+<li>⇢ ⇢ <a href='#visualizing-traces-in-grafana'>Visualizing Traces in Grafana</a></li>
+<li>⇢ ⇢ ⇢ <a href='#-accessing-traces'>Accessing Traces</a></li>
+<li>⇢ ⇢ ⇢ <a href='#-service-graph-visualization'>Service Graph Visualization</a></li>
+<li>⇢ ⇢ <a href='#correlation-between-observability-signals'>Correlation Between Observability Signals</a></li>
+<li>⇢ ⇢ ⇢ <a href='#-traces-to-logs'>Traces-to-Logs</a></li>
+<li>⇢ ⇢ ⇢ <a href='#-traces-to-metrics'>Traces-to-Metrics</a></li>
+<li>⇢ ⇢ ⇢ <a href='#-logs-to-traces'>Logs-to-Traces</a></li>
+<li>⇢ ⇢ <a href='#generating-traces-for-testing'>Generating Traces for Testing</a></li>
+<li>⇢ ⇢ <a href='#verifying-the-complete-pipeline'>Verifying the Complete Pipeline</a></li>
+<li>⇢ ⇢ <a href='#practical-example-viewing-a-distributed-trace'>Practical Example: Viewing a Distributed Trace</a></li>
+<li>⇢ ⇢ <a href='#storage-and-retention'>Storage and Retention</a></li>
+<li>⇢ ⇢ <a href='#configuration-files'>Configuration Files</a></li>
<li>⇢ <a href='#summary'>Summary</a></li>
</ul><br />
<h2 style='display: inline' id='introduction'>Introduction</h2><br />
<br />
-<span>In this blog post, I set up a complete observability stack for the k3s cluster. Observability is crucial for understanding what&#39;s happening inside the cluster—whether its tracking resource usage, debugging issues, or analysing application behaviour. The stack consists of four main components, all deployed into the <span class='inlinecode'>monitoring</span> namespace:</span><br />
+<span>In this blog post, I set up a complete observability stack for the k3s cluster. Observability is crucial for understanding what&#39;s happening inside the cluster—whether it&#39;s tracking resource usage, debugging issues, or analysing application behaviour. The stack consists of five main components, all deployed into the <span class='inlinecode'>monitoring</span> namespace:</span><br />
<br />
<ul>
<li>Prometheus: time-series database for metrics collection and alerting</li>
<li>Grafana: visualisation and dashboarding frontend</li>
<li>Loki: log aggregation system (like Prometheus, but for logs)</li>
-<li>Alloy: telemetry collector that ships logs from all pods to Loki</li>
+<li>Alloy: telemetry collector that ships logs and traces from all pods to Loki and Tempo</li>
+<li>Tempo: distributed tracing backend for request flow analysis across microservices</li>
</ul><br />
-<span>Together, these form the "PLG" stack (Prometheus, Loki, Grafana), which is a popular open-source alternative to commercial observability platforms.</span><br />
+<span>Together, these form the "PLG" stack (Prometheus, Loki, Grafana), extended with Tempo for distributed tracing; the combination is a popular open-source alternative to commercial observability platforms.</span><br />
<br />
<span>All manifests for the f3s stack live in my configuration repository:</span><br />
<br />
@@ -3605,10 +3638,10 @@ $ curl -s -G "http://localhost:3200/api/search" \
by Lorenzo Bettini
http://www.lorenzobettini.it
http://www.gnu.org/software/src-highlite -->
-<pre>$ git clone https://codeberg.org/snonux/conf.git
-$ cd conf
-$ git checkout 15a86f3 <i><font color="silver"># Last commit before ArgoCD migration</font></i>
-$ cd f3s/prometheus/
+<pre><font color="#ff0000">$ git clone https</font><font color="#F3E651">:</font><font color="#ff0000">//codeberg</font><font color="#F3E651">.</font><font color="#ff0000">org/snonux/conf</font><font color="#F3E651">.</font><font color="#ff0000">git</font>
+<font color="#ff0000">$ cd conf</font>
+<font color="#ff0000">$ git checkout 15a86f3 </font><i><font color="#ababab"># Last commit before ArgoCD migration</font></i>
+<font color="#ff0000">$ cd f3s/prometheus</font><font color="#F3E651">/</font>
</pre>
<br />
<span>**Current master branch** contains the ArgoCD-managed versions with:</span><br />
@@ -3633,6 +3666,7 @@ $ cd f3s/prometheus/
<li><span class='inlinecode'>/data/nfs/k3svolumes/prometheus/data</span> — Prometheus time-series database</li>
<li><span class='inlinecode'>/data/nfs/k3svolumes/grafana/data</span> — Grafana configuration, dashboards, and plugins</li>
<li><span class='inlinecode'>/data/nfs/k3svolumes/loki/data</span> — Loki log chunks and index</li>
+<li><span class='inlinecode'>/data/nfs/k3svolumes/tempo/data</span> — Tempo trace data and WAL</li>
</ul><br />
<span>Each path gets a corresponding <span class='inlinecode'>PersistentVolume</span> and <span class='inlinecode'>PersistentVolumeClaim</span> in Kubernetes, allowing pods to mount them as regular volumes. Because the underlying storage is ZFS with replication, we get snapshots and redundancy for free.</span><br />
<br />
@@ -3644,8 +3678,8 @@ $ cd f3s/prometheus/
by Lorenzo Bettini
http://www.lorenzobettini.it
http://www.gnu.org/software/src-highlite -->
-<pre>$ kubectl create namespace monitoring
-namespace/monitoring created
+<pre><font color="#ff0000">$ kubectl create namespace monitoring</font>
+<font color="#ff0000">namespace/monitoring created</font>
</pre>
<br />
<h2 style='display: inline' id='installing-prometheus-and-grafana'>Installing Prometheus and Grafana</h2><br />
@@ -3660,8 +3694,8 @@ namespace/monitoring created
by Lorenzo Bettini
http://www.lorenzobettini.it
http://www.gnu.org/software/src-highlite -->
-<pre>$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
-$ helm repo update
+<pre><font color="#ff0000">$ helm repo add prometheus-community https</font><font color="#F3E651">:</font><font color="#ff0000">//prometheus-community</font><font color="#F3E651">.</font><font color="#ff0000">github</font><font color="#F3E651">.</font><font color="#ff0000">io/helm-charts</font>
+<font color="#ff0000">$ helm repo update</font>
</pre>
<br />
<span>Create the directories on the NFS server for persistent storage:</span><br />
@@ -3670,8 +3704,8 @@ $ helm repo update
by Lorenzo Bettini
http://www.lorenzobettini.it
http://www.gnu.org/software/src-highlite -->
-<pre>[root@r0 ~]<i><font color="silver"># mkdir -p /data/nfs/k3svolumes/prometheus/data</font></i>
-[root@r0 ~]<i><font color="silver"># mkdir -p /data/nfs/k3svolumes/grafana/data</font></i>
+<pre><font color="#F3E651">[</font><font color="#ff0000">root@r0 </font><font color="#F3E651">~]</font><i><font color="#ababab"># mkdir -p /data/nfs/k3svolumes/prometheus/data</font></i>
+<font color="#F3E651">[</font><font color="#ff0000">root@r0 </font><font color="#F3E651">~]</font><i><font color="#ababab"># mkdir -p /data/nfs/k3svolumes/grafana/data</font></i>
</pre>
<br />
<h3 style='display: inline' id='deploying-with-the-justfile'>Deploying with the Justfile</h3><br />
@@ -3687,18 +3721,18 @@ http://www.gnu.org/software/src-highlite -->
by Lorenzo Bettini
http://www.lorenzobettini.it
http://www.gnu.org/software/src-highlite -->
-<pre>$ cd conf/f3s/prometheus
-$ just install
-kubectl apply -f persistent-volumes.yaml
-persistentvolume/prometheus-data-pv created
-persistentvolume/grafana-data-pv created
-persistentvolumeclaim/grafana-data-pvc created
-helm install prometheus prometheus-community/kube-prometheus-stack \
- --namespace monitoring -f persistence-values.yaml
-NAME: prometheus
-LAST DEPLOYED: ...
-NAMESPACE: monitoring
-STATUS: deployed
+<pre><font color="#ff0000">$ cd conf/f3s/prometheus</font>
+<font color="#ff0000">$ just install</font>
+<font color="#ff0000">kubectl apply -f persistent-volumes</font><font color="#F3E651">.</font><font color="#ff0000">yaml</font>
+<font color="#ff0000">persistentvolume/prometheus-data-pv created</font>
+<font color="#ff0000">persistentvolume/grafana-data-pv created</font>
+<font color="#ff0000">persistentvolumeclaim/grafana-data-pvc created</font>
+<font color="#ff0000">helm install prometheus prometheus-community/kube-prometheus-stack </font><font color="#F3E651">\</font>
+<font color="#ff0000"> --namespace monitoring -f persistence-values</font><font color="#F3E651">.</font><font color="#ff0000">yaml</font>
+<font color="#ff0000">NAME</font><font color="#F3E651">:</font><font color="#ff0000"> prometheus</font>
+<font color="#ff0000">LAST DEPLOYED</font><font color="#F3E651">:</font><font color="#ff0000"> </font><font color="#F3E651">...</font>
+<font color="#ff0000">NAMESPACE</font><font color="#F3E651">:</font><font color="#ff0000"> monitoring</font>
+<font color="#ff0000">STATUS</font><font color="#F3E651">:</font><font color="#ff0000"> deployed</font>
</pre>
<br />
<span>The <span class='inlinecode'>persistence-values.yaml</span> configures Prometheus and Grafana to use the NFS-backed persistent volumes I mentioned earlier, ensuring data survives pod restarts. It also enables scraping of etcd and kube-controller-manager metrics:</span><br />
@@ -3731,20 +3765,35 @@ kubeControllerManager:
insecureSkipVerify: true
</pre>
<br />
-<span>By default, k3s binds the controller-manager to localhost only, so the "Kubernetes / Controller Manager" dashboard in Grafana will show no data. To expose the metrics endpoint, add the following to <span class='inlinecode'>/etc/rancher/k3s/config.yaml</span> on each k3s server node:</span><br />
+<span>By default, k3s binds the controller-manager to localhost only and doesn&#39;t expose etcd metrics, so the "Kubernetes / Controller Manager" and "etcd" dashboards in Grafana will show no data. To fix both, add the following to <span class='inlinecode'>/etc/rancher/k3s/config.yaml</span> on each k3s server node:</span><br />
<br />
<!-- Generator: GNU source-highlight 3.1.9
by Lorenzo Bettini
http://www.lorenzobettini.it
http://www.gnu.org/software/src-highlite -->
-<pre>[root@r0 ~]<i><font color="silver"># cat &gt;&gt; /etc/rancher/k3s/config.yaml &lt;&lt; 'EOF'</font></i>
-kube-controller-manager-arg:
- - bind-address=<font color="#000000">0.0</font>.<font color="#000000">0.0</font>
-EOF
-[root@r0 ~]<i><font color="silver"># systemctl restart k3s</font></i>
+<pre><font color="#F3E651">[</font><font color="#ff0000">root@r0 </font><font color="#F3E651">~]</font><i><font color="#ababab"># cat &gt;&gt; /etc/rancher/k3s/config.yaml &lt;&lt; 'EOF'</font></i>
+<font color="#ff0000">kube-controller-manager-arg</font><font color="#F3E651">:</font>
+<font color="#ff0000"> - bind-address</font><font color="#F3E651">=</font><font color="#bb00ff">0.0</font><font color="#F3E651">.</font><font color="#bb00ff">0.0</font>
+<font color="#ff0000">etcd-expose-metrics</font><font color="#F3E651">:</font><font color="#ff0000"> </font><b><font color="#ffffff">true</font></b>
+<font color="#ff0000">EOF</font>
+<font color="#F3E651">[</font><font color="#ff0000">root@r0 </font><font color="#F3E651">~]</font><i><font color="#ababab"># systemctl restart k3s</font></i>
+</pre>
+<br />
+<span>Repeat for <span class='inlinecode'>r1</span> and <span class='inlinecode'>r2</span>. After restarting all nodes, the controller-manager metrics endpoint will be accessible and etcd metrics will be available on port 2381. Prometheus can then scrape both.</span><br />
+<br />
+<span>Verify etcd metrics are exposed:</span><br />
+<br />
+<!-- Generator: GNU source-highlight 3.1.9
+by Lorenzo Bettini
+http://www.lorenzobettini.it
+http://www.gnu.org/software/src-highlite -->
+<pre><font color="#F3E651">[</font><font color="#ff0000">root@r0 </font><font color="#F3E651">~]</font><i><font color="#ababab"># curl -s http://127.0.0.1:2381/metrics | grep etcd_server_has_leader</font></i>
+<font color="#ff0000">etcd_server_has_leader </font><font color="#bb00ff">1</font>
</pre>
<br />
-<span>Repeat for <span class='inlinecode'>r1</span> and <span class='inlinecode'>r2</span>. After restarting all nodes, the controller-manager metrics endpoint will be accessible and Prometheus can scrape it.</span><br />
+<span>The full <span class='inlinecode'>persistence-values.yaml</span> and all other Prometheus configuration files are available on Codeberg:</span><br />
+<br />
+<a class='textlink' href='https://codeberg.org/snonux/conf/src/branch/master/f3s/prometheus'>codeberg.org/snonux/conf/f3s/prometheus</a><br />
<br />
<span>The persistent volume definitions bind to specific paths on the NFS share using <span class='inlinecode'>hostPath</span> volumes—the same pattern used for other services in Part 7:</span><br />
<br />
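<span>A minimal sketch of one such persistent volume, assuming the same directory layout as the Loki volume shown later (the exact paths and sizes live in the Codeberg repository):</span><br />
<br />
<pre># Hypothetical prometheus-data-pv definition (path and size are assumptions)
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-data-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /data/nfs/k3svolumes/prometheus/data
</pre>
<br />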
@@ -3760,9 +3809,9 @@ EOF
by Lorenzo Bettini
http://www.lorenzobettini.it
http://www.gnu.org/software/src-highlite -->
-<pre>$ kubectl get svc -n monitoring prometheus-kube-prometheus-prometheus
-NAME TYPE CLUSTER-IP PORT(S)
-prometheus-kube-prometheus-prometheus ClusterIP <font color="#000000">10.43</font>.<font color="#000000">152.163</font> <font color="#000000">9090</font>/TCP,<font color="#000000">8080</font>/TCP
+<pre><font color="#ff0000">$ kubectl get svc -n monitoring prometheus-kube-prometheus-prometheus</font>
+<font color="#ff0000">NAME TYPE CLUSTER-IP PORT</font><font color="#F3E651">(</font><font color="#ff0000">S</font><font color="#F3E651">)</font>
+<font color="#ff0000">prometheus-kube-prometheus-prometheus ClusterIP </font><font color="#bb00ff">10.43</font><font color="#F3E651">.</font><font color="#bb00ff">152.163</font><font color="#ff0000"> </font><font color="#bb00ff">9090</font><font color="#ff0000">/TCP</font><font color="#F3E651">,</font><font color="#bb00ff">8080</font><font color="#ff0000">/TCP</font>
</pre>
<br />
<span>Grafana connects to Prometheus using the internal service URL <span class='inlinecode'>http://prometheus-kube-prometheus-prometheus.monitoring.svc.cluster.local:9090</span>. The default Grafana credentials are <span class='inlinecode'>admin</span>/<span class='inlinecode'>prom-operator</span>, which should be changed immediately after first login.</span><br />
@@ -3771,6 +3820,8 @@ prometheus-kube-prometheus-prometheus ClusterIP <font color="#000000">10.43<
<br />
<a href='./f3s-kubernetes-with-freebsd-part-8/grafana-dashboard.png'><img alt='Grafana dashboard showing cluster metrics' title='Grafana dashboard showing cluster metrics' src='./f3s-kubernetes-with-freebsd-part-8/grafana-dashboard.png' /></a><br />
<br />
+<a href='./f3s-kubernetes-with-freebsd-part-8/grafana-etcd-dashboard.png'><img alt='Grafana etcd dashboard showing cluster health, RPC rate, disk sync duration, and peer round trip times' title='Grafana etcd dashboard showing cluster health, RPC rate, disk sync duration, and peer round trip times' src='./f3s-kubernetes-with-freebsd-part-8/grafana-etcd-dashboard.png' /></a><br />
+<br />
<h2 style='display: inline' id='installing-loki-and-alloy'>Installing Loki and Alloy</h2><br />
<br />
<span>While Prometheus handles metrics, Loki handles logs. It&#39;s designed to be cost-effective and easy to operate—it doesn&#39;t index the contents of logs, only the metadata (labels), making it very efficient for storage.</span><br />
@@ -3785,7 +3836,7 @@ prometheus-kube-prometheus-prometheus ClusterIP <font color="#000000">10.43<
by Lorenzo Bettini
http://www.lorenzobettini.it
http://www.gnu.org/software/src-highlite -->
-<pre>[root@r0 ~]<i><font color="silver"># mkdir -p /data/nfs/k3svolumes/loki/data</font></i>
+<pre><font color="#F3E651">[</font><font color="#ff0000">root@r0 </font><font color="#F3E651">~]</font><i><font color="#ababab"># mkdir -p /data/nfs/k3svolumes/loki/data</font></i>
</pre>
<br />
<h3 style='display: inline' id='deploying-loki-and-alloy'>Deploying Loki and Alloy</h3><br />
@@ -3800,24 +3851,24 @@ http://www.gnu.org/software/src-highlite -->
by Lorenzo Bettini
http://www.lorenzobettini.it
http://www.gnu.org/software/src-highlite -->
-<pre>$ cd conf/f3s/loki
-$ just install
-helm repo add grafana https://grafana.github.io/helm-charts || <b><u><font color="#000000">true</font></u></b>
-helm repo update
-kubectl apply -f persistent-volumes.yaml
-persistentvolume/loki-data-pv created
-persistentvolumeclaim/loki-data-pvc created
-helm install loki grafana/loki --namespace monitoring -f values.yaml
-NAME: loki
-LAST DEPLOYED: ...
-NAMESPACE: monitoring
-STATUS: deployed
-...
-helm install alloy grafana/alloy --namespace monitoring -f alloy-values.yaml
-NAME: alloy
-LAST DEPLOYED: ...
-NAMESPACE: monitoring
-STATUS: deployed
+<pre><font color="#ff0000">$ cd conf/f3s/loki</font>
+<font color="#ff0000">$ just install</font>
+<font color="#ff0000">helm repo add grafana https</font><font color="#F3E651">:</font><font color="#ff0000">//grafana</font><font color="#F3E651">.</font><font color="#ff0000">github</font><font color="#F3E651">.</font><font color="#ff0000">io/helm-charts </font><font color="#F3E651">||</font><font color="#ff0000"> </font><b><font color="#ffffff">true</font></b>
+<font color="#ff0000">helm repo update</font>
+<font color="#ff0000">kubectl apply -f persistent-volumes</font><font color="#F3E651">.</font><font color="#ff0000">yaml</font>
+<font color="#ff0000">persistentvolume/loki-data-pv created</font>
+<font color="#ff0000">persistentvolumeclaim/loki-data-pvc created</font>
+<font color="#ff0000">helm install loki grafana/loki --namespace monitoring -f values</font><font color="#F3E651">.</font><font color="#ff0000">yaml</font>
+<font color="#ff0000">NAME</font><font color="#F3E651">:</font><font color="#ff0000"> loki</font>
+<font color="#ff0000">LAST DEPLOYED</font><font color="#F3E651">:</font><font color="#ff0000"> </font><font color="#F3E651">...</font>
+<font color="#ff0000">NAMESPACE</font><font color="#F3E651">:</font><font color="#ff0000"> monitoring</font>
+<font color="#ff0000">STATUS</font><font color="#F3E651">:</font><font color="#ff0000"> deployed</font>
+<font color="#F3E651">...</font>
+<font color="#ff0000">helm install alloy grafana/alloy --namespace monitoring -f alloy-values</font><font color="#F3E651">.</font><font color="#ff0000">yaml</font>
+<font color="#ff0000">NAME</font><font color="#F3E651">:</font><font color="#ff0000"> alloy</font>
+<font color="#ff0000">LAST DEPLOYED</font><font color="#F3E651">:</font><font color="#ff0000"> </font><font color="#F3E651">...</font>
+<font color="#ff0000">NAMESPACE</font><font color="#F3E651">:</font><font color="#ff0000"> monitoring</font>
+<font color="#ff0000">STATUS</font><font color="#F3E651">:</font><font color="#ff0000"> deployed</font>
</pre>
<br />
<span>Loki runs in single-binary mode with a single replica (<span class='inlinecode'>loki-0</span>), which is appropriate for a home lab cluster. This means there&#39;s only one Loki pod running at any time. If the node hosting Loki fails, Kubernetes will automatically reschedule the pod to another worker node—but there will be a brief downtime (typically under a minute) while this happens. For my home lab use case, this is perfectly acceptable.</span><br />
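<br />
<span>The essence of that setup in the <span class='inlinecode'>values.yaml</span> is small. A hedged sketch using grafana/loki chart keys; the replica count and claim name follow the description above, everything else is assumed (see the Codeberg repository for the real file):</span><br />
<br />
<pre># Hypothetical sketch of the Loki values.yaml (not the real file)
deploymentMode: SingleBinary
loki:
  commonConfig:
    replication_factor: 1
  storage:
    type: filesystem
singleBinary:
  replicas: 1
  persistence:
    enabled: true
    existingClaim: loki-data-pvc   # assumed to match persistent-volumes.yaml
</pre>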
@@ -3832,44 +3883,44 @@ STATUS: deployed
by Lorenzo Bettini
http://www.lorenzobettini.it
http://www.gnu.org/software/src-highlite -->
-<pre>discovery.kubernetes <font color="#808080">"pods"</font> {
- role = <font color="#808080">"pod"</font>
-}
+<pre><font color="#ff0000">discovery</font><font color="#F3E651">.</font><font color="#ff0000">kubernetes </font><font color="#bb00ff">"pods"</font><font color="#ff0000"> {</font>
+<font color="#ff0000"> role </font><font color="#F3E651">=</font><font color="#ff0000"> </font><font color="#bb00ff">"pod"</font>
+<font color="#ff0000">}</font>
-discovery.relabel <font color="#808080">"pods"</font> {
- targets = discovery.kubernetes.pods.targets
+<font color="#ff0000">discovery</font><font color="#F3E651">.</font><font color="#ff0000">relabel </font><font color="#bb00ff">"pods"</font><font color="#ff0000"> {</font>
+<font color="#ff0000"> targets </font><font color="#F3E651">=</font><font color="#ff0000"> discovery</font><font color="#F3E651">.</font><font color="#ff0000">kubernetes</font><font color="#F3E651">.</font><font color="#ff0000">pods</font><font color="#F3E651">.</font><font color="#ff0000">targets</font>
- rule {
- source_labels = [<font color="#808080">"__meta_kubernetes_namespace"</font>]
- target_label = <font color="#808080">"namespace"</font>
- }
+<font color="#ff0000"> rule {</font>
+<font color="#ff0000"> source_labels </font><font color="#F3E651">=</font><font color="#ff0000"> </font><font color="#F3E651">[</font><font color="#bb00ff">"__meta_kubernetes_namespace"</font><font color="#F3E651">]</font>
+<font color="#ff0000"> target_label </font><font color="#F3E651">=</font><font color="#ff0000"> </font><font color="#bb00ff">"namespace"</font>
+<font color="#ff0000"> }</font>
- rule {
- source_labels = [<font color="#808080">"__meta_kubernetes_pod_name"</font>]
- target_label = <font color="#808080">"pod"</font>
- }
+<font color="#ff0000"> rule {</font>
+<font color="#ff0000"> source_labels </font><font color="#F3E651">=</font><font color="#ff0000"> </font><font color="#F3E651">[</font><font color="#bb00ff">"__meta_kubernetes_pod_name"</font><font color="#F3E651">]</font>
+<font color="#ff0000"> target_label </font><font color="#F3E651">=</font><font color="#ff0000"> </font><font color="#bb00ff">"pod"</font>
+<font color="#ff0000"> }</font>
- rule {
- source_labels = [<font color="#808080">"__meta_kubernetes_pod_container_name"</font>]
- target_label = <font color="#808080">"container"</font>
- }
+<font color="#ff0000"> rule {</font>
+<font color="#ff0000"> source_labels </font><font color="#F3E651">=</font><font color="#ff0000"> </font><font color="#F3E651">[</font><font color="#bb00ff">"__meta_kubernetes_pod_container_name"</font><font color="#F3E651">]</font>
+<font color="#ff0000"> target_label </font><font color="#F3E651">=</font><font color="#ff0000"> </font><font color="#bb00ff">"container"</font>
+<font color="#ff0000"> }</font>
- rule {
- source_labels = [<font color="#808080">"__meta_kubernetes_pod_label_app"</font>]
- target_label = <font color="#808080">"app"</font>
- }
-}
+<font color="#ff0000"> rule {</font>
+<font color="#ff0000"> source_labels </font><font color="#F3E651">=</font><font color="#ff0000"> </font><font color="#F3E651">[</font><font color="#bb00ff">"__meta_kubernetes_pod_label_app"</font><font color="#F3E651">]</font>
+<font color="#ff0000"> target_label </font><font color="#F3E651">=</font><font color="#ff0000"> </font><font color="#bb00ff">"app"</font>
+<font color="#ff0000"> }</font>
+<font color="#ff0000">}</font>
-loki.<b><u><font color="#000000">source</font></u></b>.kubernetes <font color="#808080">"pods"</font> {
- targets = discovery.relabel.pods.output
- forward_to = [loki.write.default.receiver]
-}
+<font color="#ff0000">loki</font><font color="#F3E651">.</font><b><font color="#ffffff">source</font></b><font color="#F3E651">.</font><font color="#ff0000">kubernetes </font><font color="#bb00ff">"pods"</font><font color="#ff0000"> {</font>
+<font color="#ff0000"> targets </font><font color="#F3E651">=</font><font color="#ff0000"> discovery</font><font color="#F3E651">.</font><font color="#ff0000">relabel</font><font color="#F3E651">.</font><font color="#ff0000">pods</font><font color="#F3E651">.</font><font color="#ff0000">output</font>
+<font color="#ff0000"> forward_to </font><font color="#F3E651">=</font><font color="#ff0000"> </font><font color="#F3E651">[</font><font color="#ff0000">loki</font><font color="#F3E651">.</font><font color="#ff0000">write</font><font color="#F3E651">.</font><font color="#ff0000">default</font><font color="#F3E651">.</font><font color="#ff0000">receiver</font><font color="#F3E651">]</font>
+<font color="#ff0000">}</font>
-loki.write <font color="#808080">"default"</font> {
- endpoint {
- url = <font color="#808080">"http://loki.monitoring.svc.cluster.local:3100/loki/api/v1/push"</font>
- }
-}
+<font color="#ff0000">loki</font><font color="#F3E651">.</font><font color="#ff0000">write </font><font color="#bb00ff">"default"</font><font color="#ff0000"> {</font>
+<font color="#ff0000"> endpoint {</font>
+<font color="#ff0000"> url </font><font color="#F3E651">=</font><font color="#ff0000"> </font><font color="#bb00ff">"http://loki.monitoring.svc.cluster.local:3100/loki/api/v1/push"</font>
+<font color="#ff0000"> }</font>
+<font color="#ff0000">}</font>
</pre>
<br />
<span>This configuration automatically labels each log line with the namespace, pod name, container name, and app label, making it easy to filter logs in Grafana.</span><br />
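<br />
<span>With these labels in place, filtering in Grafana&#39;s Explore view is a one-liner. An illustrative LogQL query (the label values here are examples, not taken from this cluster):</span><br />
<br />
<pre>{namespace="monitoring", container="loki"} |= "error"
</pre>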
@@ -3882,9 +3933,9 @@ loki.write <font color="#808080">"default"</font> {
by Lorenzo Bettini
http://www.lorenzobettini.it
http://www.gnu.org/software/src-highlite -->
-<pre>$ kubectl get svc -n monitoring loki
-NAME TYPE CLUSTER-IP PORT(S)
-loki ClusterIP <font color="#000000">10.43</font>.<font color="#000000">64.60</font> <font color="#000000">3100</font>/TCP,<font color="#000000">9095</font>/TCP
+<pre><font color="#ff0000">$ kubectl get svc -n monitoring loki</font>
+<font color="#ff0000">NAME TYPE CLUSTER-IP PORT</font><font color="#F3E651">(</font><font color="#ff0000">S</font><font color="#F3E651">)</font>
+<font color="#ff0000">loki ClusterIP </font><font color="#bb00ff">10.43</font><font color="#F3E651">.</font><font color="#bb00ff">64.60</font><font color="#ff0000"> </font><font color="#bb00ff">3100</font><font color="#ff0000">/TCP</font><font color="#F3E651">,</font><font color="#bb00ff">9095</font><font color="#ff0000">/TCP</font>
</pre>
<br />
<span>To add Loki as a data source in Grafana:</span><br />
@@ -3908,40 +3959,44 @@ loki ClusterIP <font color="#000000">10.43</font>.<font color="#000000">64.6
by Lorenzo Bettini
http://www.lorenzobettini.it
http://www.gnu.org/software/src-highlite -->
-<pre>$ kubectl get pods -n monitoring
-NAME READY STATUS RESTARTS AGE
-alertmanager-prometheus-kube-prometheus-alertmanager-<font color="#000000">0</font> <font color="#000000">2</font>/<font color="#000000">2</font> Running <font color="#000000">0</font> 42d
-alloy-g5fgj <font color="#000000">2</font>/<font color="#000000">2</font> Running <font color="#000000">0</font> 29m
-alloy-nfw8w <font color="#000000">2</font>/<font color="#000000">2</font> Running <font color="#000000">0</font> 29m
-alloy-tg9vj <font color="#000000">2</font>/<font color="#000000">2</font> Running <font color="#000000">0</font> 29m
-loki-<font color="#000000">0</font> <font color="#000000">2</font>/<font color="#000000">2</font> Running <font color="#000000">0</font> 25m
-prometheus-grafana-868f9dc7cf-lg2vl <font color="#000000">3</font>/<font color="#000000">3</font> Running <font color="#000000">0</font> 42d
-prometheus-kube-prometheus-operator-8d7bbc48c-p4sf4 <font color="#000000">1</font>/<font color="#000000">1</font> Running <font color="#000000">0</font> 42d
-prometheus-kube-state-metrics-7c5fb9d798-hh2fx <font color="#000000">1</font>/<font color="#000000">1</font> Running <font color="#000000">0</font> 42d
-prometheus-prometheus-kube-prometheus-prometheus-<font color="#000000">0</font> <font color="#000000">2</font>/<font color="#000000">2</font> Running <font color="#000000">0</font> 42d
-prometheus-prometheus-node-exporter-2nsg9 <font color="#000000">1</font>/<font color="#000000">1</font> Running <font color="#000000">0</font> 42d
-prometheus-prometheus-node-exporter-mqr<font color="#000000">25</font> <font color="#000000">1</font>/<font color="#000000">1</font> Running <font color="#000000">0</font> 42d
-prometheus-prometheus-node-exporter-wp4ds <font color="#000000">1</font>/<font color="#000000">1</font> Running <font color="#000000">0</font> 42d
+<pre><font color="#ff0000">$ kubectl get pods -n monitoring</font>
+<font color="#ff0000">NAME READY STATUS RESTARTS AGE</font>
+<font color="#ff0000">alertmanager-prometheus-kube-prometheus-alertmanager-</font><font color="#bb00ff">0</font><font color="#ff0000"> </font><font color="#bb00ff">2</font><font color="#F3E651">/</font><font color="#bb00ff">2</font><font color="#ff0000"> Running </font><font color="#bb00ff">0</font><font color="#ff0000"> 42d</font>
+<font color="#ff0000">alloy-g5fgj </font><font color="#bb00ff">2</font><font color="#F3E651">/</font><font color="#bb00ff">2</font><font color="#ff0000"> Running </font><font color="#bb00ff">0</font><font color="#ff0000"> 29m</font>
+<font color="#ff0000">alloy-nfw8w </font><font color="#bb00ff">2</font><font color="#F3E651">/</font><font color="#bb00ff">2</font><font color="#ff0000"> Running </font><font color="#bb00ff">0</font><font color="#ff0000"> 29m</font>
+<font color="#ff0000">alloy-tg9vj </font><font color="#bb00ff">2</font><font color="#F3E651">/</font><font color="#bb00ff">2</font><font color="#ff0000"> Running </font><font color="#bb00ff">0</font><font color="#ff0000"> 29m</font>
+<font color="#ff0000">loki-</font><font color="#bb00ff">0</font><font color="#ff0000"> </font><font color="#bb00ff">2</font><font color="#F3E651">/</font><font color="#bb00ff">2</font><font color="#ff0000"> Running </font><font color="#bb00ff">0</font><font color="#ff0000"> 25m</font>
+<font color="#ff0000">prometheus-grafana-868f9dc7cf-lg2vl </font><font color="#bb00ff">3</font><font color="#F3E651">/</font><font color="#bb00ff">3</font><font color="#ff0000"> Running </font><font color="#bb00ff">0</font><font color="#ff0000"> 42d</font>
+<font color="#ff0000">prometheus-kube-prometheus-operator-8d7bbc48c-p4sf4 </font><font color="#bb00ff">1</font><font color="#F3E651">/</font><font color="#bb00ff">1</font><font color="#ff0000"> Running </font><font color="#bb00ff">0</font><font color="#ff0000"> 42d</font>
+<font color="#ff0000">prometheus-kube-state-metrics-7c5fb9d798-hh2fx </font><font color="#bb00ff">1</font><font color="#F3E651">/</font><font color="#bb00ff">1</font><font color="#ff0000"> Running </font><font color="#bb00ff">0</font><font color="#ff0000"> 42d</font>
+<font color="#ff0000">prometheus-prometheus-kube-prometheus-prometheus-</font><font color="#bb00ff">0</font><font color="#ff0000"> </font><font color="#bb00ff">2</font><font color="#F3E651">/</font><font color="#bb00ff">2</font><font color="#ff0000"> Running </font><font color="#bb00ff">0</font><font color="#ff0000"> 42d</font>
+<font color="#ff0000">prometheus-prometheus-node-exporter-2nsg9 </font><font color="#bb00ff">1</font><font color="#F3E651">/</font><font color="#bb00ff">1</font><font color="#ff0000"> Running </font><font color="#bb00ff">0</font><font color="#ff0000"> 42d</font>
+<font color="#ff0000">prometheus-prometheus-node-exporter-mqr</font><font color="#bb00ff">25</font><font color="#ff0000"> </font><font color="#bb00ff">1</font><font color="#F3E651">/</font><font color="#bb00ff">1</font><font color="#ff0000"> Running </font><font color="#bb00ff">0</font><font color="#ff0000"> 42d</font>
+<font color="#ff0000">prometheus-prometheus-node-exporter-wp4ds </font><font color="#bb00ff">1</font><font color="#F3E651">/</font><font color="#bb00ff">1</font><font color="#ff0000"> Running </font><font color="#bb00ff">0</font><font color="#ff0000"> 42d</font>
+<font color="#ff0000">tempo-</font><font color="#bb00ff">0</font><font color="#ff0000"> </font><font color="#bb00ff">1</font><font color="#F3E651">/</font><font color="#bb00ff">1</font><font color="#ff0000"> Running </font><font color="#bb00ff">0</font><font color="#ff0000"> 1d</font>
</pre>
<br />
+<span>Note: Tempo (<span class='inlinecode'>tempo-0</span>) is deployed later in this post in the "Distributed Tracing with Grafana Tempo" section. It is included in the pod listing here for completeness.</span><br />
+<br />
<span>And the services:</span><br />
<br />
<!-- Generator: GNU source-highlight 3.1.9
by Lorenzo Bettini
http://www.lorenzobettini.it
http://www.gnu.org/software/src-highlite -->
-<pre>$ kubectl get svc -n monitoring
-NAME TYPE CLUSTER-IP PORT(S)
-alertmanager-operated ClusterIP None <font color="#000000">9093</font>/TCP,<font color="#000000">9094</font>/TCP
-alloy ClusterIP <font color="#000000">10.43</font>.<font color="#000000">74.14</font> <font color="#000000">12345</font>/TCP
-loki ClusterIP <font color="#000000">10.43</font>.<font color="#000000">64.60</font> <font color="#000000">3100</font>/TCP,<font color="#000000">9095</font>/TCP
-loki-headless ClusterIP None <font color="#000000">3100</font>/TCP
-prometheus-grafana ClusterIP <font color="#000000">10.43</font>.<font color="#000000">46.82</font> <font color="#000000">80</font>/TCP
-prometheus-kube-prometheus-alertmanager ClusterIP <font color="#000000">10.43</font>.<font color="#000000">208.43</font> <font color="#000000">9093</font>/TCP,<font color="#000000">8080</font>/TCP
-prometheus-kube-prometheus-operator ClusterIP <font color="#000000">10.43</font>.<font color="#000000">246.121</font> <font color="#000000">443</font>/TCP
-prometheus-kube-prometheus-prometheus ClusterIP <font color="#000000">10.43</font>.<font color="#000000">152.163</font> <font color="#000000">9090</font>/TCP,<font color="#000000">8080</font>/TCP
-prometheus-kube-state-metrics ClusterIP <font color="#000000">10.43</font>.<font color="#000000">64.26</font> <font color="#000000">8080</font>/TCP
-prometheus-prometheus-node-exporter ClusterIP <font color="#000000">10.43</font>.<font color="#000000">127.242</font> <font color="#000000">9100</font>/TCP
+<pre><font color="#ff0000">$ kubectl get svc -n monitoring</font>
+<font color="#ff0000">NAME TYPE CLUSTER-IP PORT</font><font color="#F3E651">(</font><font color="#ff0000">S</font><font color="#F3E651">)</font>
+<font color="#ff0000">alertmanager-operated ClusterIP None </font><font color="#bb00ff">9093</font><font color="#ff0000">/TCP</font><font color="#F3E651">,</font><font color="#bb00ff">9094</font><font color="#ff0000">/TCP</font>
+<font color="#ff0000">alloy ClusterIP </font><font color="#bb00ff">10.43</font><font color="#F3E651">.</font><font color="#bb00ff">74.14</font><font color="#ff0000"> </font><font color="#bb00ff">12345</font><font color="#ff0000">/TCP</font>
+<font color="#ff0000">loki ClusterIP </font><font color="#bb00ff">10.43</font><font color="#F3E651">.</font><font color="#bb00ff">64.60</font><font color="#ff0000"> </font><font color="#bb00ff">3100</font><font color="#ff0000">/TCP</font><font color="#F3E651">,</font><font color="#bb00ff">9095</font><font color="#ff0000">/TCP</font>
+<font color="#ff0000">loki-headless ClusterIP None </font><font color="#bb00ff">3100</font><font color="#ff0000">/TCP</font>
+<font color="#ff0000">prometheus-grafana ClusterIP </font><font color="#bb00ff">10.43</font><font color="#F3E651">.</font><font color="#bb00ff">46.82</font><font color="#ff0000"> </font><font color="#bb00ff">80</font><font color="#ff0000">/TCP</font>
+<font color="#ff0000">prometheus-kube-prometheus-alertmanager ClusterIP </font><font color="#bb00ff">10.43</font><font color="#F3E651">.</font><font color="#bb00ff">208.43</font><font color="#ff0000"> </font><font color="#bb00ff">9093</font><font color="#ff0000">/TCP</font><font color="#F3E651">,</font><font color="#bb00ff">8080</font><font color="#ff0000">/TCP</font>
+<font color="#ff0000">prometheus-kube-prometheus-operator ClusterIP </font><font color="#bb00ff">10.43</font><font color="#F3E651">.</font><font color="#bb00ff">246.121</font><font color="#ff0000"> </font><font color="#bb00ff">443</font><font color="#ff0000">/TCP</font>
+<font color="#ff0000">prometheus-kube-prometheus-prometheus ClusterIP </font><font color="#bb00ff">10.43</font><font color="#F3E651">.</font><font color="#bb00ff">152.163</font><font color="#ff0000"> </font><font color="#bb00ff">9090</font><font color="#ff0000">/TCP</font><font color="#F3E651">,</font><font color="#bb00ff">8080</font><font color="#ff0000">/TCP</font>
+<font color="#ff0000">prometheus-kube-state-metrics ClusterIP </font><font color="#bb00ff">10.43</font><font color="#F3E651">.</font><font color="#bb00ff">64.26</font><font color="#ff0000"> </font><font color="#bb00ff">8080</font><font color="#ff0000">/TCP</font>
+<font color="#ff0000">prometheus-prometheus-node-exporter ClusterIP </font><font color="#bb00ff">10.43</font><font color="#F3E651">.</font><font color="#bb00ff">127.242</font><font color="#ff0000"> </font><font color="#bb00ff">9100</font><font color="#ff0000">/TCP</font>
+<font color="#ff0000">tempo ClusterIP </font><font color="#bb00ff">10.43</font><font color="#F3E651">.</font><font color="#bb00ff">91.44</font><font color="#ff0000"> </font><font color="#bb00ff">3200</font><font color="#ff0000">/TCP</font><font color="#F3E651">,</font><font color="#bb00ff">4317</font><font color="#ff0000">/TCP</font><font color="#F3E651">,</font><font color="#bb00ff">4318</font><font color="#ff0000">/TCP</font>
</pre>
<br />
<span>Let me break down what each pod does:</span><br />
@@ -3970,6 +4025,9 @@ prometheus-prometheus-node-exporter ClusterIP <font color="#000000">10.4
<ul>
<li><span class='inlinecode'>prometheus-prometheus-node-exporter-...</span>: three Node Exporter pods running as a DaemonSet, one on each node. They expose hardware and OS-level metrics: CPU usage, memory, disk I/O, filesystem usage, network statistics, and more. These feed the "Node Exporter" dashboards in Grafana.</li>
</ul><br />
+<ul>
+<li><span class='inlinecode'>tempo-0</span>: the Grafana Tempo instance for distributed tracing. It receives trace data from Alloy via OTLP (OpenTelemetry Protocol), stores traces on the NFS-backed persistent volume, and serves queries to Grafana. Tempo is covered in detail in the "Distributed Tracing with Grafana Tempo" section later in this post.</li>
+</ul><br />
<h2 style='display: inline' id='using-the-observability-stack'>Using the observability stack</h2><br />
<br />
<h3 style='display: inline' id='viewing-metrics-in-grafana'>Viewing metrics in Grafana</h3><br />
@@ -4015,7 +4073,7 @@ prometheus-prometheus-node-exporter ClusterIP <font color="#000000">10.4
by Lorenzo Bettini
http://www.lorenzobettini.it
http://www.gnu.org/software/src-highlite -->
-<pre>paul@f0:~ % doas pkg install -y node_exporter
+<pre><font color="#ff0000">paul@f0</font><font color="#F3E651">:~</font><font color="#ff0000"> </font><font color="#F3E651">%</font><font color="#ff0000"> doas pkg install -y node_exporter</font>
</pre>
<br />
<span>Enable the service to start at boot:</span><br />
@@ -4024,8 +4082,8 @@ http://www.gnu.org/software/src-highlite -->
by Lorenzo Bettini
http://www.lorenzobettini.it
http://www.gnu.org/software/src-highlite -->
-<pre>paul@f0:~ % doas sysrc node_exporter_enable=YES
-node_exporter_enable: -&gt; YES
+<pre><font color="#ff0000">paul@f0</font><font color="#F3E651">:~</font><font color="#ff0000"> </font><font color="#F3E651">%</font><font color="#ff0000"> doas sysrc </font><font color="#ff0000">node_exporter_enable</font><font color="#F3E651">=</font><font color="#ff0000">YES</font>
+<font color="#ff0000">node_exporter_enable</font><font color="#F3E651">:</font><font color="#ff0000"> -</font><font color="#F3E651">&gt;</font><font color="#ff0000"> YES</font>
</pre>
<br />
<span>Configure node_exporter to listen on the WireGuard interface. This ensures metrics are only accessible through the secure tunnel, not the public network. Replace the IP with the host&#39;s WireGuard address:</span><br />
@@ -4034,8 +4092,8 @@ node_exporter_enable: -&gt; YES
by Lorenzo Bettini
http://www.lorenzobettini.it
http://www.gnu.org/software/src-highlite -->
-<pre>paul@f0:~ % doas sysrc node_exporter_args=<font color="#808080">'--web.listen-address=192.168.2.130:9100'</font>
-node_exporter_args: -&gt; --web.listen-address=<font color="#000000">192.168</font>.<font color="#000000">2.130</font>:<font color="#000000">9100</font>
+<pre><font color="#ff0000">paul@f0</font><font color="#F3E651">:~</font><font color="#ff0000"> </font><font color="#F3E651">%</font><font color="#ff0000"> doas sysrc </font><font color="#ff0000">node_exporter_args</font><font color="#F3E651">=</font><font color="#bb00ff">'--web.listen-address=192.168.2.130:9100'</font>
+<font color="#ff0000">node_exporter_args</font><font color="#F3E651">:</font><font color="#ff0000"> -</font><font color="#F3E651">&gt;</font><font color="#ff0000"> --web</font><font color="#F3E651">.</font><font color="#ff0000">listen-address</font><font color="#F3E651">=</font><font color="#bb00ff">192.168</font><font color="#F3E651">.</font><font color="#bb00ff">2.130</font><font color="#F3E651">:</font><font color="#bb00ff">9100</font>
</pre>
<br />
<span>Start the service:</span><br />
@@ -4044,8 +4102,8 @@ node_exporter_args: -&gt; --web.listen-address=<font color="#000000">192.168</f
by Lorenzo Bettini
http://www.lorenzobettini.it
http://www.gnu.org/software/src-highlite -->
-<pre>paul@f0:~ % doas service node_exporter start
-Starting node_exporter.
+<pre><font color="#ff0000">paul@f0</font><font color="#F3E651">:~</font><font color="#ff0000"> </font><font color="#F3E651">%</font><font color="#ff0000"> doas service node_exporter start</font>
+<font color="#ff0000">Starting node_exporter</font><font color="#F3E651">.</font>
</pre>
<br />
<span>Verify it&#39;s running:</span><br />
@@ -4054,10 +4112,10 @@ Starting node_exporter.
by Lorenzo Bettini
http://www.lorenzobettini.it
http://www.gnu.org/software/src-highlite -->
-<pre>paul@f0:~ % curl -s http://<font color="#000000">192.168</font>.<font color="#000000">2.130</font>:<font color="#000000">9100</font>/metrics | head -<font color="#000000">3</font>
-<i><font color="silver"># HELP go_gc_duration_seconds A summary of the wall-time pause...</font></i>
-<i><font color="silver"># TYPE go_gc_duration_seconds summary</font></i>
-go_gc_duration_seconds{quantile=<font color="#808080">"0"</font>} <font color="#000000">0</font>
+<pre><font color="#ff0000">paul@f0</font><font color="#F3E651">:~</font><font color="#ff0000"> </font><font color="#F3E651">%</font><font color="#ff0000"> curl -s http</font><font color="#F3E651">://</font><font color="#bb00ff">192.168</font><font color="#F3E651">.</font><font color="#bb00ff">2.130</font><font color="#F3E651">:</font><font color="#bb00ff">9100</font><font color="#ff0000">/metrics </font><font color="#F3E651">|</font><font color="#ff0000"> head -</font><font color="#bb00ff">3</font>
+<i><font color="#ababab"># HELP go_gc_duration_seconds A summary of the wall-time pause...</font></i>
+<i><font color="#ababab"># TYPE go_gc_duration_seconds summary</font></i>
+<font color="#ff0000">go_gc_duration_seconds{</font><font color="#ff0000">quantile</font><font color="#F3E651">=</font><font color="#bb00ff">"0"</font><font color="#ff0000">} </font><font color="#bb00ff">0</font>
</pre>
<br />
<span>Repeat for the other FreeBSD hosts (<span class='inlinecode'>f1</span>, <span class='inlinecode'>f2</span>) with their respective WireGuard IPs.</span><br />
@@ -4085,9 +4143,9 @@ go_gc_duration_seconds{quantile=<font color="#808080">"0"</font>} <font color="#
by Lorenzo Bettini
http://www.lorenzobettini.it
http://www.gnu.org/software/src-highlite -->
-<pre>$ kubectl create secret generic additional-scrape-configs \
- --from-file=additional-scrape-configs.yaml \
- -n monitoring
+<pre><font color="#ff0000">$ kubectl create secret generic additional-scrape-configs </font><font color="#F3E651">\</font>
+<font color="#ff0000"> --from-file</font><font color="#F3E651">=</font><font color="#ff0000">additional-scrape-configs</font><font color="#F3E651">.</font><font color="#ff0000">yaml </font><font color="#F3E651">\</font>
+<font color="#ff0000"> -n monitoring</font>
</pre>
<br />
<span>Update <span class='inlinecode'>persistence-values.yaml</span> to reference the secret:</span><br />
@@ -4107,7 +4165,7 @@ prometheus:
by Lorenzo Bettini
http://www.lorenzobettini.it
http://www.gnu.org/software/src-highlite -->
-<pre>$ just upgrade
+<pre><font color="#ff0000">$ just upgrade</font>
</pre>
<br />
<span>After a minute or so, the FreeBSD hosts appear in the Prometheus targets and in the Node Exporter dashboards in Grafana.</span><br />
@@ -4155,7 +4213,313 @@ spec:
<br />
<span>Unlike memory metrics, disk I/O metrics (<span class='inlinecode'>node_disk_read_bytes_total</span>, <span class='inlinecode'>node_disk_written_bytes_total</span>, etc.) are not available on FreeBSD. The Linux diskstats collector that provides these metrics doesn&#39;t have a FreeBSD equivalent in the node_exporter.</span><br />
<br />
-<span>The disk I/O panels in the Node Exporter dashboards will show "No data" for FreeBSD hosts. FreeBSD does expose ZFS-specific metrics (<span class='inlinecode'>node_zfs_arcstats_*</span>) for ARC cache performance, and per-dataset I/O stats are available via <span class='inlinecode'>sysctl kstat.zfs</span>, but mapping these to the Linux-style metrics the dashboards expect is non-trivial. Creating custom ZFS-specific dashboards is left as an exercise for another day.</span><br />
+<span>The disk I/O panels in the Node Exporter dashboards will show "No data" for FreeBSD hosts. FreeBSD does expose ZFS-specific metrics (<span class='inlinecode'>node_zfs_arcstats_*</span>) for ARC cache performance, and per-dataset I/O stats are available via <span class='inlinecode'>sysctl kstat.zfs</span>, but mapping these to the Linux-style metrics the dashboards expect is non-trivial. To address this, I created custom ZFS-specific dashboards, covered in the next section.</span><br />
+<br />
+<h2 style='display: inline' id='zfs-monitoring-for-freebsd-servers'>ZFS Monitoring for FreeBSD Servers</h2><br />
+<br />
+<span>The FreeBSD servers (f0, f1, f2) that provide NFS storage to the k3s cluster have ZFS filesystems. Monitoring ZFS is crucial for understanding storage performance and cache efficiency.</span><br />
+<br />
+<h3 style='display: inline' id='node-exporter-zfs-collector'>Node Exporter ZFS Collector</h3><br />
+<br />
+<span>The node_exporter running on each FreeBSD server (v1.9.1) includes a built-in ZFS collector that exposes metrics via sysctls. The ZFS collector is enabled by default and provides:</span><br />
+<br />
+<ul>
+<li>ARC (Adaptive Replacement Cache) statistics</li>
+<li>Cache hit/miss rates</li>
+<li>Memory usage and allocation</li>
+<li>MRU/MFU cache breakdown</li>
+<li>Data vs metadata distribution</li>
+</ul><br />
+<h3 style='display: inline' id='verifying-zfs-metrics'>Verifying ZFS Metrics</h3><br />
+<br />
+<span>On any FreeBSD server, check that ZFS metrics are being exposed:</span><br />
+<br />
+<pre>
+paul@f0:~ % curl -s http://localhost:9100/metrics | grep node_zfs_arcstats | wc -l
+ 69
+</pre>
+<br />
+<span>The metrics are automatically scraped by Prometheus through the existing static configuration in <span class='inlinecode'>additional-scrape-configs.yaml</span>, which targets all FreeBSD servers on port 9100 with the <span class='inlinecode'>os: freebsd</span> label.</span><br />
+<br />
+<h3 style='display: inline' id='zfs-recording-rules'>ZFS Recording Rules</h3><br />
+<br />
+<span>Created recording rules in <span class='inlinecode'>zfs-recording-rules.yaml</span> for easier dashboard consumption:</span><br />
+<br />
+<pre>
+apiVersion: monitoring.coreos.com/v1
+kind: PrometheusRule
+metadata:
+ name: freebsd-zfs-rules
+ namespace: monitoring
+ labels:
+ release: prometheus
+spec:
+ groups:
+ - name: freebsd-zfs-arc
+ interval: 30s
+ rules:
+ - record: node_zfs_arc_hit_rate_percent
+ expr: |
+ 100 * (
+ rate(node_zfs_arcstats_hits_total{os="freebsd"}[5m]) /
+ (rate(node_zfs_arcstats_hits_total{os="freebsd"}[5m]) +
+ rate(node_zfs_arcstats_misses_total{os="freebsd"}[5m]))
+ )
+ labels:
+ os: freebsd
+ - record: node_zfs_arc_memory_usage_percent
+ expr: |
+ 100 * (
+ node_zfs_arcstats_size_bytes{os="freebsd"} /
+ node_zfs_arcstats_c_max_bytes{os="freebsd"}
+ )
+ labels:
+ os: freebsd
+ # Additional rules for metadata %, target %, MRU/MFU %, etc.
+</pre>
+<br />
+<span>These recording rules calculate:</span><br />
+<br />
+<ul>
+<li>ARC hit rate percentage</li>
+<li>ARC memory usage percentage (current vs maximum)</li>
+<li>ARC target percentage (target vs maximum)</li>
+<li>Metadata vs data percentages</li>
+<li>MRU vs MFU cache percentages</li>
+<li>Demand data and metadata hit rates</li>
+</ul><br />
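The arithmetic behind these rules is simple. A minimal Python sketch of the first two rules above (the counter rates and sizes are illustrative, and the function names are mine, not Prometheus'):

```python
# Sketch of the two recording rules above. PromQL's rate() turns the
# raw counters into per-second rates; here they are given directly.

def arc_hit_rate_percent(hits_rate, misses_rate):
    # node_zfs_arc_hit_rate_percent: 100 * hits / (hits + misses)
    total = hits_rate + misses_rate
    if total == 0:
        return 0.0
    return 100.0 * hits_rate / total

def arc_memory_usage_percent(size_bytes, c_max_bytes):
    # node_zfs_arc_memory_usage_percent: 100 * current size / max size
    return 100.0 * size_bytes / c_max_bytes

print(arc_hit_rate_percent(950.0, 50.0))                 # 95.0
print(arc_memory_usage_percent(12 * 2**30, 32 * 2**30))  # 37.5
```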
+<h3 style='display: inline' id='grafana-dashboards'>Grafana Dashboards</h3><br />
+<br />
+<span>Created two comprehensive ZFS monitoring dashboards (<span class='inlinecode'>zfs-dashboards.yaml</span>):</span><br />
+<br />
+<span><strong>Dashboard 1: FreeBSD ZFS (per-host detailed view)</strong></span><br />
+<br />
+<span>Includes variables to select:</span><br />
+<br />
+<ul>
+<li>FreeBSD server (f0, f1, or f2)</li>
+<li>ZFS pool (zdata, zroot, or all)</li>
+</ul><br />
+<span>Pool Overview Row:</span><br />
+<br />
+<ul>
+<li>Pool Capacity gauge (with thresholds: green &lt;70%, yellow &lt;85%, red &gt;85%)</li>
+<li>Pool Health status (ONLINE/DEGRADED/FAULTED with color coding)</li>
+<li>Total Pool Size stat</li>
+<li>Free Space stat</li>
+<li>Pool Space Usage Over Time (stacked: used + free)</li>
+<li>Pool Capacity Trend time series</li>
+</ul><br />
+<span>Dataset Statistics Row:</span><br />
+<br />
+<ul>
+<li>Table showing all datasets with columns: Pool, Dataset, Used, Available, Referenced</li>
+<li>Automatically filters by selected pool</li>
+</ul><br />
+<span>ARC Cache Statistics Row:</span><br />
+<br />
+<ul>
+<li>ARC Hit Rate gauge (red &lt;70%, yellow &lt;90%, green &gt;=90%)</li>
+<li>ARC Size time series (current, target, max)</li>
+<li>ARC Memory Usage percentage gauge</li>
+<li>ARC Hits vs Misses rate</li>
+<li>ARC Data vs Metadata stacked time series</li>
+</ul><br />
+<span><strong>Dashboard 2: FreeBSD ZFS Summary (cluster-wide overview)</strong></span><br />
+<br />
+<span>Cluster-Wide Pool Statistics Row:</span><br />
+<br />
+<ul>
+<li>Total Storage Capacity across all servers</li>
+<li>Total Used space</li>
+<li>Total Free space</li>
+<li>Average Pool Capacity gauge</li>
+<li>Pool Health Status (worst case across cluster)</li>
+<li>Total Pool Space Usage Over Time</li>
+<li>Per-Pool Capacity time series (all pools on all hosts)</li>
+</ul><br />
+<span>Per-Host Pool Breakdown Row:</span><br />
+<br />
+<ul>
+<li>Bar gauge showing capacity by host and pool</li>
+<li>Table with all pools: Host, Pool, Size, Used, Free, Capacity %, Health</li>
+</ul><br />
+<span>Cluster-Wide ARC Statistics Row:</span><br />
+<br />
+<ul>
+<li>Average ARC Hit Rate gauge across all hosts</li>
+<li>ARC Hit Rate by Host time series</li>
+<li>Total ARC Size Across Cluster</li>
+<li>Total ARC Hits vs Misses (cluster-wide sum)</li>
+<li>ARC Size by Host</li>
+</ul><br />
+<span>Dashboard Visualization:</span><br />
+<br />
+<a href='./f3s-kubernetes-with-freebsd-part-8/grafana-zfs-dashboard.png'><img alt='ZFS monitoring dashboard in Grafana showing pool capacity, health, and I/O throughput' title='ZFS monitoring dashboard in Grafana showing pool capacity, health, and I/O throughput' src='./f3s-kubernetes-with-freebsd-part-8/grafana-zfs-dashboard.png' /></a><br />
+<a href='./f3s-kubernetes-with-freebsd-part-8/grafana-zfs-arc-stats.png'><img alt='ZFS ARC cache statistics showing hit rate, memory usage, and size trends' title='ZFS ARC cache statistics showing hit rate, memory usage, and size trends' src='./f3s-kubernetes-with-freebsd-part-8/grafana-zfs-arc-stats.png' /></a><br />
+<a href='./f3s-kubernetes-with-freebsd-part-8/grafana-zfs-datasets.png'><img alt='ZFS datasets table and ARC data vs metadata breakdown' title='ZFS datasets table and ARC data vs metadata breakdown' src='./f3s-kubernetes-with-freebsd-part-8/grafana-zfs-datasets.png' /></a><br />
+<br />
+<h3 style='display: inline' id='deployment'>Deployment</h3><br />
+<br />
+<span>Applied the resources to the cluster:</span><br />
+<br />
+<pre>
+cd /home/paul/git/conf/f3s/prometheus
+kubectl apply -f zfs-recording-rules.yaml
+kubectl apply -f zfs-dashboards.yaml
+</pre>
+<br />
+<span>Updated the <span class='inlinecode'>Justfile</span> to include the ZFS recording rules in the install and upgrade targets:</span><br />
+<br />
+<pre>
+install:
+ kubectl apply -f persistent-volumes.yaml
+ kubectl create secret generic additional-scrape-configs --from-file=additional-scrape-configs.yaml -n monitoring --dry-run=client -o yaml | kubectl apply -f -
+ helm install prometheus prometheus-community/kube-prometheus-stack --namespace monitoring -f persistence-values.yaml
+ kubectl apply -f freebsd-recording-rules.yaml
+ kubectl apply -f openbsd-recording-rules.yaml
+ kubectl apply -f zfs-recording-rules.yaml
+ just -f grafana-ingress/Justfile install
+</pre>
+<br />
+<h3 style='display: inline' id='verifying-zfs-metrics-in-prometheus'>Verifying ZFS Metrics in Prometheus</h3><br />
+<br />
+<span>Check that ZFS metrics are being collected:</span><br />
+<br />
+<pre>
+kubectl exec -n monitoring prometheus-prometheus-kube-prometheus-prometheus-0 -c prometheus -- \
+ wget -qO- &#39;http://localhost:9090/api/v1/query?query=node_zfs_arcstats_size_bytes&#39;
+</pre>
+<br />
+<span>Check recording rules are calculating correctly:</span><br />
+<br />
+<pre>
+kubectl exec -n monitoring prometheus-prometheus-kube-prometheus-prometheus-0 -c prometheus -- \
+ wget -qO- &#39;http://localhost:9090/api/v1/query?query=node_zfs_arc_memory_usage_percent&#39;
+</pre>
+<br />
+<span>Example output shows memory usage percentage for each FreeBSD server:</span><br />
+<br />
+<pre>
+"result":[
+ {"metric":{"instance":"192.168.2.130:9100","os":"freebsd"},"value":[...,"37.58"]},
+ {"metric":{"instance":"192.168.2.131:9100","os":"freebsd"},"value":[...,"12.85"]},
+ {"metric":{"instance":"192.168.2.132:9100","os":"freebsd"},"value":[...,"13.44"]}
+]
+</pre>
+<br />
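The API response is easy to post-process. A Python sketch that extracts the per-instance values from a response of the same shape (the payload below is illustrative, with made-up timestamps):

```python
import json

# Illustrative response shaped like the Prometheus /api/v1/query output
# shown above; timestamps are made up.
response = json.loads("""
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {"metric": {"instance": "192.168.2.130:9100", "os": "freebsd"},
       "value": [1765000000, "37.58"]},
      {"metric": {"instance": "192.168.2.131:9100", "os": "freebsd"},
       "value": [1765000000, "12.85"]}
    ]
  }
}
""")

# Prometheus returns sample values as strings, so convert to float.
usage = {r["metric"]["instance"]: float(r["value"][1])
         for r in response["data"]["result"]}
print(usage)
```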
+<h3 style='display: inline' id='key-metrics-to-monitor'>Key Metrics to Monitor</h3><br />
+<br />
+<ul>
+<li>ARC Hit Rate: Should typically be above 90% for optimal performance. Lower hit rates indicate the ARC cache is too small or workload has poor locality.</li>
+<li>ARC Memory Usage: Shows how much of the maximum ARC size is being used. If consistently at or near maximum, the ARC is effectively utilizing available memory.</li>
+<li>Data vs Metadata: Typically data should dominate, but workloads with many small files will show higher metadata percentages.</li>
+<li>MRU vs MFU: Most Recently Used vs Most Frequently Used cache. The ratio depends on workload characteristics.</li>
+<li>Pool Capacity: Monitor pool usage to ensure adequate free space. ZFS performance degrades when pools exceed 80% capacity.</li>
+<li>Pool Health: Should always show ONLINE (green). DEGRADED (yellow) indicates a disk issue requiring attention. FAULTED (red) requires immediate action.</li>
+<li>Dataset Usage: Track which datasets are consuming the most space to identify growth trends and plan capacity.</li>
+</ul><br />
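This guidance translates directly into alert logic. A sketch using the cut-offs from the list above (the function names are hypothetical, not part of any exporter):

```python
# Threshold logic from the guidance above; inputs are percentages.

HEALTH = {"ONLINE": 0, "DEGRADED": 1, "FAULTED": 2}  # matches the gauge values

def arc_hit_rate_ok(percent):
    # Below ~90% suggests the ARC is too small or the workload has poor locality.
    return percent >= 90.0

def pool_capacity_status(percent):
    # Same thresholds as the dashboard gauge: green <70, yellow <85, red above.
    if percent < 70:
        return "green"
    if percent < 85:
        return "yellow"
    return "red"

print(arc_hit_rate_ok(95.2))       # True
print(pool_capacity_status(64))    # green
print(pool_capacity_status(88))    # red
```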
+<h3 style='display: inline' id='zfs-pool-and-dataset-metrics-via-textfile-collector'>ZFS Pool and Dataset Metrics via Textfile Collector</h3><br />
+<br />
+<span>To complement the ARC statistics from node_exporter&#39;s built-in ZFS collector, I added pool capacity and dataset metrics using the textfile collector feature.</span><br />
+<br />
+<span>Created a script at <span class='inlinecode'>/usr/local/bin/zfs_pool_metrics.sh</span> on each FreeBSD server:</span><br />
+<br />
+<pre>
+#!/bin/sh
+# ZFS Pool and Dataset Metrics Collector for Prometheus
+
+OUTPUT_FILE="/var/tmp/node_exporter/zfs_pools.prom.$$"
+FINAL_FILE="/var/tmp/node_exporter/zfs_pools.prom"
+
+mkdir -p /var/tmp/node_exporter
+
+{
+ # Pool metrics
+ echo "# HELP zfs_pool_size_bytes Total size of ZFS pool"
+ echo "# TYPE zfs_pool_size_bytes gauge"
+ echo "# HELP zfs_pool_allocated_bytes Allocated space in ZFS pool"
+ echo "# TYPE zfs_pool_allocated_bytes gauge"
+ echo "# HELP zfs_pool_free_bytes Free space in ZFS pool"
+ echo "# TYPE zfs_pool_free_bytes gauge"
+ echo "# HELP zfs_pool_capacity_percent Capacity percentage"
+ echo "# TYPE zfs_pool_capacity_percent gauge"
+ echo "# HELP zfs_pool_health Pool health (0=ONLINE, 1=DEGRADED, 2=FAULTED)"
+ echo "# TYPE zfs_pool_health gauge"
+
+ zpool list -Hp -o name,size,allocated,free,capacity,health | \
+ while IFS=$&#39;\t&#39; read name size alloc free cap health; do
+ case "$health" in
+ ONLINE) health_val=0 ;;
+ DEGRADED) health_val=1 ;;
+ FAULTED) health_val=2 ;;
+ *) health_val=6 ;;
+ esac
+ cap_num=$(echo "$cap" | sed &#39;s/%//&#39;)
+
+ echo "zfs_pool_size_bytes{pool=\"$name\"} $size"
+ echo "zfs_pool_allocated_bytes{pool=\"$name\"} $alloc"
+ echo "zfs_pool_free_bytes{pool=\"$name\"} $free"
+ echo "zfs_pool_capacity_percent{pool=\"$name\"} $cap_num"
+ echo "zfs_pool_health{pool=\"$name\"} $health_val"
+ done
+
+ # Dataset metrics
+ echo "# HELP zfs_dataset_used_bytes Used space in dataset"
+ echo "# TYPE zfs_dataset_used_bytes gauge"
+ echo "# HELP zfs_dataset_available_bytes Available space"
+ echo "# TYPE zfs_dataset_available_bytes gauge"
+ echo "# HELP zfs_dataset_referenced_bytes Referenced space"
+ echo "# TYPE zfs_dataset_referenced_bytes gauge"
+
+ zfs list -Hp -t filesystem -o name,used,available,referenced | \
+    while IFS="$(printf &#39;\t&#39;)" read -r name used avail ref; do
+ pool=$(echo "$name" | cut -d/ -f1)
+ echo "zfs_dataset_used_bytes{pool=\"$pool\",dataset=\"$name\"} $used"
+ echo "zfs_dataset_available_bytes{pool=\"$pool\",dataset=\"$name\"} $avail"
+ echo "zfs_dataset_referenced_bytes{pool=\"$pool\",dataset=\"$name\"} $ref"
+ done
+} &gt; "$OUTPUT_FILE"
+
+mv "$OUTPUT_FILE" "$FINAL_FILE"
+</pre>
+<br />
+<span>Deployed to all FreeBSD servers:</span><br />
+<br />
+<pre>
+for host in f0 f1 f2; do
+ scp /tmp/zfs_pool_metrics.sh paul@$host:/tmp/
+ ssh paul@$host &#39;doas mv /tmp/zfs_pool_metrics.sh /usr/local/bin/ &amp;&amp; \
+ doas chmod +x /usr/local/bin/zfs_pool_metrics.sh&#39;
+done
+</pre>
+<br />
+<span>Set up cron jobs to run every minute:</span><br />
+<br />
+<pre>
+for host in f0 f1 f2; do
+ ssh paul@$host &#39;echo "* * * * * /usr/local/bin/zfs_pool_metrics.sh &gt;/dev/null 2&gt;&amp;1" | \
+ doas crontab -&#39;
+done
+</pre>
+<br />
+<span>The textfile collector (already configured with <span class='inlinecode'>--collector.textfile.directory=/var/tmp/node_exporter</span>) automatically picks up the metrics.</span><br />
+<br />
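The write-to-a-temp-file-then-mv pattern in the shell script matters: the collector only ever reads complete files. The same idea in Python (<span class='inlinecode'>write_textfile_metrics</span> is a hypothetical helper, not part of node_exporter):

```python
import os
import tempfile

def write_textfile_metrics(directory, name, samples):
    """samples: iterable of (metric_name, labels_dict, value) tuples."""
    lines = []
    for metric, labels, value in samples:
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        lines.append(f"{metric}{{{label_str}}} {value}")
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".prom.tmp")
    with os.fdopen(fd, "w") as f:
        f.write("\n".join(lines) + "\n")
    # os.replace is atomic on the same filesystem, like mv in the script,
    # so node_exporter never scrapes a half-written file.
    os.replace(tmp, os.path.join(directory, name))

d = tempfile.mkdtemp()
write_textfile_metrics(d, "zfs_pools.prom",
                       [("zfs_pool_capacity_percent", {"pool": "zdata"}, 64)])
print(open(os.path.join(d, "zfs_pools.prom")).read(), end="")
# zfs_pool_capacity_percent{pool="zdata"} 64
```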
+<span>Verify metrics are being exposed:</span><br />
+<br />
+<pre>
+paul@f0:~ % curl -s http://localhost:9100/metrics | grep "^zfs_pool" | head -5
+zfs_pool_allocated_bytes{pool="zdata"} 6.47622733824e+11
+zfs_pool_allocated_bytes{pool="zroot"} 5.3338578944e+10
+zfs_pool_capacity_percent{pool="zdata"} 64
+zfs_pool_capacity_percent{pool="zroot"} 10
+zfs_pool_free_bytes{pool="zdata"} 3.48809678848e+11
+</pre>
+<br />
+<span>All ZFS-related configuration files are available on Codeberg:</span><br />
+<br />
+<a class='textlink' href='https://codeberg.org/snonux/conf/src/branch/master/f3s/prometheus/zfs-recording-rules.yaml'>zfs-recording-rules.yaml on Codeberg</a><br />
+<a class='textlink' href='https://codeberg.org/snonux/conf/src/branch/master/f3s/prometheus/zfs-dashboards.yaml'>zfs-dashboards.yaml on Codeberg</a><br />
<br />
<h2 style='display: inline' id='monitoring-external-openbsd-hosts'>Monitoring external OpenBSD hosts</h2><br />
<br />
@@ -4169,10 +4533,10 @@ spec:
by Lorenzo Bettini
http://www.lorenzobettini.it
http://www.gnu.org/software/src-highlite -->
-<pre>blowfish:~ $ doas pkg_add node_exporter
-quirks-<font color="#000000">7.103</font> signed on <font color="#000000">2025</font>-<font color="#000000">10</font>-13T22:<font color="#000000">55</font>:16Z
-The following new rcscripts were installed: /etc/rc.d/node_exporter
-See rcctl(<font color="#000000">8</font>) <b><u><font color="#000000">for</font></u></b> details.
+<pre><font color="#ff0000">blowfish</font><font color="#F3E651">:~</font><font color="#ff0000"> $ doas pkg_add node_exporter</font>
+<font color="#ff0000">quirks-</font><font color="#bb00ff">7.103</font><font color="#ff0000"> signed on </font><font color="#bb00ff">2025</font><font color="#ff0000">-</font><font color="#bb00ff">10</font><font color="#ff0000">-13T22</font><font color="#F3E651">:</font><font color="#bb00ff">55</font><font color="#F3E651">:</font><font color="#ff0000">16Z</font>
+<font color="#ff0000">The following new rcscripts were installed</font><font color="#F3E651">:</font><font color="#ff0000"> /etc/rc</font><font color="#F3E651">.</font><font color="#ff0000">d/node_exporter</font>
+<font color="#ff0000">See rcctl</font><font color="#F3E651">(</font><font color="#bb00ff">8</font><font color="#F3E651">)</font><font color="#ff0000"> </font><b><font color="#ffffff">for</font></b><font color="#ff0000"> details</font><font color="#F3E651">.</font>
</pre>
<br />
<span>Enable the service to start at boot:</span><br />
@@ -4181,7 +4545,7 @@ See rcctl(<font color="#000000">8</font>) <b><u><font color="#000000">for</font>
by Lorenzo Bettini
http://www.lorenzobettini.it
http://www.gnu.org/software/src-highlite -->
-<pre>blowfish:~ $ doas rcctl <b><u><font color="#000000">enable</font></u></b> node_exporter
+<pre><font color="#ff0000">blowfish</font><font color="#F3E651">:~</font><font color="#ff0000"> $ doas rcctl </font><b><font color="#ffffff">enable</font></b><font color="#ff0000"> node_exporter</font>
</pre>
<br />
<span>Configure node_exporter to listen on the WireGuard interface. This ensures metrics are only accessible through the secure tunnel, not the public network. Replace the IP with the host&#39;s WireGuard address:</span><br />
@@ -4190,7 +4554,7 @@ http://www.gnu.org/software/src-highlite -->
by Lorenzo Bettini
http://www.lorenzobettini.it
http://www.gnu.org/software/src-highlite -->
-<pre>blowfish:~ $ doas rcctl <b><u><font color="#000000">set</font></u></b> node_exporter flags <font color="#808080">'--web.listen-address=192.168.2.110:9100'</font>
+<pre><font color="#ff0000">blowfish</font><font color="#F3E651">:~</font><font color="#ff0000"> $ doas rcctl </font><b><font color="#ffffff">set</font></b><font color="#ff0000"> node_exporter flags </font><font color="#bb00ff">'--web.listen-address=192.168.2.110:9100'</font>
</pre>
<br />
<span>Start the service:</span><br />
@@ -4199,8 +4563,8 @@ http://www.gnu.org/software/src-highlite -->
by Lorenzo Bettini
http://www.lorenzobettini.it
http://www.gnu.org/software/src-highlite -->
-<pre>blowfish:~ $ doas rcctl start node_exporter
-node_exporter(ok)
+<pre><font color="#ff0000">blowfish</font><font color="#F3E651">:~</font><font color="#ff0000"> $ doas rcctl start node_exporter</font>
+<font color="#ff0000">node_exporter</font><font color="#F3E651">(</font><font color="#ff0000">ok</font><font color="#F3E651">)</font>
</pre>
<br />
<span>Verify it&#39;s running:</span><br />
@@ -4209,10 +4573,10 @@ node_exporter(ok)
by Lorenzo Bettini
http://www.lorenzobettini.it
http://www.gnu.org/software/src-highlite -->
-<pre>blowfish:~ $ curl -s http://<font color="#000000">192.168</font>.<font color="#000000">2.110</font>:<font color="#000000">9100</font>/metrics | head -<font color="#000000">3</font>
-<i><font color="silver"># HELP go_gc_duration_seconds A summary of the wall-time pause...</font></i>
-<i><font color="silver"># TYPE go_gc_duration_seconds summary</font></i>
-go_gc_duration_seconds{quantile=<font color="#808080">"0"</font>} <font color="#000000">0</font>
+<pre><font color="#ff0000">blowfish</font><font color="#F3E651">:~</font><font color="#ff0000"> $ curl -s http</font><font color="#F3E651">://</font><font color="#bb00ff">192.168</font><font color="#F3E651">.</font><font color="#bb00ff">2.110</font><font color="#F3E651">:</font><font color="#bb00ff">9100</font><font color="#ff0000">/metrics </font><font color="#F3E651">|</font><font color="#ff0000"> head -</font><font color="#bb00ff">3</font>
+<i><font color="#ababab"># HELP go_gc_duration_seconds A summary of the wall-time pause...</font></i>
+<i><font color="#ababab"># TYPE go_gc_duration_seconds summary</font></i>
+<font color="#ff0000">go_gc_duration_seconds{</font><font color="#ff0000">quantile</font><font color="#F3E651">=</font><font color="#bb00ff">"0"</font><font color="#ff0000">} </font><font color="#bb00ff">0</font>
</pre>
<br />
<span>Repeat for the other OpenBSD host (<span class='inlinecode'>fishfinger</span>) with its respective WireGuard IP (<span class='inlinecode'>192.168.2.111</span>).</span><br />
@@ -4282,18 +4646,671 @@ spec:
<br />
<span>After running <span class='inlinecode'>just upgrade</span>, the OpenBSD hosts appear in Prometheus targets and the Node Exporter dashboards.</span><br />
<br />
+<h2 style='display: inline' id='distributed-tracing-with-grafana-tempo'>Distributed Tracing with Grafana Tempo</h2><br />
+<br />
+<span>After implementing logs (Loki) and metrics (Prometheus), the final pillar of observability is distributed tracing. Grafana Tempo provides this capability, helping to understand request flows across microservices.</span><br />
+<br />
+<span>For a preview of what distributed tracing with Tempo looks like in Grafana, see the X-RAG blog post:</span><br />
+<br />
+<a class='textlink' href='./2025-12-24-x-rag-observability-hackathon.html'>X-RAG Observability Hackathon</a><br />
+<br />
+<h3 style='display: inline' id='why-distributed-tracing'>Why Distributed Tracing?</h3><br />
+<br />
+<span>In a microservices architecture, a single user request may traverse multiple services. Distributed tracing:</span><br />
+<br />
+<ul>
+<li>Tracks requests across service boundaries</li>
+<li>Identifies performance bottlenecks</li>
+<li>Visualizes service dependencies</li>
+<li>Correlates with logs and metrics</li>
+<li>Helps debug complex distributed systems</li>
+</ul><br />
+<h3 style='display: inline' id='deploying-grafana-tempo'>Deploying Grafana Tempo</h3><br />
+<br />
+<span>Tempo is deployed in monolithic mode, following the same pattern as Loki&#39;s SingleBinary deployment.</span><br />
+<br />
+<h4 style='display: inline' id='configuration-strategy'>Configuration Strategy</h4><br />
+<br />
+<span><strong>Deployment Mode:</strong> Monolithic (all components in one process)</span><br />
+<ul>
+<li>Simpler operation than microservices mode</li>
+<li>Suitable for the cluster scale</li>
+<li>Consistent with Loki deployment pattern</li>
+</ul><br />
+<span><strong>Storage:</strong> Filesystem backend using hostPath</span><br />
+<ul>
+<li>10Gi storage at /data/nfs/k3svolumes/tempo/data</li>
+<li>7-day retention (168h)</li>
+<li>Local filesystem storage is the simplest option for monolithic mode</li>
+</ul><br />
+<span><strong>OTLP Receivers:</strong> Standard OpenTelemetry Protocol ports</span><br />
+<ul>
+<li>gRPC: 4317</li>
+<li>HTTP: 4318</li>
+<li>Bind to 0.0.0.0 to avoid the Tempo 2.7+ localhost-only binding issue</li>
+</ul><br />
+<h4 style='display: inline' id='tempo-deployment-files'>Tempo Deployment Files</h4><br />
+<br />
+<span>Created in <span class='inlinecode'>/home/paul/git/conf/f3s/tempo/</span>:</span><br />
+<br />
+<span><strong>values.yaml</strong> - Helm chart configuration:</span><br />
+<br />
+<pre>
+tempo:
+ retention: 168h
+ storage:
+ trace:
+ backend: local
+ local:
+ path: /var/tempo/traces
+ wal:
+ path: /var/tempo/wal
+ receivers:
+ otlp:
+ protocols:
+ grpc:
+ endpoint: 0.0.0.0:4317
+ http:
+ endpoint: 0.0.0.0:4318
+
+persistence:
+ enabled: true
+ size: 10Gi
+ storageClassName: ""
+
+resources:
+ limits:
+ cpu: 1000m
+ memory: 2Gi
+ requests:
+ cpu: 500m
+ memory: 1Gi
+</pre>
+<br />
+<span><strong>persistent-volumes.yaml</strong> - Storage configuration:</span><br />
+<br />
+<pre>
+apiVersion: v1
+kind: PersistentVolume
+metadata:
+ name: tempo-data-pv
+spec:
+ capacity:
+ storage: 10Gi
+ accessModes:
+ - ReadWriteOnce
+ persistentVolumeReclaimPolicy: Retain
+ hostPath:
+ path: /data/nfs/k3svolumes/tempo/data
+---
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+ name: tempo-data-pvc
+ namespace: monitoring
+spec:
+ storageClassName: ""
+ accessModes:
+ - ReadWriteOnce
+ resources:
+ requests:
+ storage: 10Gi
+</pre>
+<br />
+<h4 style='display: inline' id='grafana-datasource-provisioning'>Grafana Datasource Provisioning</h4><br />
+<br />
+<span>All Grafana datasources (Prometheus, Alertmanager, Loki, Tempo) are provisioned via a unified ConfigMap that is directly mounted to the Grafana pod. This approach ensures datasources are loaded on startup without requiring sidecar-based discovery.</span><br />
+<br />
+<span>In <span class='inlinecode'>/home/paul/git/conf/f3s/prometheus/grafana-datasources-all.yaml</span>:</span><br />
+<br />
+<pre>
+apiVersion: v1
+kind: ConfigMap
+metadata:
+ name: grafana-datasources-all
+ namespace: monitoring
+data:
+ datasources.yaml: |
+ apiVersion: 1
+ datasources:
+ - name: Prometheus
+ type: prometheus
+ uid: prometheus
+ url: http://prometheus-kube-prometheus-prometheus.monitoring:9090/
+ access: proxy
+ isDefault: true
+ - name: Alertmanager
+ type: alertmanager
+ uid: alertmanager
+ url: http://prometheus-kube-prometheus-alertmanager.monitoring:9093/
+ - name: Loki
+ type: loki
+ uid: loki
+ url: http://loki.monitoring.svc.cluster.local:3100
+ - name: Tempo
+ type: tempo
+ uid: tempo
+ url: http://tempo.monitoring.svc.cluster.local:3200
+ jsonData:
+ tracesToLogsV2:
+ datasourceUid: loki
+ spanStartTimeShift: -1h
+ spanEndTimeShift: 1h
+ tracesToMetrics:
+ datasourceUid: prometheus
+ serviceMap:
+ datasourceUid: prometheus
+ nodeGraph:
+ enabled: true
+</pre>
+<br />
+<span>The kube-prometheus-stack Helm values (persistence-values.yaml) are configured to:</span><br />
+<ul>
+<li>Disable sidecar-based datasource provisioning</li>
+<li>Mount grafana-datasources-all ConfigMap directly to /etc/grafana/provisioning/datasources/</li>
+</ul><br />
+<span>This direct mounting approach is simpler and more reliable than sidecar-based discovery.</span><br />
+<br />
+<h4 style='display: inline' id='installation'>Installation</h4><br />
+<br />
+<pre>
+cd /home/paul/git/conf/f3s/tempo
+just install
+</pre>
+<br />
+<span>Verify Tempo is running:</span><br />
+<br />
+<pre>
+kubectl get pods -n monitoring -l app.kubernetes.io/name=tempo
+kubectl exec -n monitoring &lt;tempo-pod&gt; -- wget -qO- http://localhost:3200/ready
+</pre>
+<br />
+<h3 style='display: inline' id='configuring-grafana-alloy-for-trace-collection'>Configuring Grafana Alloy for Trace Collection</h3><br />
+<br />
+<span>Updated <span class='inlinecode'>/home/paul/git/conf/f3s/loki/alloy-values.yaml</span> to add OTLP receivers for traces while maintaining existing log collection.</span><br />
+<br />
+<h4 style='display: inline' id='otlp-receiver-configuration'>OTLP Receiver Configuration</h4><br />
+<br />
+<span>Added to Alloy configuration after the log collection pipeline:</span><br />
+<br />
+<pre>
+// OTLP receiver for traces via gRPC and HTTP
+otelcol.receiver.otlp "default" {
+ grpc {
+ endpoint = "0.0.0.0:4317"
+ }
+ http {
+ endpoint = "0.0.0.0:4318"
+ }
+ output {
+ traces = [otelcol.processor.batch.default.input]
+ }
+}
+
+// Batch processor for efficient trace forwarding
+otelcol.processor.batch "default" {
+ timeout = "5s"
+ send_batch_size = 100
+ send_batch_max_size = 200
+ output {
+ traces = [otelcol.exporter.otlp.tempo.input]
+ }
+}
+
+// OTLP exporter to send traces to Tempo
+otelcol.exporter.otlp "tempo" {
+ client {
+ endpoint = "tempo.monitoring.svc.cluster.local:4317"
+ tls {
+ insecure = true
+ }
+ compression = "gzip"
+ }
+}
+</pre>
+<br />
+<span>The batch processor reduces network overhead by accumulating spans before forwarding to Tempo.</span><br />
+<br />
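What the batch processor does can be sketched in a few lines. This models only the size-based flush from the configuration above (the 5-second timeout path is elided) and is not Alloy's actual implementation:

```python
class SpanBatcher:
    """Accumulates spans and flushes them in batches, modeling
    otelcol.processor.batch with send_batch_size."""

    def __init__(self, send_batch_size=100):
        self.send_batch_size = send_batch_size
        self.pending = []
        self.exported = []  # batches handed to the exporter

    def add(self, span):
        self.pending.append(span)
        if len(self.pending) >= self.send_batch_size:
            self.flush()

    def flush(self):
        if self.pending:
            self.exported.append(self.pending)
            self.pending = []

batcher = SpanBatcher(send_batch_size=3)
for i in range(7):
    batcher.add(f"span-{i}")
batcher.flush()  # what the 5s timeout would do with the remainder
print([len(b) for b in batcher.exported])  # [3, 3, 1]
```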
+<h4 style='display: inline' id='upgrade-alloy'>Upgrade Alloy</h4><br />
+<br />
+<pre>
+cd /home/paul/git/conf/f3s/loki
+just upgrade
+</pre>
+<br />
+<span>Verify OTLP receivers are listening:</span><br />
+<br />
+<pre>
+kubectl logs -n monitoring -l app.kubernetes.io/name=alloy | grep -i "otlp.*receiver"
+kubectl exec -n monitoring &lt;alloy-pod&gt; -- netstat -ln | grep -E &#39;:(4317|4318)&#39;
+</pre>
+<br />
+<h3 style='display: inline' id='demo-tracing-application'>Demo Tracing Application</h3><br />
+<br />
+<span>Created a three-tier Python application to demonstrate distributed tracing in action.</span><br />
+<br />
+<h4 style='display: inline' id='application-architecture'>Application Architecture</h4><br />
+<br />
+<pre>
+User → Frontend (Flask:5000) → Middleware (Flask:5001) → Backend (Flask:5002)
+ ↓ ↓ ↓
+ Alloy (OTLP:4317) → Tempo → Grafana
+</pre>
+<br />
+<span>Frontend Service:</span><br />
+<br />
+<ul>
+<li>Receives HTTP requests at /api/process</li>
+<li>Forwards to middleware service</li>
+<li>Creates parent span for the entire request</li>
+</ul><br />
+<span>Middleware Service:</span><br />
+<br />
+<ul>
+<li>Transforms data at /api/transform</li>
+<li>Calls backend service</li>
+<li>Creates child span linked to frontend</li>
+</ul><br />
+<span>Backend Service:</span><br />
+<br />
+<ul>
+<li>Returns data at /api/data</li>
+<li>Simulates database query (100ms sleep)</li>
+<li>Creates leaf span in the trace</li>
+</ul><br />
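The result is a three-span chain sharing one trace ID. A toy model of that structure (field names are simplified relative to real OpenTelemetry spans):

```python
import uuid

def make_span(name, trace_id, parent=None):
    # Each span carries the shared trace_id; children record their
    # parent's span_id, which is how the tree is reassembled in Tempo.
    return {"name": name, "trace_id": trace_id,
            "span_id": uuid.uuid4().hex[:16],
            "parent_span_id": parent["span_id"] if parent else None}

trace_id = uuid.uuid4().hex
frontend = make_span("frontend /api/process", trace_id)             # parent
middleware = make_span("middleware /api/transform", trace_id, frontend)
backend = make_span("backend /api/data", trace_id, middleware)      # leaf

print([s["name"] for s in (frontend, middleware, backend)])
```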
+<span>OpenTelemetry Instrumentation:</span><br />
+<br />
+<span>All services use Python OpenTelemetry libraries:</span><br />
+<br />
+<span><strong>Dependencies:</strong></span><br />
+<pre>
+flask==3.0.0
+requests==2.31.0
+opentelemetry-distro==0.49b0
+opentelemetry-exporter-otlp==1.28.0
+opentelemetry-instrumentation-flask==0.49b0
+opentelemetry-instrumentation-requests==0.49b0
+</pre>
+<br />
+<span><strong>Auto-instrumentation pattern</strong> (used in all services):</span><br />
+<br />
+<!-- Generator: GNU source-highlight 3.1.9
+by Lorenzo Bettini
+http://www.lorenzobettini.it
+http://www.gnu.org/software/src-highlite -->
+<pre><font color="#ababab">from</font><font color="#ff0000"> opentelemetry </font><font color="#ababab">import</font><font color="#ff0000"> trace</font>
+<font color="#ababab">from</font><font color="#ff0000"> opentelemetry</font><font color="#F3E651">.</font><font color="#ff0000">sdk</font><font color="#F3E651">.</font><font color="#ff0000">trace </font><font color="#ababab">import</font><font color="#ff0000"> TracerProvider</font>
+<font color="#ababab">from</font><font color="#ff0000"> opentelemetry</font><font color="#F3E651">.</font><font color="#ff0000">exporter</font><font color="#F3E651">.</font><font color="#ff0000">otlp</font><font color="#F3E651">.</font><font color="#ff0000">proto</font><font color="#F3E651">.</font><font color="#ff0000">grpc</font><font color="#F3E651">.</font><font color="#ff0000">trace_exporter </font><font color="#ababab">import</font><font color="#ff0000"> OTLPSpanExporter</font>
+<font color="#ababab">from</font><font color="#ff0000"> opentelemetry</font><font color="#F3E651">.</font><font color="#ff0000">instrumentation</font><font color="#F3E651">.</font><font color="#ff0000">flask </font><font color="#ababab">import</font><font color="#ff0000"> FlaskInstrumentor</font>
+<font color="#ababab">from</font><font color="#ff0000"> opentelemetry</font><font color="#F3E651">.</font><font color="#ff0000">instrumentation</font><font color="#F3E651">.</font><font color="#ff0000">requests </font><font color="#ababab">import</font><font color="#ff0000"> RequestsInstrumentor</font>
+<font color="#ababab">from</font><font color="#ff0000"> opentelemetry</font><font color="#F3E651">.</font><font color="#ff0000">sdk</font><font color="#F3E651">.</font><font color="#ff0000">resources </font><font color="#ababab">import</font><font color="#ff0000"> Resource</font>
+
+<i><font color="#ababab"># Define service identity</font></i>
+<font color="#ff0000">resource </font><font color="#F3E651">=</font><font color="#ff0000"> </font><font color="#7bc710">Resource</font><font color="#F3E651">(</font><font color="#ff0000">attributes</font><font color="#F3E651">={</font>
+<font color="#ff0000"> </font><font color="#bb00ff">"service.name"</font><font color="#F3E651">:</font><font color="#ff0000"> </font><font color="#bb00ff">"frontend"</font><font color="#F3E651">,</font>
+<font color="#ff0000"> </font><font color="#bb00ff">"service.namespace"</font><font color="#F3E651">:</font><font color="#ff0000"> </font><font color="#bb00ff">"tracing-demo"</font><font color="#F3E651">,</font>
+<font color="#ff0000"> </font><font color="#bb00ff">"service.version"</font><font color="#F3E651">:</font><font color="#ff0000"> </font><font color="#bb00ff">"1.0.0"</font>
+<font color="#F3E651">})</font>
+
+<font color="#ff0000">provider </font><font color="#F3E651">=</font><font color="#ff0000"> </font><font color="#7bc710">TracerProvider</font><font color="#F3E651">(</font><font color="#ff0000">resource</font><font color="#F3E651">=</font><font color="#ff0000">resource</font><font color="#F3E651">)</font>
+
+<i><font color="#ababab"># Export to Alloy</font></i>
+<font color="#ff0000">otlp_exporter </font><font color="#F3E651">=</font><font color="#ff0000"> </font><font color="#7bc710">OTLPSpanExporter</font><font color="#F3E651">(</font>
+<font color="#ff0000"> endpoint</font><font color="#F3E651">=</font><font color="#bb00ff">"http://alloy.monitoring.svc.cluster.local:4317"</font><font color="#F3E651">,</font>
+<font color="#ff0000"> insecure</font><font color="#F3E651">=</font><font color="#ff0000">True</font>
+<font color="#F3E651">)</font>
+
+<font color="#ff0000">processor </font><font color="#F3E651">=</font><font color="#ff0000"> </font><font color="#7bc710">BatchSpanProcessor</font><font color="#F3E651">(</font><font color="#ff0000">otlp_exporter</font><font color="#F3E651">)</font>
+<font color="#ff0000">provider</font><font color="#F3E651">.</font><font color="#7bc710">add_span_processor</font><font color="#F3E651">(</font><font color="#ff0000">processor</font><font color="#F3E651">)</font>
+<font color="#ff0000">trace</font><font color="#F3E651">.</font><font color="#7bc710">set_tracer_provider</font><font color="#F3E651">(</font><font color="#ff0000">provider</font><font color="#F3E651">)</font>
+
+<i><font color="#ababab"># Auto-instrument Flask and requests</font></i>
+<font color="#7bc710">FlaskInstrumentor</font><font color="#F3E651">().</font><font color="#7bc710">instrument_app</font><font color="#F3E651">(</font><font color="#ff0000">app</font><font color="#F3E651">)</font>
+<font color="#7bc710">RequestsInstrumentor</font><font color="#F3E651">().</font><font color="#7bc710">instrument</font><font color="#F3E651">()</font>
+</pre>
+<br />
+<span>The auto-instrumentation takes care of the following:</span><br />
+<ul>
+<li>Creates spans for HTTP requests</li>
+<li>Propagates trace context via W3C Trace Context headers</li>
+<li>Links parent and child spans across service boundaries</li>
+</ul><br />
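+<span>Custom business-logic spans (such as the frontend-process span shown further below) can be created alongside the auto-instrumentation. A minimal sketch, assuming the tracer provider from above is already configured:</span><br />
+<br />
+<pre>
+from opentelemetry import trace
+
+tracer = trace.get_tracer("frontend")
+
+def process_request(payload):
+    # Wrap the business logic in a custom span; attributes become
+    # searchable span tags in Tempo
+    with tracer.start_as_current_span("frontend-process") as span:
+        span.set_attribute("payload.size", len(payload))
+        return payload.upper()
+</pre>
+<br />
+<span>Without a configured SDK, the API falls back to a no-op tracer, so code written this way also runs safely outside the cluster.</span><br />
+<br />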
+<span>Deployment:</span><br />
+<br />
+<span>I created a Helm chart in /home/paul/git/conf/f3s/tracing-demo/ with three separate deployments, services, and an ingress.</span><br />
+<br />
+<span>Build and deploy:</span><br />
+<br />
+<pre>
+cd /home/paul/git/conf/f3s/tracing-demo
+just build
+just import
+just install
+</pre>
+<br />
+<span>Verify deployment:</span><br />
+<br />
+<pre>
+kubectl get pods -n services | grep tracing-demo
+kubectl get ingress -n services tracing-demo-ingress
+</pre>
+<br />
+<span>Access the application at:</span><br />
+<br />
+<a class='textlink' href='http://tracing-demo.f3s.buetow.org'>http://tracing-demo.f3s.buetow.org</a><br />
+<br />
+<h3 style='display: inline' id='visualizing-traces-in-grafana'>Visualizing Traces in Grafana</h3><br />
+<br />
+<span>The Tempo datasource is automatically discovered by Grafana through the ConfigMap label.</span><br />
+<br />
+<h4 style='display: inline' id='accessing-traces'>Accessing Traces</h4><br />
+<br />
+<span>Navigate to Grafana → Explore → Select "Tempo" datasource</span><br />
+<br />
+<span>**Search Interface:**</span><br />
+<ul>
+<li>Search by Trace ID</li>
+<li>Search by service name</li>
+<li>Search by tags</li>
+</ul><br />
+<span>**TraceQL Queries:**</span><br />
+<br />
+<span>Find all traces from demo app:</span><br />
+<pre>
+{ resource.service.namespace = "tracing-demo" }
+</pre>
+<br />
+<span>Find slow requests (&gt;200ms):</span><br />
+<pre>
+{ duration &gt; 200ms }
+</pre>
+<br />
+<span>Find traces from specific service:</span><br />
+<pre>
+{ resource.service.name = "frontend" }
+</pre>
+<br />
+<span>Find errors:</span><br />
+<pre>
+{ status = error }
+</pre>
+<br />
+<span>Complex query - traces in the demo namespace that contain spans with HTTP 5xx status codes:</span><br />
+<pre>
+{ resource.service.namespace = "tracing-demo" } &amp;&amp; { span.http.status_code &gt;= 500 }
+</pre>
+<br />
+<h4 style='display: inline' id='service-graph-visualization'>Service Graph Visualization</h4><br />
+<br />
+<span>The service graph shows visual connections between services:</span><br />
+<br />
+<span>1. Navigate to Explore → Tempo</span><br />
+<span>2. Enable "Service Graph" view</span><br />
+<span>3. Shows: Frontend → Middleware → Backend with request rates</span><br />
+<br />
+<span>The service graph uses Prometheus metrics generated from trace data.</span><br />
+<br />
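+<span>For example, the request rate between two services can be queried in Prometheus; traces_service_graph_request_total is the default metric name emitted by the Tempo metrics-generator:</span><br />
+<br />
+<pre>
+sum(rate(traces_service_graph_request_total{client="frontend", server="middleware"}[5m]))
+</pre>
+<br />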
+<h3 style='display: inline' id='correlation-between-observability-signals'>Correlation Between Observability Signals</h3><br />
+<br />
+<span>Tempo integrates with Loki and Prometheus to provide unified observability.</span><br />
+<br />
+<h4 style='display: inline' id='traces-to-logs'>Traces-to-Logs</h4><br />
+<br />
+<span>Click on any span in a trace to see related logs:</span><br />
+<br />
+<span>1. View trace in Grafana</span><br />
+<span>2. Click on a span</span><br />
+<span>3. Select "Logs for this span"</span><br />
+<span>4. Loki shows logs filtered by:</span><br />
+<span> * Time range (span duration ± 1 hour)</span><br />
+<span> * Service name</span><br />
+<span> * Namespace</span><br />
+<span> * Pod</span><br />
+<br />
+<span>This helps correlate what the service was doing when the span was created.</span><br />
+<br />
+<h4 style='display: inline' id='traces-to-metrics'>Traces-to-Metrics</h4><br />
+<br />
+<span>View Prometheus metrics for services in the trace:</span><br />
+<br />
+<span>1. View trace in Grafana</span><br />
+<span>2. Select "Metrics" tab</span><br />
+<span>3. Shows metrics like:</span><br />
+<span> * Request rate</span><br />
+<span> * Error rate</span><br />
+<span> * Duration percentiles</span><br />
+<br />
+<h4 style='display: inline' id='logs-to-traces'>Logs-to-Traces</h4><br />
+<br />
+<span>From logs, you can jump to related traces:</span><br />
+<br />
+<span>1. In Loki, logs that contain trace IDs are automatically linked</span><br />
+<span>2. Click the trace ID to view the full trace</span><br />
+<span>3. See the complete request flow</span><br />
+<br />
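+<span>For step 1 to work, the application has to write the trace ID into its log lines. A sketch using the OpenTelemetry Python API (the log line format is an illustrative choice):</span><br />
+<br />
+<pre>
+import logging
+from opentelemetry import trace
+
+def format_with_trace_id(message):
+    # Render the 128-bit trace ID as 32 hex characters, the same
+    # format Tempo uses, so Grafana can link the log line to a trace
+    ctx = trace.get_current_span().get_span_context()
+    return "%s trace_id=%032x" % (message, ctx.trace_id)
+
+logging.getLogger("frontend").info(format_with_trace_id("processing request"))
+</pre>
+<br />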
+<h3 style='display: inline' id='generating-traces-for-testing'>Generating Traces for Testing</h3><br />
+<br />
+<span>Test the demo application:</span><br />
+<br />
+<pre>
+curl http://tracing-demo.f3s.buetow.org/api/process
+</pre>
+<br />
+<span>Load test (generates 50 traces):</span><br />
+<br />
+<pre>
+cd /home/paul/git/conf/f3s/tracing-demo
+just load-test
+</pre>
+<br />
+<span>Each request creates a distributed trace spanning all three services.</span><br />
+<br />
+<h3 style='display: inline' id='verifying-the-complete-pipeline'>Verifying the Complete Pipeline</h3><br />
+<br />
+<span>Check the trace flow end-to-end:</span><br />
+<br />
+<span>**1. Application generates traces:**</span><br />
+<pre>
+kubectl logs -n services -l app=tracing-demo-frontend | grep -i trace
+</pre>
+<br />
+<span>**2. Alloy receives traces:**</span><br />
+<pre>
+kubectl logs -n monitoring -l app.kubernetes.io/name=alloy | grep -i otlp
+</pre>
+<br />
+<span>**3. Tempo stores traces:**</span><br />
+<pre>
+kubectl logs -n monitoring -l app.kubernetes.io/name=tempo | grep -i trace
+</pre>
+<br />
+<span>**4. Grafana displays traces:**</span><br />
+<span>Navigate to Explore → Tempo → Search for traces</span><br />
+<br />
+<h3 style='display: inline' id='practical-example-viewing-a-distributed-trace'>Practical Example: Viewing a Distributed Trace</h3><br />
+<br />
+<span>Let&#39;s generate a trace and examine it in Grafana.</span><br />
+<br />
+<span>**1. Generate a trace by calling the demo application:**</span><br />
+<br />
+<pre>
+curl -H "Host: tracing-demo.f3s.buetow.org" http://r0/api/process
+</pre>
+<br />
+<span>**Response (HTTP 200):**</span><br />
+<br />
+<pre><font color="#F3E651">{</font>
+<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">middleware_response</font><font color="#ff0000">"</font><font color="#ff0000">: </font><font color="#F3E651">{</font>
+<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">backend_data</font><font color="#ff0000">"</font><font color="#ff0000">: </font><font color="#F3E651">{</font>
+<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">data</font><font color="#ff0000">"</font><font color="#ff0000">: </font><font color="#F3E651">{</font>
+<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">id</font><font color="#ff0000">"</font><font color="#ff0000">: </font><font color="#bb00ff">12345</font><font color="#F3E651">,</font>
+<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">query_time_ms</font><font color="#ff0000">"</font><font color="#ff0000">: </font><font color="#bb00ff">100.0</font><font color="#F3E651">,</font>
+<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">timestamp</font><font color="#ff0000">"</font><font color="#ff0000">:</font><font color="#ff0000"> "</font><font color="#bb00ff">2025-12-28T18:35:01.064538</font><font color="#ff0000">"</font><font color="#F3E651">,</font>
+<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">value</font><font color="#ff0000">"</font><font color="#ff0000">:</font><font color="#ff0000"> "</font><font color="#bb00ff">Sample data from backend service</font><font color="#ff0000">"</font>
+<font color="#ff0000"> </font><font color="#F3E651">},</font>
+<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">service</font><font color="#ff0000">"</font><font color="#ff0000">:</font><font color="#ff0000"> "</font><font color="#bb00ff">backend</font><font color="#ff0000">"</font>
+<font color="#ff0000"> </font><font color="#F3E651">},</font>
+<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">middleware_processed</font><font color="#ff0000">"</font><font color="#ff0000">: </font><b><font color="#ffffff">true</font></b><font color="#F3E651">,</font>
+<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">original_data</font><font color="#ff0000">"</font><font color="#ff0000">: </font><font color="#F3E651">{</font>
+<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">source</font><font color="#ff0000">"</font><font color="#ff0000">:</font><font color="#ff0000"> "</font><font color="#bb00ff">GET request</font><font color="#ff0000">"</font>
+<font color="#ff0000"> </font><font color="#F3E651">},</font>
+<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">transformation_time_ms</font><font color="#ff0000">"</font><font color="#ff0000">: </font><font color="#bb00ff">50</font>
+<font color="#ff0000"> </font><font color="#F3E651">},</font>
+<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">request_data</font><font color="#ff0000">"</font><font color="#ff0000">: </font><font color="#F3E651">{</font>
+<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">source</font><font color="#ff0000">"</font><font color="#ff0000">:</font><font color="#ff0000"> "</font><font color="#bb00ff">GET request</font><font color="#ff0000">"</font>
+<font color="#ff0000"> </font><font color="#F3E651">},</font>
+<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">service</font><font color="#ff0000">"</font><font color="#ff0000">:</font><font color="#ff0000"> "</font><font color="#bb00ff">frontend</font><font color="#ff0000">"</font><font color="#F3E651">,</font>
+<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">status</font><font color="#ff0000">"</font><font color="#ff0000">:</font><font color="#ff0000"> "</font><font color="#bb00ff">success</font><font color="#ff0000">"</font>
+<font color="#F3E651">}</font>
+</pre>
+<br />
+<span>**2. Find the trace in Tempo via API:**</span><br />
+<br />
+<span>After a few seconds (for batch export), search for recent traces:</span><br />
+<br />
+<pre>
+kubectl exec -n monitoring tempo-0 -- wget -qO- \
+ &#39;http://localhost:3200/api/search?tags=service.namespace%3Dtracing-demo&amp;limit=5&#39; 2&gt;/dev/null | \
+ python3 -m json.tool
+</pre>
+<br />
+<span>Returns traces including:</span><br />
+<br />
+<pre><font color="#F3E651">{</font>
+<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">traceID</font><font color="#ff0000">"</font><font color="#ff0000">:</font><font color="#ff0000"> "</font><font color="#bb00ff">4be1151c0bdcd5625ac7e02b98d95bd5</font><font color="#ff0000">"</font><font color="#F3E651">,</font>
+<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">rootServiceName</font><font color="#ff0000">"</font><font color="#ff0000">:</font><font color="#ff0000"> "</font><font color="#bb00ff">frontend</font><font color="#ff0000">"</font><font color="#F3E651">,</font>
+<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">rootTraceName</font><font color="#ff0000">"</font><font color="#ff0000">:</font><font color="#ff0000"> "</font><font color="#bb00ff">GET /api/process</font><font color="#ff0000">"</font><font color="#F3E651">,</font>
+<font color="#ff0000"> </font><font color="#ff0000">"</font><font color="#ff0000">durationMs</font><font color="#ff0000">"</font><font color="#ff0000">: </font><font color="#bb00ff">221</font>
+<font color="#F3E651">}</font>
+</pre>
+<br />
+<span>**3. Fetch complete trace details:**</span><br />
+<br />
+<pre>
+kubectl exec -n monitoring tempo-0 -- wget -qO- \
+ &#39;http://localhost:3200/api/traces/4be1151c0bdcd5625ac7e02b98d95bd5&#39; 2&gt;/dev/null | \
+ python3 -m json.tool
+</pre>
+<br />
+<span>**Trace structure (8 spans across 3 services):**</span><br />
+<br />
+<pre>
+Trace ID: 4be1151c0bdcd5625ac7e02b98d95bd5
+Services: 3 (frontend, middleware, backend)
+
+Service: frontend
+ └─ GET /api/process 221.10ms (HTTP server span)
+ └─ frontend-process 216.23ms (custom business logic span)
+ └─ POST 209.97ms (HTTP client span to middleware)
+
+Service: middleware
+ └─ POST /api/transform 186.02ms (HTTP server span)
+ └─ middleware-transform 180.96ms (custom business logic span)
+ └─ GET 127.52ms (HTTP client span to backend)
+
+Service: backend
+ └─ GET /api/data 103.93ms (HTTP server span)
+ └─ backend-get-data 102.11ms (custom business logic span with 100ms sleep)
+</pre>
+<br />
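+<span>The per-span timings above can also be computed from the API response directly. A sketch that walks the OTLP-style JSON (the batches/scopeSpans field layout is an assumption based on the OTLP JSON encoding Tempo returns):</span><br />
+<br />
+<pre>
+def span_durations_ms(trace_json):
+    # Collect each span name with its duration in milliseconds,
+    # derived from the nanosecond timestamps in the OTLP JSON
+    durations = {}
+    for batch in trace_json.get("batches", []):
+        for scope in batch.get("scopeSpans", []):
+            for span in scope.get("spans", []):
+                start = int(span["startTimeUnixNano"])
+                end = int(span["endTimeUnixNano"])
+                durations[span["name"]] = (end - start) / 1e6
+    return durations
+</pre>
+<br />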
+<span>**4. View the trace in Grafana UI:**</span><br />
+<br />
+<span>Navigate to: Grafana → Explore → Tempo datasource</span><br />
+<br />
+<span>Search using TraceQL:</span><br />
+<pre>
+{ resource.service.namespace = "tracing-demo" }
+</pre>
+<br />
+<span>Or directly open the trace by pasting the trace ID in the search box:</span><br />
+<pre>
+4be1151c0bdcd5625ac7e02b98d95bd5
+</pre>
+<br />
+<span>**5. Trace visualization:**</span><br />
+<br />
+<span>The trace waterfall view in Grafana shows the complete request flow with timing:</span><br />
+<br />
+<a href='./f3s-kubernetes-with-freebsd-part-8/grafana-tempo-trace.png'><img alt='Distributed trace visualization in Grafana Tempo showing Frontend → Middleware → Backend spans' title='Distributed trace visualization in Grafana Tempo showing Frontend → Middleware → Backend spans' src='./f3s-kubernetes-with-freebsd-part-8/grafana-tempo-trace.png' /></a><br />
+<br />
+<span>For additional examples of Tempo trace visualization, see also:</span><br />
+<br />
+<a class='textlink' href='https://foo.zone/gemfeed/2025-12-24-x-rag-observability-hackathon.html'>X-RAG Observability Hackathon (more Grafana Tempo screenshots)</a><br />
+<br />
+<span>The trace reveals the distributed request flow:</span><br />
+<br />
+<ul>
+<li>Frontend (221ms): Receives GET /api/process, executes business logic, calls middleware</li>
+<li>Middleware (186ms): Receives POST /api/transform, transforms data, calls backend</li>
+<li>Backend (104ms): Receives GET /api/data, simulates database query with 100ms sleep</li>
+<li>Total request time: 221ms end-to-end</li>
+<li>Span propagation: W3C Trace Context headers automatically link all spans</li>
+</ul><br />
+<span>**6. Service graph visualization:**</span><br />
+<br />
+<span>The service graph is automatically generated from traces and shows service dependencies. For examples of service graph visualization in Grafana, see the screenshots in the X-RAG Observability Hackathon blog post.</span><br />
+<br />
+<a class='textlink' href='./2025-12-24-x-rag-observability-hackathon.html'>X-RAG Observability Hackathon (includes service graph screenshots)</a><br />
+<br />
+<span>This visualization helps identify:</span><br />
+<br />
+<ul>
+<li>Request rates between services</li>
+<li>Average latency for each hop</li>
+<li>Error rates (if any)</li>
+<li>Service dependencies and communication patterns</li>
+</ul><br />
+<h3 style='display: inline' id='storage-and-retention'>Storage and Retention</h3><br />
+<br />
+<span>Monitor Tempo storage usage:</span><br />
+<br />
+<pre>
+kubectl exec -n monitoring &lt;tempo-pod&gt; -- df -h /var/tempo
+</pre>
+<br />
+<span>With 10Gi storage and 7-day retention, the system handles moderate trace volumes. If storage fills up:</span><br />
+<br />
+<ul>
+<li>Reduce retention to 72h (3 days)</li>
+<li>Implement sampling in Alloy</li>
+<li>Increase PV size</li>
+</ul><br />
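+<span>As an alternative to sampling in Alloy, head sampling can be configured in the application SDK itself. A sketch using the OpenTelemetry Python SDK (the 10 percent ratio is an illustrative value):</span><br />
+<br />
+<pre>
+from opentelemetry.sdk.trace import TracerProvider
+from opentelemetry.sdk.trace.sampling import ParentBasedTraceIdRatio
+
+# Keep roughly 1 in 10 traces; child spans follow the sampling
+# decision of the parent so sampled traces stay complete
+provider = TracerProvider(sampler=ParentBasedTraceIdRatio(0.1))
+</pre>
+<br />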
+<h3 style='display: inline' id='configuration-files'>Configuration Files</h3><br />
+<br />
+<span>All configuration files are available on Codeberg:</span><br />
+<br />
+<a class='textlink' href='https://codeberg.org/snonux/conf/src/branch/master/f3s/tempo'>Tempo configuration</a><br />
+<a class='textlink' href='https://codeberg.org/snonux/conf/src/branch/master/f3s/loki'>Alloy configuration (updated for traces)</a><br />
+<a class='textlink' href='https://codeberg.org/snonux/conf/src/branch/master/f3s/tracing-demo'>Demo tracing application</a><br />
+<br />
<h2 style='display: inline' id='summary'>Summary</h2><br />
<br />
-<span>With Prometheus, Grafana, Loki, and Alloy deployed, I now have complete visibility into the k3s cluster, the FreeBSD storage servers, and the OpenBSD edge relays:</span><br />
+<span>With Prometheus, Grafana, Loki, Alloy, and Tempo deployed, I now have complete visibility into the k3s cluster, the FreeBSD storage servers, and the OpenBSD edge relays:</span><br />
<br />
<ul>
-<li>metrics: Prometheus collects and stores time-series data from all components</li>
+<li>Metrics: Prometheus collects and stores time-series data from all components, including etcd and ZFS</li>
<li>Logs: Loki aggregates logs from all containers, searchable via Grafana</li>
-<li>Visualisation: Grafana provides dashboards and exploration tools</li>
+<li>Traces: Tempo provides distributed request tracing with service dependency mapping</li>
+<li>Visualisation: Grafana provides dashboards and exploration tools with correlation between all three signals</li>
<li>Alerting: Alertmanager can notify on conditions defined in Prometheus rules</li>
</ul><br />
<span>This observability stack runs entirely on the home lab infrastructure, with data persisted to the NFS share. It&#39;s lightweight enough for a three-node cluster but provides the same capabilities as production-grade setups.</span><br />
<br />
+<span>All configuration files are available on Codeberg:</span><br />
+<br />
+<a class='textlink' href='https://codeberg.org/snonux/conf/src/branch/master/f3s/prometheus'>Prometheus, Grafana, and recording rules configuration</a><br />
+<a class='textlink' href='https://codeberg.org/snonux/conf/src/branch/master/f3s/loki'>Loki and Alloy configuration</a><br />
+<a class='textlink' href='https://codeberg.org/snonux/conf/src/branch/master/f3s/tempo'>Tempo configuration</a><br />
+<a class='textlink' href='https://codeberg.org/snonux/conf/src/branch/master/f3s/tracing-demo'>Demo tracing application</a><br />
+<br />
<span>Other *BSD-related posts:</span><br />
<br />
<a class='textlink' href='./2025-12-07-f3s-kubernetes-with-freebsd-part-8.html'>2025-12-07 f3s: Kubernetes with FreeBSD - Part 8: Observability (You are currently reading this)</a><br />
diff --git a/gemfeed/f3s-kubernetes-with-freebsd-part-8/grafana-etcd-dashboard.png b/gemfeed/f3s-kubernetes-with-freebsd-part-8/grafana-etcd-dashboard.png
new file mode 100644
index 00000000..e1d3100b
--- /dev/null
+++ b/gemfeed/f3s-kubernetes-with-freebsd-part-8/grafana-etcd-dashboard.png
Binary files differ
diff --git a/gemfeed/f3s-kubernetes-with-freebsd-part-8/grafana-zfs-arc-stats.png b/gemfeed/f3s-kubernetes-with-freebsd-part-8/grafana-zfs-arc-stats.png
new file mode 100644
index 00000000..2609c477
--- /dev/null
+++ b/gemfeed/f3s-kubernetes-with-freebsd-part-8/grafana-zfs-arc-stats.png
Binary files differ
diff --git a/gemfeed/f3s-kubernetes-with-freebsd-part-8/grafana-zfs-dashboard.png b/gemfeed/f3s-kubernetes-with-freebsd-part-8/grafana-zfs-dashboard.png
new file mode 100644
index 00000000..7a427184
--- /dev/null
+++ b/gemfeed/f3s-kubernetes-with-freebsd-part-8/grafana-zfs-dashboard.png
Binary files differ
diff --git a/gemfeed/f3s-kubernetes-with-freebsd-part-8/grafana-zfs-datasets.png b/gemfeed/f3s-kubernetes-with-freebsd-part-8/grafana-zfs-datasets.png
new file mode 100644
index 00000000..47890a0c
--- /dev/null
+++ b/gemfeed/f3s-kubernetes-with-freebsd-part-8/grafana-zfs-datasets.png
Binary files differ