summaryrefslogtreecommitdiff
path: root/docs/operations/troubleshooting.md
diff options
context:
space:
mode:
authorPaul Buetow <paul@buetow.org>2026-02-14 13:54:54 +0200
committerPaul Buetow <paul@buetow.org>2026-02-14 13:54:54 +0200
commit3a6e01c1abd4a68810f1d85c9aa75293af47f579 (patch)
tree2e3c066392cf2a292e89c90f259d039ce0afcb9b /docs/operations/troubleshooting.md
parentf3ea9a7a1f466b6109271c76eb58189d2a799998 (diff)
docs: restructure documentation and move scripts to scripts/
- Add docs/ hierarchy: guides, backends, operations, reference, design - Slim root README; add documentation index and links to docs/ - Add missing docs: csv-format-flexibility, dns-resolution, dtail-metrics-example, magefile - Document Prometheus/VictoriaMetrics and ClickHouse backends - Move all helper shell scripts to scripts/; update Magefile and doc references - Add ASCII diagrams for watch mode (CSV watcher), auto mode, and ingestion paths - Add .gitignore Co-authored-by: Cursor <cursoragent@cursor.com>
Diffstat (limited to 'docs/operations/troubleshooting.md')
-rw-r--r--docs/operations/troubleshooting.md43
1 files changed, 43 insertions, 0 deletions
diff --git a/docs/operations/troubleshooting.md b/docs/operations/troubleshooting.md
new file mode 100644
index 0000000..9446508
--- /dev/null
+++ b/docs/operations/troubleshooting.md
@@ -0,0 +1,43 @@
+# Troubleshooting
+
+## Binary can't connect to Pushgateway
+
+- Confirm a port-forward or route to Pushgateway is running, e.g. `ps aux | grep "port-forward.*9091"`.
+- Restart port-forward: `kubectl port-forward -n monitoring svc/pushgateway 9091:9091`.
+- Ensure `-pushgateway` points at the URL you use (e.g. `http://localhost:9091`).
+
+## Metrics not appearing in Prometheus
+
+- **Pushgateway:** `curl http://localhost:9091/metrics | grep "prometheus_pusher_test"` (or your job/metric name). If empty, Epimetheus may not be pushing or the job name may differ.
+- **Scrape:** In Prometheus UI (e.g. http://localhost:9090/targets), check that the Pushgateway job exists and is up.
+- **Logs:** `kubectl logs -n monitoring -l app.kubernetes.io/name=prometheus` (or your Prometheus pod) for scrape/remote-write errors.
+
+## "Remote write receiver not enabled" error
+
+Prometheus must be started with the Remote Write receiver enabled. Verify:
+
+```bash
+kubectl logs -n monitoring prometheus-prometheus-kube-prometheus-prometheus-0 | grep "remote-write-receiver"
+```
+
+You should see the feature listed in the enabled features. If not, add `web.enable-remote-write-receiver` (see [Setup: Prometheus](setup-prometheus.md)) and restart Prometheus.
+
+## "Out of order sample" error
+
+You are writing a sample older than existing data for the same series.
+
+- Use different labels for historic data (e.g. `job="historic_data"`), or
+- Enable out-of-order ingestion on Prometheus and set `tsdb.outOfOrderTimeWindow` (see [Setup: Prometheus](setup-prometheus.md)), or
+- Run backfills from oldest to newest.
+
+## Dashboard not appearing in Grafana
+
+- Check the dashboard ConfigMap exists: `kubectl get configmap -n monitoring | grep epimetheus`.
+- Ensure the ConfigMap has the label Grafana uses for dashboard discovery (e.g. `grafana_dashboard: "1"`): `kubectl get configmap epimetheus-dashboard -n monitoring -o yaml | grep "grafana_dashboard"`.
+- Restart Grafana to reload dashboards: `kubectl rollout restart deployment/prometheus-grafana -n monitoring` (adjust deployment name to your setup).
+
+## ClickHouse connection failed
+
+- Ensure ClickHouse is listening on HTTP (default port 8123): `curl -sS http://localhost:8123/ping`.
+- If using Kubernetes, check Service and port-forwards. Use the same URL as `-clickhouse`.
+- See [Setup: ClickHouse](setup-clickhouse.md) and [ClickHouse backend](../backends/clickhouse.md).