Diffstat (limited to 'docs/operations/troubleshooting.md')
| -rw-r--r-- | docs/operations/troubleshooting.md | 43 |
1 files changed, 43 insertions, 0 deletions
diff --git a/docs/operations/troubleshooting.md b/docs/operations/troubleshooting.md
new file mode 100644
index 0000000..9446508
--- /dev/null
+++ b/docs/operations/troubleshooting.md
@@ -0,0 +1,43 @@
+# Troubleshooting
+
+## Binary can't connect to Pushgateway
+
+- Confirm a port-forward or route to Pushgateway is running, e.g. `ps aux | grep "port-forward.*9091"`.
+- Restart port-forward: `kubectl port-forward -n monitoring svc/pushgateway 9091:9091`.
+- Ensure `-pushgateway` points at the URL you use (e.g. `http://localhost:9091`).
+
+## Metrics not appearing in Prometheus
+
+- **Pushgateway:** `curl http://localhost:9091/metrics | grep "prometheus_pusher_test"` (or your job/metric name). If empty, Epimetheus may not be pushing or the job name may differ.
+- **Scrape:** In Prometheus UI (e.g. http://localhost:9090/targets), check that the Pushgateway job exists and is up.
+- **Logs:** `kubectl logs -n monitoring -l app.kubernetes.io/name=prometheus` (or your Prometheus pod) for scrape/remote-write errors.
+
+## "Remote write receiver not enabled" error
+
+Prometheus must be started with the Remote Write receiver enabled. Verify:
+
+```bash
+kubectl logs -n monitoring prometheus-prometheus-kube-prometheus-prometheus-0 | grep "remote-write-receiver"
+```
+
+You should see the feature listed in the enabled features. If not, add `web.enable-remote-write-receiver` (see [Setup: Prometheus](setup-prometheus.md)) and restart Prometheus.
+
+## "Out of order sample" error
+
+You are writing a sample older than existing data for the same series.
+
+- Use different labels for historic data (e.g. `job="historic_data"`), or
+- Enable out-of-order ingestion on Prometheus and set `tsdb.outOfOrderTimeWindow` (see [Setup: Prometheus](setup-prometheus.md)), or
+- Run backfills from oldest to newest.
+
+## Dashboard not appearing in Grafana
+
+- Check the dashboard ConfigMap exists: `kubectl get configmap -n monitoring | grep epimetheus`.
+- Ensure the ConfigMap has the label Grafana uses for dashboard discovery (e.g. `grafana_dashboard: "1"`): `kubectl get configmap epimetheus-dashboard -n monitoring -o yaml | grep "grafana_dashboard"`.
+- Restart Grafana to reload dashboards: `kubectl rollout restart deployment/prometheus-grafana -n monitoring` (adjust deployment name to your setup).
+
+## ClickHouse connection failed
+
+- Ensure ClickHouse is listening on HTTP (default port 8123): `curl -sS http://localhost:8123/ping`.
+- If using Kubernetes, check Service and port-forwards. Use the same URL as `-clickhouse`.
+- See [Setup: ClickHouse](setup-clickhouse.md) and [ClickHouse backend](../backends/clickhouse.md).
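
The Pushgateway check in "Metrics not appearing in Prometheus" pipes `/metrics` through `grep` and leaves the reader to interpret empty output. As a minimal sketch, the same check could be wrapped so that it prints an explicit result and sets an exit status. `metric_present` is a hypothetical helper name, not part of Epimetheus or Pushgateway, and the metric name is the example one used in the page.

```bash
#!/usr/bin/env bash
# Sketch only: metric_present is a hypothetical helper, not an Epimetheus command.
# It reads Prometheus exposition text on stdin and reports whether the given
# metric (or job) name occurs anywhere in it.
metric_present() {
  local name="$1"
  # -F matches the name literally; -q suppresses output so only the status is used.
  if grep -Fq -- "$name"; then
    echo "found: $name"
  else
    echo "missing: $name"
    return 1
  fi
}

# Usage, assuming the port-forward to localhost:9091 described in the page:
#   curl -sS http://localhost:9091/metrics | metric_present prometheus_pusher_test
```

A non-zero exit status here corresponds to the "If empty" case in the page: either nothing is being pushed or the job/metric name differs from the one searched for.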
