Diffstat (limited to 'docs/operations')

 docs/operations/cleanup.md          | 48
 docs/operations/kubernetes.md       | 51
 docs/operations/macos-setup.md      | 91
 docs/operations/setup-clickhouse.md | 43
 docs/operations/setup-prometheus.md | 82
 docs/operations/troubleshooting.md  | 43

6 files changed, 358 insertions, 0 deletions
diff --git a/docs/operations/cleanup.md b/docs/operations/cleanup.md
new file mode 100644
index 0000000..7835b21
--- /dev/null
+++ b/docs/operations/cleanup.md
@@ -0,0 +1,48 @@

# Cleanup

## Benchmark data in Prometheus

To remove benchmark metrics from Prometheus, use the provided script:

```bash
# Port-forward to Prometheus if needed
kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-prometheus 9090:9090 &

./scripts/cleanup-benchmark-data.sh
```

The script deletes all `epimetheus_benchmark_*` series via the Admin API and runs `clean_tombstones`.

**Manual deletion:**

```bash
# Delete a specific metric
curl -X POST 'http://localhost:9090/api/v1/admin/tsdb/delete_series?match[]=epimetheus_benchmark_cpu_usage'

# Clean tombstones
curl -X POST http://localhost:9090/api/v1/admin/tsdb/clean_tombstones
```

The Admin API must be enabled on Prometheus (see [Setup: Prometheus](setup-prometheus.md)).

## Other cleanup

**Stop port-forwards:**

```bash
pkill -f "port-forward.*9091"
pkill -f "port-forward.*9090"
pkill -f "port-forward.*3000"
```

**Remove test metrics from Pushgateway:**

```bash
curl -X DELETE http://localhost:9091/metrics/job/example_metrics_pusher
```

**Uninstall Pushgateway (Helm):**

```bash
helm uninstall pushgateway -n monitoring
```

diff --git a/docs/operations/kubernetes.md b/docs/operations/kubernetes.md
new file mode 100644
index 0000000..20b8b07
--- /dev/null
+++ b/docs/operations/kubernetes.md
@@ -0,0 +1,51 @@

# Kubernetes

Common tasks when running Epimetheus against Prometheus, Pushgateway, and Grafana in Kubernetes.
## Port-forwards

To run Epimetheus on your laptop against cluster services:

```bash
# Pushgateway (realtime mode)
kubectl port-forward -n monitoring svc/pushgateway 9091:9091 &

# Prometheus (historic/watch, queries)
kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-prometheus 9090:9090 &

# Grafana (dashboards)
kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80
```

Then use `http://localhost:9091`, `http://localhost:9090`, and `http://localhost:3000` in Epimetheus flags and in the browser. Adjust service names and namespaces to match your cluster (e.g. `prometheus-kube-prometheus-prometheus` for kube-prometheus-stack).

## Deploying Pushgateway

Example using the official Helm chart:

```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install pushgateway prometheus-community/prometheus-pushgateway -n monitoring --create-namespace
```

Alternatively use your own chart (e.g. from the [conf repository](https://codeberg.org/snonux/conf) at `f3s/pushgateway/helm-chart`).

## Deploying the Epimetheus Grafana dashboard

**ConfigMap (recommended):** If you have a manifest that creates a ConfigMap with the dashboard JSON and the Grafana label for auto-discovery:

```bash
kubectl apply -f ../prometheus/epimetheus-dashboard.yaml
```

**Script:** From the repo, with Grafana reachable (e.g. after port-forward):

```bash
./scripts/deploy-dashboard.sh
# Or with credentials:
GRAFANA_URL="http://localhost:3000" GRAFANA_USER="admin" GRAFANA_PASSWORD="yourpassword" ./scripts/deploy-dashboard.sh
```

## Namespace and service names

Replace `monitoring` and the Prometheus/Pushgateway/Grafana service names with whatever your Helm release or manifests use. Epimetheus only needs the URLs; it does not need to run inside the cluster.
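Once the forwards are running, a quick way to confirm all three endpoints answer locally. This is a sketch: the `probe` helper is a hypothetical name, while the health paths (`/-/ready` for Pushgateway, `/-/healthy` for Prometheus, `/api/health` for Grafana) are the standard ones:

```shell
# Probe each locally forwarded endpoint; prints "up" or "down" per URL.
probe() {
  if curl -fsS --max-time 3 "$1" >/dev/null 2>&1; then
    echo "up   $1"
  else
    echo "down $1"
  fi
}

probe http://localhost:9091/-/ready     # Pushgateway
probe http://localhost:9090/-/healthy   # Prometheus
probe http://localhost:3000/api/health  # Grafana
```

Any `down` line usually means the corresponding port-forward died and needs restarting.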
diff --git a/docs/operations/macos-setup.md b/docs/operations/macos-setup.md
new file mode 100644
index 0000000..8ed47c9
--- /dev/null
+++ b/docs/operations/macos-setup.md
@@ -0,0 +1,91 @@

# macOS Setup

## Basic installation

```bash
brew install prometheus
brew install grafana
go install github.com/prometheus/pushgateway@latest
brew services start grafana
brew services start prometheus
~/go/bin/pushgateway &
```

Log in to Grafana at http://localhost:3000 (default admin:admin; you will be prompted to change the password). Add http://localhost:9090 as a Prometheus data source.

## Enable Remote Write receiver (required for watch/historic/backfill/auto)

Watch mode, historic mode, backfill mode, and auto mode with old data require the Prometheus Remote Write receiver.

### Option 1: Permanent configuration

Edit the Prometheus arguments file (Homebrew example):

```bash
nano /opt/homebrew/etc/prometheus.args
```

Add at the end:

```
--web.enable-remote-write-receiver
--web.enable-admin-api
```

Example full file:

```
--config.file /opt/homebrew/etc/prometheus.yml
--web.listen-address=127.0.0.1:9090
--storage.tsdb.path /opt/homebrew/var/prometheus
--web.enable-remote-write-receiver
--web.enable-admin-api
```

Restart Prometheus:

```bash
brew services restart prometheus
```

Verify:

```bash
curl http://localhost:9090/-/healthy
curl -X POST http://localhost:9090/api/v1/write  # expect 400, not 404
```

### Option 2: Temporary (testing only)

```bash
brew services stop prometheus
prometheus --web.enable-remote-write-receiver
```

Keep that terminal open; use another for Epimetheus. This stops when you close the terminal.
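The 400-vs-404 check above can be made scriptable. A sketch (the `write_receiver_status` helper is a hypothetical name; the interpretation follows the verify step: 400 means the endpoint exists and rejected the empty body, 404 means the receiver flag is missing):

```shell
# Classify the HTTP status of an empty POST to /api/v1/write.
write_receiver_status() {
  case "$1" in
    400) echo "enabled (endpoint exists, empty body rejected)" ;;
    404) echo "NOT enabled (add --web.enable-remote-write-receiver)" ;;
    000) echo "unreachable (is Prometheus running?)" ;;
    *)   echo "unexpected status $1" ;;
  esac
}

# curl prints 000 via -w when the connection itself fails.
code=$(curl -s -o /dev/null -w '%{http_code}' -X POST http://localhost:9090/api/v1/write || true)
write_receiver_status "$code"
```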
## Clearing old metrics (optional)

If the Admin API is enabled:

```bash
# Delete metrics by name pattern
curl -X POST -g 'http://localhost:9090/api/v1/admin/tsdb/delete_series?match[]={__name__=~"blockstore_.*"}'
curl -X POST http://localhost:9090/api/v1/admin/tsdb/clean_tombstones
sleep 2
```

## Verify watch mode

```bash
cat > /tmp/test.csv << EOF
status,count,method
200,100,GET
404,50,POST
EOF

./epimetheus -mode=watch -file=/tmp/test.csv -metric-name=test \
  -prometheus=http://localhost:9090/api/v1/write
```

You should see a success message. In Prometheus (http://localhost:9090), query `{__name__=~"test_.*"}`.

diff --git a/docs/operations/setup-clickhouse.md b/docs/operations/setup-clickhouse.md
new file mode 100644
index 0000000..acc8247
--- /dev/null
+++ b/docs/operations/setup-clickhouse.md
@@ -0,0 +1,43 @@

# Setup: ClickHouse

ClickHouse is only used in **watch mode**. Epimetheus creates the metrics table automatically if it does not exist.

## Running ClickHouse

- **Linux (systemd):** `sudo systemctl start clickhouse-server`
- **Docker:** Use the official [ClickHouse image](https://hub.docker.com/r/clickhouse/clickhouse-server) and expose the HTTP interface (default port 8123).
- **Kubernetes:** Deploy ClickHouse and expose a Service; use the HTTP URL (e.g. `http://clickhouse.monitoring.svc.cluster.local:8123`) as `-clickhouse`.

Default HTTP port is **8123**. Epimetheus uses the HTTP interface, not the native protocol.

## Table Creation

You do not need to create the table manually. On first ingest, Epimetheus runs:

```sql
CREATE TABLE IF NOT EXISTS epimetheus_metrics (
    metric String,
    labels Map(String, String),
    value Float64,
    timestamp DateTime64(3)
) ENGINE = MergeTree()
ORDER BY (metric, timestamp)
```

To use a different table name, set `-clickhouse-table`.
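For reference, a hand-written row matching that schema (the metric name and label values here are hypothetical), which you could send through the HTTP interface to sanity-check the table:

```sql
INSERT INTO epimetheus_metrics (metric, labels, value, timestamp)
VALUES ('test_cpu_usage', map('host', 'node1', 'mode', 'user'), 42.5, now64(3));
```

`map(...)` builds the `Map(String, String)` labels column and `now64(3)` matches the millisecond-precision `DateTime64(3)` timestamp.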
## Verification

After running watch mode with `-clickhouse` set, verify ingestion:

```bash
./scripts/verify-clickhouse.sh
```

With custom URL or table:

```bash
./scripts/verify-clickhouse.sh http://localhost:8123 epimetheus_metrics
```

The script checks connectivity (`/ping`), row count, distinct metrics, sample rows, and rows per metric. If the table is empty or missing, it prints a reminder command to run Epimetheus in watch mode with `-clickhouse`. See [ClickHouse backend](../backends/clickhouse.md) for usage.

diff --git a/docs/operations/setup-prometheus.md b/docs/operations/setup-prometheus.md
new file mode 100644
index 0000000..294ce20
--- /dev/null
+++ b/docs/operations/setup-prometheus.md
@@ -0,0 +1,82 @@

# Setup: Prometheus

To use historic mode, backfill mode, auto mode with old data, or watch mode with `-prometheus`, you must enable the Prometheus Remote Write receiver. Without it, Epimetheus can only push realtime data via Pushgateway.

## 1. Enable Remote Write Receiver and Admin API

Example configuration (Prometheus 3.x style). Adjust paths and stack to match your environment (e.g. [conf repository](https://codeberg.org/snonux/conf) at `f3s/prometheus/persistence-values.yaml`):

```yaml
prometheus:
  prometheusSpec:
    additionalArgs:
      - name: web.enable-remote-write-receiver
        value: ""
      - name: web.enable-admin-api
        value: ""

    enableFeatures:
      - exemplar-storage
      - otlp-write-receiver

    tsdb:
      outOfOrderTimeWindow: 744h  # 31 days for backfilling
```

This provides:

- **Remote Write API** at `/api/v1/write` for ingesting metrics with custom timestamps.
- **Admin API** at `/api/v1/admin/tsdb/*` for deleting series and cleaning tombstones.
- **Out-of-order ingestion** so older points can be written for existing series (within the time window).

After changing config, upgrade Prometheus (e.g. `helm upgrade` or your usual apply).
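The `744h` window above is simply 31 days expressed in hours. To derive a value for a different backfill horizon:

```shell
# outOfOrderTimeWindow takes a duration; compute one from a day count.
days=31
echo "${days} days -> $((days * 24))h"   # 31 days -> 744h
```

Pick a window that comfortably covers the oldest data you intend to backfill, but no larger than necessary (see the trade-off note in the Retention section).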
### Verify

```bash
# Remote Write receiver
kubectl get pod -n monitoring prometheus-prometheus-kube-prometheus-prometheus-0 \
  -o jsonpath='{.spec.containers[0].args}' | grep -o "web.enable-remote-write-receiver"

# Out-of-order window
kubectl get prometheus -n monitoring prometheus-kube-prometheus-prometheus \
  -o jsonpath='{.spec.tsdb.outOfOrderTimeWindow}'

# Admin API
kubectl get pod -n monitoring prometheus-prometheus-kube-prometheus-prometheus-0 \
  -o jsonpath='{.spec.containers[0].args}' | grep -o "web.enable-admin-api"
```

**Note:** In Prometheus 3.x use `additionalArgs` for `web.enable-remote-write-receiver`; the older `enableFeatures: [remote-write-receiver]` is deprecated.

## 2. Scrape Config for Pushgateway

For realtime mode, Prometheus must scrape Pushgateway. Example:

```yaml
# additional-scrape-configs.yaml
- job_name: 'pushgateway'
  honor_labels: true
  static_configs:
    - targets:
        - 'pushgateway.monitoring.svc.cluster.local:9091'
```

Apply as a Secret (example):

```bash
kubectl create secret generic additional-scrape-configs \
  --from-file=additional-scrape-configs.yaml \
  --dry-run=client -o yaml -n monitoring | kubectl apply -f -
```

## 3. Retention

Check retention so you know how far back Epimetheus can write:

```bash
kubectl get prometheus -n monitoring prometheus-kube-prometheus-prometheus \
  -o jsonpath='{.spec.retention}'
```

For very old data, increase retention or use a dedicated dev/test Prometheus. Enabling out-of-order ingestion and a large `outOfOrderTimeWindow` has memory and I/O trade-offs; see [Prometheus backend](../backends/prometheus.md) and keep production config conservative.
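As a rule of thumb, the oldest point worth backfilling is roughly now minus the retention period. A sketch of the arithmetic, assuming retention is expressed in days:

```shell
# Oldest epoch second still inside a day-based retention window.
retention_days=15
now=$(date +%s)
cutoff=$(( now - retention_days * 86400 ))
echo "samples older than epoch ${cutoff} will fall outside retention"
```

Samples written below that cutoff are accepted (if the out-of-order window allows) but are eligible for deletion at the next retention cycle.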
diff --git a/docs/operations/troubleshooting.md b/docs/operations/troubleshooting.md
new file mode 100644
index 0000000..9446508
--- /dev/null
+++ b/docs/operations/troubleshooting.md
@@ -0,0 +1,43 @@

# Troubleshooting

## Binary can't connect to Pushgateway

- Confirm a port-forward or route to Pushgateway is running, e.g. `ps aux | grep "port-forward.*9091"`.
- Restart the port-forward: `kubectl port-forward -n monitoring svc/pushgateway 9091:9091`.
- Ensure `-pushgateway` points at the forwarded URL (e.g. `http://localhost:9091`).

## Metrics not appearing in Prometheus

- **Pushgateway:** `curl http://localhost:9091/metrics | grep "prometheus_pusher_test"` (or your job/metric name). If empty, Epimetheus may not be pushing or the job name may differ.
- **Scrape:** In the Prometheus UI (e.g. http://localhost:9090/targets), check that the Pushgateway job exists and is up.
- **Logs:** `kubectl logs -n monitoring -l app.kubernetes.io/name=prometheus` (or your Prometheus pod) for scrape/remote-write errors.

## "Remote write receiver not enabled" error

Prometheus must be started with the Remote Write receiver enabled. Verify:

```bash
kubectl logs -n monitoring prometheus-prometheus-kube-prometheus-prometheus-0 | grep "remote-write-receiver"
```

The feature should appear in the enabled-features list. If not, add `web.enable-remote-write-receiver` (see [Setup: Prometheus](setup-prometheus.md)) and restart Prometheus.

## "Out of order sample" error

You are writing a sample older than existing data for the same series.

- Use different labels for historic data (e.g. `job="historic_data"`), or
- Enable out-of-order ingestion on Prometheus and set `tsdb.outOfOrderTimeWindow` (see [Setup: Prometheus](setup-prometheus.md)), or
- Run backfills from oldest to newest.

## Dashboard not appearing in Grafana

- Check the dashboard ConfigMap exists: `kubectl get configmap -n monitoring | grep epimetheus`.
- Ensure the ConfigMap has the label Grafana uses for dashboard discovery (e.g. `grafana_dashboard: "1"`): `kubectl get configmap epimetheus-dashboard -n monitoring -o yaml | grep "grafana_dashboard"`.
- Restart Grafana to reload dashboards: `kubectl rollout restart deployment/prometheus-grafana -n monitoring` (adjust deployment name to your setup).

## ClickHouse connection failed

- Ensure ClickHouse is listening on HTTP (default port 8123): `curl -sS http://localhost:8123/ping`.
- If using Kubernetes, check the Service and port-forwards. Use the same URL as `-clickhouse`.
- See [Setup: ClickHouse](setup-clickhouse.md) and [ClickHouse backend](../backends/clickhouse.md).
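The ping check can be wrapped into a small helper. A sketch (the `ch_ok` name and `CLICKHOUSE_URL` variable are hypothetical; `/ping` answering `Ok.` is ClickHouse's standard HTTP liveness response):

```shell
# Succeeds only when the ClickHouse HTTP interface answers /ping with "Ok.".
ch_ok() {
  [ "$(curl -fsS --max-time 3 "$1/ping" 2>/dev/null)" = "Ok." ]
}

if ch_ok "${CLICKHOUSE_URL:-http://localhost:8123}"; then
  echo "ClickHouse reachable"
else
  echo "ClickHouse unreachable; check the service or port-forward"
fi
```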