From 3fd46f3977fb650974e5e936cba362c787c00637 Mon Sep 17 00:00:00 2001
From: Paul Buetow
Date: Sat, 7 Feb 2026 16:32:10 +0200
Subject: reimport this PoC

---
 README.md | 1000 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 1000 insertions(+)
 create mode 100644 README.md

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..ba10a76
--- /dev/null
+++ b/README.md
@@ -0,0 +1,1000 @@
+ Epimetheus Logo +
+ +# Epimetheus + +A versatile Go tool for pushing metrics to Prometheus with support for both realtime and historic data ingestion. + +## Why "Epimetheus"? + +In Greek mythology, [Epimetheus](https://en.wikipedia.org/wiki/Epimetheus_(mythology)) is Prometheus's brother, whose name means "afterthought" or "hindsight" (while Prometheus means "forethought"). This name cleverly captures the tool's purpose: bringing data to Prometheus **after** collection, whether it's historic data from hours, days, or weeks ago, or realtime data pushed on-demand. + +While Epimetheus is sometimes depicted as foolish in myths (he accepted Pandora's box despite warnings), this tool embraces the "afterthought" aspect productively - it's never too late to bring your metrics home to Prometheus! + +## Architecture + +``` +┌─────────────────────────────────────────────────────────────────────────┐ +│ Epimetheus │ +│ (Metrics Ingestion Tool) │ +│ │ +│ Modes: │ +│ • Realtime - Current metrics (< 5 min old) │ +│ • Historic - Historic metrics (≥ 5 min old) │ +│ • Backfill - Range of historic data │ +│ • Auto - Automatic routing based on timestamp age │ +└─────────────────────────────────────────────────────────────────────────┘ + │ │ + │ Realtime Data │ Historic Data + │ (via HTTP POST) │ (via Remote Write API) + │ Uses "now" timestamp │ Preserves timestamps + ▼ ▼ +┌─────────────────────┐ ┌─────────────────────┐ +│ Pushgateway │ │ Prometheus │ +│ (Port 9091) │ │ (Port 9090) │ +│ │ │ │ +│ • Buffers metrics │ │ Remote Write API: │ +│ • Scraped by │──── Scraped ─────▶ │ /api/v1/write │ +│ Prometheus │ every 15-30s │ │ +│ • No timestamp │ │ Feature Required: │ +│ preservation │ │ --enable-feature= │ +│ │ │ remote-write- │ +│ │ │ receiver │ +└─────────────────────┘ └─────────────────────┘ + │ + │ Prometheus Query API + │ /api/v1/query + ▼ + ┌─────────────────────┐ + │ Grafana │ + │ (Port 3000) │ + │ │ + │ • Prometheus as │ + │ datasource │ + │ • Dashboards: │ + │ - Epimetheus │ + │ Test Metrics │ + │ 
• Auto-refresh │ + └─────────────────────┘ +``` + +### Data Flow + +1. **Realtime Path** (for current data): + - Epimetheus → Pushgateway (HTTP POST) + - Prometheus scrapes Pushgateway periodically + - Timestamp = "now" when Prometheus scrapes + +2. **Historic Path** (for old data): + - Epimetheus → Prometheus Remote Write API (HTTP POST) + - Direct write to Prometheus TSDB + - Timestamp preserved from original data + +3. **Visualization**: + - Grafana queries Prometheus + - Displays metrics in dashboards + - Auto-refresh every 10 seconds + +## Overview + +**epimetheus** is a standalone binary that: +- **Generates** realistic example metrics simulating production applications +- **Pushes** metrics via Pushgateway (realtime) or Remote Write API (historic) +- **Automatically detects** timestamp age and chooses the optimal ingestion method +- **Supports** multiple data formats (CSV, JSON) and all Prometheus metric types +- **Provides** Grafana dashboard for visualizing test metrics + +## Quick Start + +### 1. Deploy Pushgateway (one-time setup) + +The Pushgateway Helm chart is available in the [conf repository](https://codeberg.org/snonux/conf) at `f3s/pushgateway/helm-chart`. + +```bash +# Clone the conf repository if you haven't already +git clone https://codeberg.org/snonux/conf.git +cd conf/f3s/pushgateway/helm-chart + +# Deploy Pushgateway +helm upgrade --install pushgateway . -n monitoring --create-namespace +``` + +Alternatively, deploy Pushgateway using the official chart: + +```bash +helm repo add prometheus-community https://prometheus-community.github.io/helm-charts +helm install pushgateway prometheus-community/prometheus-pushgateway -n monitoring --create-namespace +``` + +### 2. 
Run in Realtime Mode + +```bash +# Port-forward Pushgateway +kubectl port-forward -n monitoring svc/pushgateway 9091:9091 & + +# Push test metrics continuously +cd /home/paul/git/conf/f3s/epimetheus +./epimetheus -mode=realtime -continuous +``` + +The binary pushes metrics every 15 seconds. Press Ctrl+C to stop. + +### 3. View Metrics + +```bash +# Pushgateway UI +open http://localhost:9091 + +# Prometheus UI +kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-prometheus 9090:9090 & +open http://localhost:9090 +``` + +## Operating Modes + +### 👁️ Watch Mode +Monitor CSV files for changes and push metrics to Prometheus with file modification timestamps. + +**Works with ANY CSV format** - automatically detects numeric vs string columns and sanitizes names. + +**NEW: Automatic DNS Resolution** - IP addresses are automatically resolved to hostnames for better observability in Grafana. + +```bash +./epimetheus -mode=watch \ + -file=mydata.csv \ + -metric-name=myapp \ + -prometheus=http://localhost:9090/api/v1/write +``` + +**Features:** +- 🔍 **Format-agnostic**: Works with any tabular CSV structure +- 📊 **Automatic detection**: Numeric columns → metrics, String columns → labels +- 🏷️ **Name sanitization**: `min(potatoes)`, `avg(time)`, `p99(latency)` → valid metric names +- 🌐 **DNS Resolution**: IP addresses → hostnames (e.g., `10.50.52.61` → `foo.example.lan`) +- 💾 **Smart Caching**: In-memory cache prevents redundant DNS lookups +- ⏱️ **Timestamp preservation**: Uses file modification time +- 🔄 **Continuous monitoring**: Polls file every 1 second +- 💪 **Error resilient**: Continues watching despite failures +- 🎯 **Remote Write**: Pushes to Prometheus (preserves timestamps) + +**CSV Format:** +Works with any tabular CSV: +- First row: column headers (automatically sanitized) +- Subsequent rows: data values +- Column names can be anything: `min(x)`, `avg(y)`, `p99(latency)`, etc. 
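As an illustration of the rules above, here is a rough Go sketch of how header sanitization and numeric-vs-string column detection could work. This is a simplified stand-in, not Epimetheus's actual internals; `sanitize` and `isNumeric` are hypothetical names:

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// invalidChars matches every character that is not legal in a
// Prometheus metric-name fragment.
var invalidChars = regexp.MustCompile(`[^a-zA-Z0-9_]+`)

// sanitize turns an arbitrary CSV header such as "min(potatoes)" or
// "p99(latency)" into a valid metric-name fragment.
func sanitize(column string) string {
	return strings.Trim(invalidChars.ReplaceAllString(column, "_"), "_")
}

// isNumeric decides whether a CSV cell becomes a sample value
// (numeric column) or a label value (string column).
func isNumeric(cell string) bool {
	_, err := strconv.ParseFloat(cell, 64)
	return err == nil
}

func main() {
	for _, col := range []string{"min(potatoes)", "avg(response_time)", "p99(latency)"} {
		fmt.Println(col, "->", sanitize(col))
	}
	fmt.Println(isNumeric("45.2"), isNumeric("GET")) // true false
}
```

With a base metric name like `food`, a sanitized fragment such as `min_potatoes` would then be joined to form `food_min_potatoes`, matching the examples below.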
+ +**Example 1** - Web metrics: +```csv +avg(response_time),p99(latency),endpoint,method +45.2,120.5,/api/users,GET +52.1,135.8,/api/orders,POST +``` + +Generates: +```promql +web_avg_response_time{endpoint="/api/users",method="GET"} 45.2 +web_p99_latency{endpoint="/api/users",method="GET"} 120.5 +web_avg_response_time{endpoint="/api/orders",method="POST"} 52.1 +web_p99_latency{endpoint="/api/orders",method="POST"} 135.8 +``` + +**Example 2** - Food metrics: +```csv +min(potatoes),last(coke),avg(price),country,store_type +5.2,10.5,12.99,USA,grocery +3.8,8.2,9.99,Canada,convenience +``` + +Generates: +```promql +food_min_potatoes{country="USA",store_type="grocery"} 5.2 +food_last_coke{country="USA",store_type="grocery"} 10.5 +food_avg_price{country="USA",store_type="grocery"} 12.99 +# ... etc +``` + +Each row generates N samples (N = number of numeric columns). + +See [CSV-FORMAT-FLEXIBILITY.md](CSV-FORMAT-FLEXIBILITY.md) for more examples. + +**Options:** +- `-file` - CSV file to watch (required) +- `-metric-name` - Base metric name (required, e.g., `food`, `network`, `database`) +- `-prometheus` - Prometheus Remote Write URL (default: http://localhost:9090/api/v1/write) +- `-clickhouse` - ClickHouse HTTP URL (e.g. 
http://localhost:8123) to also ingest metrics +- `-clickhouse-table` - ClickHouse table name (default: epimetheus_metrics) +- `-job` - Job name for metrics (default: example_metrics_pusher) +- `-resolve-ip-labels` - Additional IP labels to resolve via DNS (default: ip is always resolved) + +**ClickHouse Support:** +Watch mode can ingest to ClickHouse in addition to (or instead of) Prometheus: + +```bash +# Ingest to both Prometheus and ClickHouse +./epimetheus -mode=watch -file=data.csv -metric-name=myapp \ + -prometheus=http://localhost:9090/api/v1/write \ + -clickhouse=http://localhost:8123 + +# ClickHouse only (use -prometheus= to disable Prometheus) +./epimetheus -mode=watch -file=test-data/watch-clickhouse-test.csv \ + -metric-name=watch_test -clickhouse=http://localhost:8123 -prometheus= + +# Verify data in ClickHouse +./verify-clickhouse.sh +``` + +**DNS Resolution:** +By default, the `ip` label is automatically resolved to a hostname. To resolve additional IP labels: + +```bash +./epimetheus -mode=watch \ + -file=network.csv \ + -metric-name=network \ + -resolve-ip-labels=source_ip,dest_ip +``` + +This will resolve: `ip` (default) + `source_ip` + `dest_ip` + +**Example:** +- Input: `ip="10.50.52.61"` +- Output: `ip="foo.example.lan"` +- Failed lookups: IP remains unchanged + +**Documentation:** +- [DNS-RESOLUTION-FEATURE.md](DNS-RESOLUTION-FEATURE.md) - Complete DNS resolution guide +- [CSV-FORMAT-FLEXIBILITY.md](CSV-FORMAT-FLEXIBILITY.md) - Works with ANY CSV format +- [DTAIL-METRICS-EXAMPLE.md](DTAIL-METRICS-EXAMPLE.md) - Detailed dtail.csv example + +### 🔄 Realtime Mode (Default) +Push current metrics to Pushgateway with "now" timestamp. 
+ +```bash +./epimetheus -mode=realtime -continuous +``` + +**Options:** +- `-pushgateway` - Pushgateway URL (default: http://localhost:9091) +- `-job` - Job name (default: example_metrics_pusher) +- `-continuous` - Keep pushing every 15 seconds + +### ⏰ Historic Mode +Push a single datapoint from the past using Remote Write API. + +```bash +# Port-forward Prometheus +kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-prometheus 9090:9090 & + +# Push data from 24 hours ago +./epimetheus -mode=historic -hours-ago=24 +``` + +**Options:** +- `-prometheus` - Prometheus URL (default: http://localhost:9090/api/v1/write) +- `-hours-ago` - Hours in the past (default: 24) + +### 📦 Backfill Mode +Import a range of historic data points. + +```bash +# Backfill last 48 hours with 1-hour intervals +./epimetheus -mode=backfill -start-hours=48 -end-hours=0 -interval=1 + +# Backfill last week with 6-hour intervals +./epimetheus -mode=backfill -start-hours=168 -end-hours=0 -interval=6 +``` + +**Options:** +- `-start-hours` - Start time in hours ago +- `-end-hours` - End time in hours ago (0 = now) +- `-interval` - Interval between points in hours + +### 🤖 Auto Mode (Recommended!) +Automatically detect timestamp age and route to the correct ingestion method. 
+ +```bash +# Generate test data +./generate-test-data.sh + +# Import mixed current and historic data +./epimetheus -mode=auto -file=test-all-ages.csv +``` + +**Detection Logic:** +- Data < 5 minutes old → Pushgateway (realtime) +- Data ≥ 5 minutes old → Remote Write (historic) + +**Options:** +- `-file` - Input file path +- `-format` - Data format: csv or json (default: csv) +- `-pushgateway` - Pushgateway URL +- `-prometheus` - Prometheus Remote Write URL + +## Data Formats + +### CSV Format + +```csv +# Format: metric_name,labels,value,timestamp_ms +# Labels: key1=value1;key2=value2 +epimetheus_test_requests_total,instance=web1;env=prod,100,1767125148000 +epimetheus_test_temperature_celsius,instance=web2,22.5,1767038748000 + +# Timestamp is optional (uses "now" if omitted) +epimetheus_test_active_connections,instance=web3,42, +``` + +### JSON Format + +```json +[ + { + "metric": "epimetheus_test_requests_total", + "labels": {"instance": "web1", "env": "prod"}, + "value": 100, + "timestamp_ms": 1767125148000 + }, + { + "metric": "epimetheus_test_temperature_celsius", + "labels": {"instance": "web2"}, + "value": 22.5, + "timestamp_ms": 1767038748000 + } +] +``` + +## Test Metrics + +All generated metrics use the `epimetheus_test_` prefix to clearly identify them as test data. 
+ +### Counter: `epimetheus_test_requests_total` +- **Type:** Counter (monotonically increasing) +- **Description:** Total number of requests processed +- **Use case:** Counting total events, requests, errors + +### Gauge: `epimetheus_test_active_connections` +- **Type:** Gauge (can increase or decrease) +- **Description:** Current number of active connections (0-100) +- **Use case:** Current state measurements, capacity + +### Gauge: `epimetheus_test_temperature_celsius` +- **Type:** Gauge +- **Description:** Current temperature in Celsius (0-50°C) +- **Use case:** Environmental monitoring + +### Histogram: `epimetheus_test_request_duration_seconds` +- **Type:** Histogram (distribution) +- **Description:** Request duration distribution +- **Buckets:** 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10 seconds +- **Use case:** Latency measurements, SLO tracking + +### Labeled Counter: `epimetheus_test_jobs_processed_total` +- **Type:** Counter with labels +- **Description:** Jobs processed by type and status +- **Labels:** + - `job_type`: email, report, backup + - `status`: success, failed +- **Use case:** Categorized counting, multi-dimensional metrics + +## Grafana Dashboard + +A comprehensive dashboard is available showcasing all test metrics. + +### Dashboard Features + +- **8 Panels:** + 1. Request Rate (line graph) + 2. Total Requests (stat panel) + 3. Active Connections (gauge with thresholds) + 4. Temperature (gauge with thresholds) + 5. Request Duration Histogram (p50, p90, p99) + 6. Average Request Duration (stat) + 7. Jobs Processed by Type (bar gauge) + 8. 
Jobs Status Breakdown (table) + +- **Auto-refresh:** Every 10 seconds +- **Time range:** Last 15 minutes (customizable) +- **Dark theme optimized** + +### Deploy Dashboard + +#### Option 1: Helm/Kubernetes ConfigMap (Recommended) + +```bash +# Deploy via Kubernetes ConfigMap +kubectl apply -f ../prometheus/epimetheus-dashboard.yaml +``` + +The dashboard will be automatically discovered by Grafana. + +#### Option 2: Manual Import + +```bash +# Port-forward Grafana +kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80 + +# Open Grafana +open http://localhost:3000 + +# Go to Dashboards → Import → Upload grafana-dashboard.json +``` + +#### Option 3: Automated Script + +```bash +# Deploy via API +./deploy-dashboard.sh + +# Or with custom credentials +GRAFANA_URL="http://localhost:3000" \ +GRAFANA_USER="admin" \ +GRAFANA_PASSWORD="yourpassword" \ +./deploy-dashboard.sh +``` + +## Example Queries + +### Basic Queries + +```promql +# View total requests +epimetheus_test_requests_total + +# View request rate over last 5 minutes +rate(epimetheus_test_requests_total[5m]) + +# View current active connections +epimetheus_test_active_connections + +# View current temperature +epimetheus_test_temperature_celsius +``` + +### Histogram Queries + +```promql +# 95th percentile request duration +histogram_quantile(0.95, rate(epimetheus_test_request_duration_seconds_bucket[5m])) + +# 50th percentile (median) +histogram_quantile(0.50, rate(epimetheus_test_request_duration_seconds_bucket[5m])) + +# Average request duration +rate(epimetheus_test_request_duration_seconds_sum[5m]) / +rate(epimetheus_test_request_duration_seconds_count[5m]) +``` + +### Labeled Counter Queries + +```promql +# Failed jobs by type +epimetheus_test_jobs_processed_total{status="failed"} + +# Job success rate +rate(epimetheus_test_jobs_processed_total{status="success"}[5m]) / +rate(epimetheus_test_jobs_processed_total[5m]) + +# Total jobs by type +sum by (job_type) 
(epimetheus_test_jobs_processed_total) +``` + +### Curl Examples + +```bash +# Port-forward Prometheus +kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-prometheus 9090:9090 & + +# Query total requests +curl -s "http://localhost:9090/api/v1/query?query=epimetheus_test_requests_total" | jq . + +# Query temperature +curl -s "http://localhost:9090/api/v1/query?query=epimetheus_test_temperature_celsius" | jq . + +# Query request rate +curl -s "http://localhost:9090/api/v1/query?query=rate(epimetheus_test_requests_total[5m])" | jq . + +# Query histogram p95 +curl -s "http://localhost:9090/api/v1/query?query=histogram_quantile(0.95,rate(epimetheus_test_request_duration_seconds_bucket[5m]))" | jq . +``` + +## Time Range Limitations + +### ✅ Supported Time Ranges + +| Time Range | Status | Method | +|------------|--------|--------| +| Current (< 5 min) | ✅ Works | Pushgateway | +| 1 hour old | ✅ Works | Remote Write | +| 1 day old | ✅ Works | Remote Write | +| 1 week old | ✅ Works | Remote Write | +| 1 month old | ✅ Works | Remote Write | + +### ⚠️ Potential Issues + +- **Future timestamps:** Rejected (> 5 minutes in future) +- **Very old data (6+ months):** May be rejected depending on Prometheus retention +- **Years old:** Likely rejected - use `promtool tsdb create-blocks-from` instead +- **Out-of-order samples:** Can't insert older data into existing time series (use different labels) + +### Prometheus Configuration + +Check your retention settings: + +```bash +# View retention +kubectl get prometheus -n monitoring prometheus-kube-prometheus-prometheus \ + -o jsonpath='{.spec.retention}' + +# Default is typically 15 days +``` + +For very old data: +- Increase retention in Prometheus config +- Enable out-of-order ingestion (experimental) +- Use `promtool` for direct TSDB block creation + +## Project Structure + +``` +epimetheus/ +├── cmd/ +│ └── epimetheus/ +│ └── main.go # Main entry point +├── internal/ +│ ├── config/ # Configuration +│ ├── metrics/ # 
Metric generators
│   ├── parser/           # CSV/JSON parsers (includes tabular CSV)
│   ├── ingester/         # Pushgateway & Remote Write ingesters
│   └── watcher/          # File watcher for watch mode
├── epimetheus             # Compiled binary
├── grafana-dashboard.json # Grafana dashboard definition
├── deploy-dashboard.sh    # Dashboard deployment script
├── generate-test-data.sh  # Test data generator
├── run.sh                 # Helper script
└── README.md              # This file
```

## Setup Requirements

### 1. Enable Prometheus Remote Write Receiver ⚠️ **REQUIRED for Historic Data**

**IMPORTANT**: To use historic mode, backfill mode, or auto mode with old data, you **must** enable the Prometheus Remote Write receiver. Without this feature, Epimetheus can only push realtime data via Pushgateway.

The Remote Write receiver is configured in the [conf repository](https://codeberg.org/snonux/conf) at `f3s/prometheus/persistence-values.yaml`:

```yaml
# In prometheus/persistence-values.yaml (from conf repository)
prometheus:
  prometheusSpec:
    # Enable the Remote Write receiver endpoint and the Admin API (Prometheus 3.x syntax)
    additionalArgs:
      - name: web.enable-remote-write-receiver
        value: ""
      - name: web.enable-admin-api
        value: ""

    # Additional feature flags (exemplar storage and the OTLP write receiver)
    enableFeatures:
      - exemplar-storage
      - otlp-write-receiver

    # Enable out-of-order ingestion for backfilling: allow writing data points
    # up to 31 days older than existing data for the same time series
    # (provides a 1-day buffer for 30-day datasets)
    tsdb:
      outOfOrderTimeWindow: 744h  # 31 days
```

**What This Enables:**
- **Remote Write API**: HTTP endpoint at `/api/v1/write` for ingesting metrics with custom timestamps
- **Admin API**: HTTP endpoints at `/api/v1/admin/tsdb/*` for data deletion and management
- **Out-of-Order Ingestion**: Allows writing data points older than existing data for the same time series
- **31-Day Window**: Can backfill data up to 31 days in the past (provides a 1-day buffer for 30-day datasets)
+After updating the configuration, upgrade your Prometheus installation: + +```bash +cd conf/f3s/prometheus +just upgrade # Or manually: +# helm upgrade prometheus prometheus-community/kube-prometheus-stack \ +# -n monitoring -f persistence-values.yaml +``` + +Verify the features are enabled: + +```bash +# Check Remote Write receiver flag +kubectl get pod -n monitoring prometheus-prometheus-kube-prometheus-prometheus-0 \ + -o jsonpath='{.spec.containers[0].args}' | grep -o "web.enable-remote-write-receiver" + +# Check out-of-order time window +kubectl get prometheus -n monitoring prometheus-kube-prometheus-prometheus \ + -o jsonpath='{.spec.tsdb.outOfOrderTimeWindow}' +# Should output: 744h + +# Check admin API flag +kubectl get pod -n monitoring prometheus-prometheus-kube-prometheus-prometheus-0 \ + -o jsonpath='{.spec.containers[0].args}' | grep -o "web.enable-admin-api" +``` + +**Performance Considerations:** + +This configuration is designed for ad-hoc troubleshooting and development, **NOT production use**. Enabling these features has trade-offs: + +- **Increased Memory Usage**: Out-of-order ingestion requires additional memory for buffering and sorting time series +- **Higher TSDB Overhead**: Prometheus TSDB needs to handle non-sequential writes, increasing disk I/O +- **Query Performance**: Queries may be slower due to fragmented data blocks +- **Storage Amplification**: Out-of-order samples can trigger additional compactions, increasing storage usage + +**Recommendation for Production:** +- Keep `outOfOrderTimeWindow` as small as possible (or disabled) +- Monitor Prometheus memory and disk usage closely +- Use Remote Write only when necessary +- Consider using dedicated testing/development Prometheus instances + +**Note**: The syntax changed in Prometheus 3.x - use `additionalArgs` with `web.enable-remote-write-receiver` instead of the deprecated `enableFeatures: [remote-write-receiver]`. + +### 2. 
Update Prometheus Scrape Config + +Ensure Pushgateway is in scrape targets: + +```yaml +# additional-scrape-configs.yaml +- job_name: 'pushgateway' + honor_labels: true + static_configs: + - targets: + - 'pushgateway.monitoring.svc.cluster.local:9091' +``` + +Apply the configuration: + +```bash +kubectl create secret generic additional-scrape-configs \ + --from-file=/home/paul/git/conf/f3s/prometheus/additional-scrape-configs.yaml \ + --dry-run=client -o yaml -n monitoring | kubectl apply -f - +``` + +## Building from Source + +### Using Mage (Recommended) + +This project includes a [Magefile](./MAGEFILE.md) for easy building, testing, and running: + +```bash +# Install Mage (one-time setup) +go install github.com/magefile/mage@latest + +# Build binary +mage build + +# Run tests +mage test + +# Run with coverage report +mage testCoverage + +# Run in realtime mode +mage run + +# See all available targets +mage -l +``` + +See [MAGEFILE.md](./MAGEFILE.md) for complete documentation. + +### Using Go directly + +```bash +# Build binary +go build -o epimetheus cmd/epimetheus/main.go + +# Run tests +go test ./... -v + +# Check test coverage +go test ./... 
-cover
```

## Troubleshooting

### Binary can't connect to Pushgateway

```bash
# Check port-forward is running
ps aux | grep "port-forward.*9091"

# Restart port-forward
kubectl port-forward -n monitoring svc/pushgateway 9091:9091
```

### Metrics not appearing in Prometheus

```bash
# Check Pushgateway has metrics
curl http://localhost:9091/metrics | grep "epimetheus_test"

# Check Prometheus scrape targets
# Open http://localhost:9090/targets - look for "pushgateway" job

# Check Prometheus logs
kubectl logs -n monitoring -l app.kubernetes.io/name=prometheus
```

### "Remote write receiver not enabled" error

```bash
# Verify feature is enabled
kubectl logs -n monitoring prometheus-prometheus-kube-prometheus-prometheus-0 | grep "remote-write-receiver"

# Should see: msg="Experimental features enabled" features=[remote-write-receiver]
```

### "Out of order sample" error

This occurs when trying to insert data older than existing data for the same time series.
**Solutions:**
- Use different job labels for historic data (e.g., `job="historic_data"`)
- Enable out-of-order ingestion in Prometheus (experimental)
- Ensure backfill goes from oldest to newest

### Dashboard not appearing in Grafana

```bash
# Check ConfigMap exists
kubectl get configmap -n monitoring | grep epimetheus

# Check labels
kubectl get configmap epimetheus-dashboard -n monitoring -o yaml | grep "grafana_dashboard"

# Restart Grafana to force reload
kubectl rollout restart deployment/prometheus-grafana -n monitoring
```

## Architecture (Simplified)

```
┌─────────────────┐
│   Epimetheus    │
│   (Go binary)   │──Push realtime──┐
│                 │                 │
└─────────────────┘                 ▼
         │              ┌──────────────────┐
         │              │   Pushgateway    │◄──Scrape──┐
         │              │   (Port 9091)    │           │
         │              └──────────────────┘           │
         │                                             │
         └──Push historic──────────────────┐           │
                                           ▼           │
                                 ┌─────────────────┐   │
                                 │   Prometheus    │◄──┘
                                 │   (Port 9090)   │
                                 │ Remote Write API│
                                 └─────────────────┘
                                           │
                                           │ Datasource
                                           ▼
                                 ┌─────────────────┐
                                 │     Grafana     │
                                 │   (Port 3000)   │
                                 │   Dashboards    │
                                 └─────────────────┘
```

## Best Practices

### When to Use Pushgateway vs.
Remote Write + +**Use Pushgateway (realtime mode):** +- Short-lived batch jobs +- Service-level metrics +- Jobs behind firewalls +- Current/recent data (< 5 minutes old) + +**Use Remote Write (historic mode):** +- Historic data import +- Backfilling gaps +- Data migration +- Data older than 5 minutes + +**Use Auto Mode:** +- Mixed current and historic data +- Importing from files +- Unknown timestamp ages +- General-purpose ingestion + +### Metric Design + +- **Use appropriate metric types:** + - Counter for cumulative values (requests, errors) + - Gauge for point-in-time values (temperature, connections) + - Histogram for distributions (latency, sizes) + +- **Label cardinality:** + - Include meaningful labels + - Avoid high-cardinality labels (user IDs, timestamps) + - Keep label combinations reasonable (< 1000 per metric) + +- **Naming conventions:** + - Use descriptive names + - Include units in gauge names (\_celsius, \_bytes) + - Use \_total suffix for counters + +## Cleanup + +### Cleaning Up Benchmark Data from Prometheus + +For cleaning up benchmark metrics from Prometheus, use the provided cleanup script: + +```bash +# Port-forward to Prometheus +kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-prometheus 9090:9090 & + +# Run the cleanup script +./cleanup-benchmark-data.sh +``` + +The script will: +1. Delete all `epimetheus_benchmark_*` metrics using the Prometheus Admin API +2. Clean up tombstones to free disk space +3. 
Provide clear success/error feedback

**Manual cleanup** (if you prefer):

```bash
# Delete specific metric
curl -X POST 'http://localhost:9090/api/v1/admin/tsdb/delete_series?match[]=epimetheus_benchmark_cpu_usage'

# Clean up tombstones
curl -X POST 'http://localhost:9090/api/v1/admin/tsdb/clean_tombstones'
```

### Other Cleanup Tasks

```bash
# Stop port-forwards
pkill -f "port-forward.*9091"
pkill -f "port-forward.*9090"
pkill -f "port-forward.*3000"

# Delete test metrics from Pushgateway
curl -X DELETE http://localhost:9091/metrics/job/example_metrics_pusher

# Uninstall Pushgateway (if needed)
helm uninstall pushgateway -n monitoring
```

## macOS Setup

### Basic Installation

```bash
brew install prometheus
brew install grafana
go install github.com/prometheus/pushgateway@latest
brew services start grafana
brew services start prometheus
~/go/bin/pushgateway &
```

Once done, log in to http://localhost:3000 as admin:admin; you will be prompted to change the password. Afterwards, add http://localhost:9090 as a Prometheus datasource.

### Enable Remote Write Receiver (Required for Watch Mode)

⚠️ **Important**: Watch mode, historic mode, backfill mode, and auto mode require the Prometheus Remote Write receiver to be enabled.

#### Option 1: Permanent Configuration (Recommended)

Edit the Prometheus arguments file:

```bash
# Edit the arguments file
nano /opt/homebrew/etc/prometheus.args
```

Add this line at the end:
```
--web.enable-remote-write-receiver
```

The complete file should look like:
```
--config.file /opt/homebrew/etc/prometheus.yml
--web.listen-address=127.0.0.1:9090
--storage.tsdb.path /opt/homebrew/var/prometheus
--web.enable-remote-write-receiver
--web.enable-admin-api
```

**Note:** `--web.enable-admin-api` is optional but recommended for easier data management (allows deleting old metrics).
Restart Prometheus:

```bash
brew services restart prometheus
```

Verify it's working:

```bash
# Check Prometheus is healthy
curl http://localhost:9090/-/healthy

# Test Remote Write endpoint (should return 400, not 404)
curl -X POST http://localhost:9090/api/v1/write
```

#### Option 2: Temporary (For Testing)

Stop the service and start manually:

```bash
# Stop brew service
brew services stop prometheus

# Start with Remote Write enabled
prometheus --web.enable-remote-write-receiver
```

Keep this terminal open. In another terminal, run your epimetheus commands.

**Note**: This setting lasts only until you close the terminal session. Use Option 1 for a permanent setup.

### Clearing Old Metrics (Optional)

If you need to delete old metrics and start fresh:

```bash
# Delete specific metrics (e.g., blockstore)
curl -X POST -g 'http://localhost:9090/api/v1/admin/tsdb/delete_series?match[]={__name__=~"blockstore_.*"}'

# Clean up deleted data
curl -X POST http://localhost:9090/api/v1/admin/tsdb/clean_tombstones

# Wait a moment for cleanup
sleep 2
```

**Note:** Admin API must be enabled (add `--web.enable-admin-api` to prometheus.args).
+ +### Verify Setup + +Once Remote Write is enabled, test watch mode: + +```bash +# Create a test CSV +cat > /tmp/test.csv << EOF +status,count,method +200,100,GET +404,50,POST +EOF + +# Watch the file +./epimetheus -mode=watch \ + -file=/tmp/test.csv \ + -metric-name=test \ + -prometheus=http://localhost:9090/api/v1/write +``` + +You should see: +``` +✅ Successfully pushed X samples to Prometheus +``` + +Query in Prometheus (http://localhost:9090): +```promql +{__name__=~"test_.*"} +``` + +## Additional Resources + +- [Prometheus Documentation](https://prometheus.io/docs/) +- [Pushgateway Documentation](https://github.com/prometheus/pushgateway) +- [Prometheus Remote Write Spec](https://prometheus.io/docs/concepts/remote_write_spec/) +- [Grafana Documentation](https://grafana.com/docs/) + +## Version + +Current version: 0.0.0 + +## License + +See LICENSE file for details. -- cgit v1.2.3