# Epimetheus

A versatile Go tool for pushing metrics to Prometheus with support for both realtime and historic data ingestion.

## Why "Epimetheus"?

In Greek mythology, [Epimetheus](https://en.wikipedia.org/wiki/Epimetheus_(mythology)) is Prometheus's brother, whose name means "afterthought" or "hindsight" (while Prometheus means "forethought"). The name captures the tool's purpose: bringing data to Prometheus **after** collection, whether it's historic data from hours, days, or weeks ago, or realtime data pushed on demand.

While Epimetheus is sometimes depicted as foolish in myth (he accepted Pandora's box despite warnings), this tool embraces the "afterthought" aspect productively: it's never too late to bring your metrics home to Prometheus!

## Architecture

```
┌────────────────────────────────────────────────────────────┐
│                         Epimetheus                         │
│                  (Metrics Ingestion Tool)                  │
│                                                            │
│  Modes:                                                    │
│   • Realtime - Current metrics (< 5 min old)               │
│   • Historic - Historic metrics (≥ 5 min old)              │
│   • Backfill - Range of historic data                      │
│   • Auto - Automatic routing based on timestamp age        │
└────────────────────────────────────────────────────────────┘
          │                                   │
          │ Realtime Data                     │ Historic Data
          │ (via HTTP POST)                   │ (via Remote Write API)
          │ Uses "now" timestamp              │ Preserves timestamps
          ▼                                   ▼
┌─────────────────────┐             ┌─────────────────────┐
│     Pushgateway     │             │     Prometheus      │
│     (Port 9091)     │             │     (Port 9090)     │
│                     │             │                     │
│ • Buffers metrics   │             │ Remote Write API:   │
│ • Scraped by        │──Scraped───▶│   /api/v1/write     │
│   Prometheus        │ every       │                     │
│ • No timestamp      │ 15-30s      │ Feature required:   │
│   preservation      │             │ --enable-feature=   │
│                     │             │   remote-write-     │
│                     │             │   receiver          │
└─────────────────────┘             └─────────────────────┘
                                              │
                                              │ Prometheus Query API
                                              │ /api/v1/query
                                              ▼
                                    ┌─────────────────────┐
                                    │       Grafana       │
                                    │     (Port 3000)     │
                                    │                     │
                                    │ • Prometheus as     │
                                    │   datasource        │
                                    │ • Dashboards:       │
                                    │   - Epimetheus      │
                                    │     Test Metrics    │
                                    │ • Auto-refresh      │
                                    └─────────────────────┘
```

### Data Flow

1. **Realtime Path** (for current data):
   - Epimetheus → Pushgateway (HTTP POST)
   - Prometheus scrapes Pushgateway periodically
   - Timestamp = "now" when Prometheus scrapes
2. **Historic Path** (for old data):
   - Epimetheus → Prometheus Remote Write API (HTTP POST)
   - Direct write to the Prometheus TSDB
   - Timestamp preserved from the original data
3. **Visualization**:
   - Grafana queries Prometheus
   - Displays metrics in dashboards
   - Auto-refresh every 10 seconds

## Overview

**epimetheus** is a standalone binary that:

- **Generates** realistic example metrics simulating production applications
- **Pushes** metrics via Pushgateway (realtime) or the Remote Write API (historic)
- **Automatically detects** timestamp age and chooses the optimal ingestion method
- **Supports** multiple data formats (CSV, JSON) and all Prometheus metric types
- **Provides** a Grafana dashboard for visualizing test metrics

## Quick Start

### 1. Deploy Pushgateway (one-time setup)

The Pushgateway Helm chart is available in the [conf repository](https://codeberg.org/snonux/conf) at `f3s/pushgateway/helm-chart`.

```bash
# Clone the conf repository if you haven't already
git clone https://codeberg.org/snonux/conf.git
cd conf/f3s/pushgateway/helm-chart

# Deploy Pushgateway
helm upgrade --install pushgateway . -n monitoring --create-namespace
```

Alternatively, deploy Pushgateway using the official chart:

```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install pushgateway prometheus-community/prometheus-pushgateway -n monitoring --create-namespace
```

### 2. Run in Realtime Mode

```bash
# Port-forward Pushgateway
kubectl port-forward -n monitoring svc/pushgateway 9091:9091 &

# Push test metrics continuously
cd /home/paul/git/conf/f3s/epimetheus
./epimetheus -mode=realtime -continuous
```

The binary pushes metrics every 15 seconds. Press Ctrl+C to stop.

### 3. View Metrics

```bash
# Pushgateway UI
open http://localhost:9091

# Prometheus UI
kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-prometheus 9090:9090 &
open http://localhost:9090
```

## Operating Modes

### 👁️ Watch Mode

Monitor CSV files for changes and push metrics to Prometheus with file modification timestamps. **Works with ANY CSV format** - automatically detects numeric vs. string columns and sanitizes names. **NEW: Automatic DNS Resolution** - IP addresses are automatically resolved to hostnames for better observability in Grafana.

```bash
./epimetheus -mode=watch \
  -file=mydata.csv \
  -metric-name=myapp \
  -prometheus=http://localhost:9090/api/v1/write
```

**Features:**

- 🔍 **Format-agnostic**: Works with any tabular CSV structure
- 📊 **Automatic detection**: Numeric columns → metrics, string columns → labels
- 🏷️ **Name sanitization**: `min(potatoes)`, `avg(time)`, `p99(latency)` → valid metric names
- 🌐 **DNS resolution**: IP addresses → hostnames (e.g., `10.50.52.61` → `foo.example.lan`)
- 💾 **Smart caching**: In-memory cache prevents redundant DNS lookups
- ⏱️ **Timestamp preservation**: Uses the file modification time
- 🔄 **Continuous monitoring**: Polls the file every second
- 💪 **Error resilient**: Continues watching despite failures
- 🎯 **Remote Write**: Pushes to Prometheus (preserves timestamps)

**CSV Format:** Works with any tabular CSV:

- First row: column headers (automatically sanitized)
- Subsequent rows: data values
- Column names can be anything: `min(x)`, `avg(y)`, `p99(latency)`, etc.
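
The detection and sanitization rules above can be sketched in a few lines of Go. This is only an illustration of the idea (the names `sanitizeName` and `isNumeric`, and the underscore-replacement rule, are assumptions for this sketch, not Epimetheus's actual internals):

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// invalidChars matches everything that is not legal in a Prometheus
// metric-name fragment; runs of such characters collapse to one "_".
var invalidChars = regexp.MustCompile(`[^a-zA-Z0-9_]+`)

// sanitizeName turns an arbitrary CSV header like "min(potatoes)" or
// "p99(latency)" into a valid metric-name fragment.
func sanitizeName(header string) string {
	s := invalidChars.ReplaceAllString(header, "_")
	return strings.Trim(s, "_")
}

// isNumeric decides whether a CSV cell holds a metric value (numeric)
// or a label value (string).
func isNumeric(cell string) bool {
	_, err := strconv.ParseFloat(cell, 64)
	return err == nil
}

func main() {
	// A full metric name would be the -metric-name base plus the
	// sanitized column, e.g. "food" + "_" + "min_potatoes".
	fmt.Println(sanitizeName("min(potatoes)"))              // min_potatoes
	fmt.Println(sanitizeName("p99(latency)"))               // p99_latency
	fmt.Println(isNumeric("45.2"), isNumeric("/api/users")) // true false
}
```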

**Example 1** - Web metrics:

```csv
avg(response_time),p99(latency),endpoint,method
45.2,120.5,/api/users,GET
52.1,135.8,/api/orders,POST
```

Generates:

```promql
web_avg_response_time{endpoint="/api/users",method="GET"} 45.2
web_p99_latency{endpoint="/api/users",method="GET"} 120.5
web_avg_response_time{endpoint="/api/orders",method="POST"} 52.1
web_p99_latency{endpoint="/api/orders",method="POST"} 135.8
```

**Example 2** - Food metrics:

```csv
min(potatoes),last(coke),avg(price),country,store_type
5.2,10.5,12.99,USA,grocery
3.8,8.2,9.99,Canada,convenience
```

Generates:

```promql
food_min_potatoes{country="USA",store_type="grocery"} 5.2
food_last_coke{country="USA",store_type="grocery"} 10.5
food_avg_price{country="USA",store_type="grocery"} 12.99
# ... etc
```

Each row generates N samples (N = number of numeric columns). See [CSV-FORMAT-FLEXIBILITY.md](CSV-FORMAT-FLEXIBILITY.md) for more examples.

**Options:**

- `-file` - CSV file to watch (required)
- `-metric-name` - Base metric name (required, e.g., `food`, `network`, `database`)
- `-prometheus` - Prometheus Remote Write URL (default: http://localhost:9090/api/v1/write)
- `-clickhouse` - ClickHouse HTTP URL (e.g. http://localhost:8123) to also ingest metrics
- `-clickhouse-table` - ClickHouse table name (default: epimetheus_metrics)
- `-job` - Job name for metrics (default: example_metrics_pusher)
- `-resolve-ip-labels` - Additional IP labels to resolve via DNS (the `ip` label is always resolved)

**ClickHouse Support:** Watch mode can ingest to ClickHouse in addition to (or instead of) Prometheus:

```bash
# Ingest to both Prometheus and ClickHouse
./epimetheus -mode=watch -file=data.csv -metric-name=myapp \
  -prometheus=http://localhost:9090/api/v1/write \
  -clickhouse=http://localhost:8123

# ClickHouse only (use -prometheus= to disable Prometheus)
./epimetheus -mode=watch -file=test-data/watch-clickhouse-test.csv \
  -metric-name=watch_test -clickhouse=http://localhost:8123 -prometheus=

# Verify data in ClickHouse
./verify-clickhouse.sh
```

**DNS Resolution:** By default, the `ip` label is automatically resolved to a hostname. To resolve additional IP labels:

```bash
./epimetheus -mode=watch \
  -file=network.csv \
  -metric-name=network \
  -resolve-ip-labels=source_ip,dest_ip
```

This resolves `ip` (default) plus `source_ip` and `dest_ip`.

**Example:**

- Input: `ip="10.50.52.61"`
- Output: `ip="foo.example.lan"`
- Failed lookups: the IP remains unchanged

**Documentation:**

- [DNS-RESOLUTION-FEATURE.md](DNS-RESOLUTION-FEATURE.md) - Complete DNS resolution guide
- [CSV-FORMAT-FLEXIBILITY.md](CSV-FORMAT-FLEXIBILITY.md) - Works with ANY CSV format
- [DTAIL-METRICS-EXAMPLE.md](DTAIL-METRICS-EXAMPLE.md) - Detailed dtail.csv example

### 🔄 Realtime Mode (Default)

Push current metrics to Pushgateway with a "now" timestamp.

```bash
./epimetheus -mode=realtime -continuous
```

**Options:**

- `-pushgateway` - Pushgateway URL (default: http://localhost:9091)
- `-job` - Job name (default: example_metrics_pusher)
- `-continuous` - Keep pushing every 15 seconds

### ⏰ Historic Mode

Push a single datapoint from the past using the Remote Write API.

```bash
# Port-forward Prometheus
kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-prometheus 9090:9090 &

# Push data from 24 hours ago
./epimetheus -mode=historic -hours-ago=24
```

**Options:**

- `-prometheus` - Prometheus URL (default: http://localhost:9090/api/v1/write)
- `-hours-ago` - Hours in the past (default: 24)

### 📦 Backfill Mode

Import a range of historic data points.

```bash
# Backfill last 48 hours with 1-hour intervals
./epimetheus -mode=backfill -start-hours=48 -end-hours=0 -interval=1

# Backfill last week with 6-hour intervals
./epimetheus -mode=backfill -start-hours=168 -end-hours=0 -interval=6
```

**Options:**

- `-start-hours` - Start time in hours ago
- `-end-hours` - End time in hours ago (0 = now)
- `-interval` - Interval between points in hours

### 🤖 Auto Mode (Recommended!)

Automatically detect timestamp age and route to the correct ingestion method.

```bash
# Generate test data
./generate-test-data.sh

# Import mixed current and historic data
./epimetheus -mode=auto -file=test-all-ages.csv
```

**Detection Logic:**

- Data < 5 minutes old → Pushgateway (realtime)
- Data ≥ 5 minutes old → Remote Write (historic)

**Options:**

- `-file` - Input file path
- `-format` - Data format: csv or json (default: csv)
- `-pushgateway` - Pushgateway URL
- `-prometheus` - Prometheus Remote Write URL

## Data Formats

### CSV Format

```csv
# Format: metric_name,labels,value,timestamp_ms
# Labels: key1=value1;key2=value2
epimetheus_test_requests_total,instance=web1;env=prod,100,1767125148000
epimetheus_test_temperature_celsius,instance=web2,22.5,1767038748000
# Timestamp is optional (uses "now" if omitted)
epimetheus_test_active_connections,instance=web3,42,
```

### JSON Format

```json
[
  {
    "metric": "epimetheus_test_requests_total",
    "labels": {"instance": "web1", "env": "prod"},
    "value": 100,
    "timestamp_ms": 1767125148000
  },
  {
    "metric": "epimetheus_test_temperature_celsius",
    "labels": {"instance": "web2"},
    "value": 22.5,
    "timestamp_ms": 1767038748000
  }
]
```

## Test Metrics

All generated metrics use the `epimetheus_test_` prefix to clearly identify them as test data.

### Counter: `epimetheus_test_requests_total`

- **Type:** Counter (monotonically increasing)
- **Description:** Total number of requests processed
- **Use case:** Counting total events, requests, errors

### Gauge: `epimetheus_test_active_connections`

- **Type:** Gauge (can increase or decrease)
- **Description:** Current number of active connections (0-100)
- **Use case:** Current state measurements, capacity

### Gauge: `epimetheus_test_temperature_celsius`

- **Type:** Gauge
- **Description:** Current temperature in Celsius (0-50°C)
- **Use case:** Environmental monitoring

### Histogram: `epimetheus_test_request_duration_seconds`

- **Type:** Histogram (distribution)
- **Description:** Request duration distribution
- **Buckets:** 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10 seconds
- **Use case:** Latency measurements, SLO tracking

### Labeled Counter: `epimetheus_test_jobs_processed_total`

- **Type:** Counter with labels
- **Description:** Jobs processed by type and status
- **Labels:**
  - `job_type`: email, report, backup
  - `status`: success, failed
- **Use case:** Categorized counting, multi-dimensional metrics

## Grafana Dashboard

A comprehensive dashboard is available showcasing all test metrics.

### Dashboard Features

- **8 Panels:**
  1. Request Rate (line graph)
  2. Total Requests (stat panel)
  3. Active Connections (gauge with thresholds)
  4. Temperature (gauge with thresholds)
  5. Request Duration Histogram (p50, p90, p99)
  6. Average Request Duration (stat)
  7. Jobs Processed by Type (bar gauge)
  8. Jobs Status Breakdown (table)
- **Auto-refresh:** Every 10 seconds
- **Time range:** Last 15 minutes (customizable)
- **Dark theme optimized**

### Deploy Dashboard

#### Option 1: Helm/Kubernetes ConfigMap (Recommended)

```bash
# Deploy via Kubernetes ConfigMap
kubectl apply -f ../prometheus/epimetheus-dashboard.yaml
```

The dashboard will be automatically discovered by Grafana.

#### Option 2: Manual Import

```bash
# Port-forward Grafana
kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80

# Open Grafana
open http://localhost:3000

# Go to Dashboards → Import → Upload grafana-dashboard.json
```

#### Option 3: Automated Script

```bash
# Deploy via API
./deploy-dashboard.sh

# Or with custom credentials
GRAFANA_URL="http://localhost:3000" \
GRAFANA_USER="admin" \
GRAFANA_PASSWORD="yourpassword" \
./deploy-dashboard.sh
```

## Example Queries

### Basic Queries

```promql
# View total requests
epimetheus_test_requests_total

# View request rate over last 5 minutes
rate(epimetheus_test_requests_total[5m])

# View current active connections
epimetheus_test_active_connections

# View current temperature
epimetheus_test_temperature_celsius
```

### Histogram Queries

```promql
# 95th percentile request duration
histogram_quantile(0.95, rate(epimetheus_test_request_duration_seconds_bucket[5m]))

# 50th percentile (median)
histogram_quantile(0.50, rate(epimetheus_test_request_duration_seconds_bucket[5m]))

# Average request duration
rate(epimetheus_test_request_duration_seconds_sum[5m])
/ rate(epimetheus_test_request_duration_seconds_count[5m])
```

### Labeled Counter Queries

```promql
# Failed jobs by type
epimetheus_test_jobs_processed_total{status="failed"}

# Job success rate
rate(epimetheus_test_jobs_processed_total{status="success"}[5m])
/ rate(epimetheus_test_jobs_processed_total[5m])

# Total jobs by type
sum by (job_type) (epimetheus_test_jobs_processed_total)
```

### Curl Examples

```bash
# Port-forward Prometheus
kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-prometheus 9090:9090 &

# Query total requests
curl -s "http://localhost:9090/api/v1/query?query=epimetheus_test_requests_total" | jq .

# Query temperature
curl -s "http://localhost:9090/api/v1/query?query=epimetheus_test_temperature_celsius" | jq .

# Query request rate
curl -s "http://localhost:9090/api/v1/query?query=rate(epimetheus_test_requests_total[5m])" | jq .

# Query histogram p95
curl -s "http://localhost:9090/api/v1/query?query=histogram_quantile(0.95,rate(epimetheus_test_request_duration_seconds_bucket[5m]))" | jq .
```

## Time Range Limitations

### ✅ Supported Time Ranges

| Time Range | Status | Method |
|------------|--------|--------|
| Current (< 5 min) | ✅ Works | Pushgateway |
| 1 hour old | ✅ Works | Remote Write |
| 1 day old | ✅ Works | Remote Write |
| 1 week old | ✅ Works | Remote Write |
| 1 month old | ✅ Works | Remote Write |

### ⚠️ Potential Issues

- **Future timestamps:** Rejected (> 5 minutes in the future)
- **Very old data (6+ months):** May be rejected depending on Prometheus retention
- **Years old:** Likely rejected - use `promtool tsdb create-blocks-from` instead
- **Out-of-order samples:** Can't insert older data into an existing time series (use different labels)

### Prometheus Configuration

Check your retention settings:

```bash
# View retention
kubectl get prometheus -n monitoring prometheus-kube-prometheus-prometheus \
  -o jsonpath='{.spec.retention}'

# Default is typically 15 days
```

For very old data:

- Increase retention in the Prometheus config
- Enable out-of-order ingestion (experimental)
- Use `promtool` for direct TSDB block creation

## Project Structure

```
epimetheus/
├── cmd/
│   └── epimetheus/
│       └── main.go            # Main entry point
├── internal/
│   ├── config/                # Configuration
│   ├── metrics/               # Metric generators
│   ├── parser/                # CSV/JSON parsers (includes tabular CSV)
│   ├── ingester/              # Pushgateway & Remote Write ingesters
│   └── watcher/               # File watcher for watch mode
├── epimetheus                 # Compiled binary
├── grafana-dashboard.json     # Grafana dashboard definition
├── deploy-dashboard.sh        # Dashboard deployment script
├── generate-test-data.sh      # Test data generator
├── run.sh                     # Helper script
└── README.md                  # This file
```

## Setup Requirements

### 1. Enable Prometheus Remote Write Receiver

⚠️ **REQUIRED for Historic Data**

**IMPORTANT**: To use historic mode, backfill mode, or auto mode with old data, you **must** enable the Prometheus Remote Write receiver. Without this feature, Epimetheus can only push realtime data via Pushgateway.

The Remote Write receiver is configured in the [conf repository](https://codeberg.org/snonux/conf) at `f3s/prometheus/persistence-values.yaml`:

```yaml
# In prometheus/persistence-values.yaml (from the conf repository)
prometheus:
  prometheusSpec:
    # Enable the Remote Write receiver endpoint and Admin API (Prometheus 3.x syntax)
    additionalArgs:
      - name: web.enable-remote-write-receiver
        value: ""
      - name: web.enable-admin-api
        value: ""
    # Enable out-of-order ingestion for backfilling
    # Allows writing data points older than existing data for the same time series
    enableFeatures:
      - exemplar-storage
      - otlp-write-receiver
    # Allow backfilling up to 31 days in the past (provides a 1-day buffer for 30-day datasets)
    tsdb:
      outOfOrderTimeWindow: 744h  # 31 days
```

**What This Enables:**

- **Remote Write API**: HTTP endpoint at `/api/v1/write` for ingesting metrics with custom timestamps
- **Admin API**: HTTP endpoints at `/api/v1/admin/tsdb/*` for data deletion and management
- **Out-of-order ingestion**: Allows writing data points older than existing data for the same time series
- **31-day window**: Can backfill data up to 31 days in the past (provides a 1-day buffer for 30-day datasets)

After updating the configuration, upgrade your Prometheus installation:

```bash
cd conf/f3s/prometheus
just upgrade

# Or manually:
# helm upgrade prometheus prometheus-community/kube-prometheus-stack \
#   -n monitoring -f persistence-values.yaml
```

Verify the features are enabled:

```bash
# Check the Remote Write receiver flag
kubectl get pod -n monitoring prometheus-prometheus-kube-prometheus-prometheus-0 \
  -o jsonpath='{.spec.containers[0].args}' | grep -o "web.enable-remote-write-receiver"

# Check the out-of-order time window
kubectl get prometheus -n monitoring prometheus-kube-prometheus-prometheus \
  -o jsonpath='{.spec.tsdb.outOfOrderTimeWindow}'
# Should output: 744h

# Check the admin API flag
kubectl get pod -n monitoring prometheus-prometheus-kube-prometheus-prometheus-0 \
  -o jsonpath='{.spec.containers[0].args}' | grep -o "web.enable-admin-api"
```

**Performance Considerations:** This configuration is designed for ad-hoc troubleshooting and development, **NOT production use**. Enabling these features has trade-offs:

- **Increased memory usage**: Out-of-order ingestion requires additional memory for buffering and sorting time series
- **Higher TSDB overhead**: The Prometheus TSDB needs to handle non-sequential writes, increasing disk I/O
- **Query performance**: Queries may be slower due to fragmented data blocks
- **Storage amplification**: Out-of-order samples can trigger additional compactions, increasing storage usage

**Recommendation for Production:**

- Keep `outOfOrderTimeWindow` as small as possible (or disabled)
- Monitor Prometheus memory and disk usage closely
- Use Remote Write only when necessary
- Consider using dedicated testing/development Prometheus instances

**Note**: The syntax changed in Prometheus 3.x - use `additionalArgs` with `web.enable-remote-write-receiver` instead of the deprecated `enableFeatures: [remote-write-receiver]`.

### 2. Update Prometheus Scrape Config

Ensure Pushgateway is in the scrape targets:

```yaml
# additional-scrape-configs.yaml
- job_name: 'pushgateway'
  honor_labels: true
  static_configs:
    - targets:
        - 'pushgateway.monitoring.svc.cluster.local:9091'
```

Apply the configuration:

```bash
kubectl create secret generic additional-scrape-configs \
  --from-file=/home/paul/git/conf/f3s/prometheus/additional-scrape-configs.yaml \
  --dry-run=client -o yaml -n monitoring | kubectl apply -f -
```

## Building from Source

### Using Mage (Recommended)

This project includes a [Magefile](./MAGEFILE.md) for easy building, testing, and running:

```bash
# Install Mage (one-time setup)
go install github.com/magefile/mage@latest

# Build the binary
mage build

# Run tests
mage test

# Run with a coverage report
mage testCoverage

# Run in realtime mode
mage run

# See all available targets
mage -l
```

See [MAGEFILE.md](./MAGEFILE.md) for complete documentation.

### Using Go directly

```bash
# Build the binary
go build -o epimetheus cmd/epimetheus/main.go

# Run tests
go test ./... -v

# Check test coverage
go test ./... -cover
```

## Troubleshooting

### Binary can't connect to Pushgateway

```bash
# Check the port-forward is running
ps aux | grep "port-forward.*9091"

# Restart the port-forward
kubectl port-forward -n monitoring svc/pushgateway 9091:9091
```

### Metrics not appearing in Prometheus

```bash
# Check Pushgateway has metrics
curl http://localhost:9091/metrics | grep "epimetheus_test"

# Check Prometheus scrape targets:
# open http://localhost:9090/targets and look for the "pushgateway" job

# Check Prometheus logs
kubectl logs -n monitoring -l app.kubernetes.io/name=prometheus
```

### "Remote write receiver not enabled" error

```bash
# Verify the feature is enabled
kubectl logs -n monitoring prometheus-prometheus-kube-prometheus-prometheus-0 | grep "remote-write-receiver"

# Should see: msg="Experimental features enabled" features=[remote-write-receiver]
```

### "Out of order sample" error

This occurs when trying to insert data older than existing data for the same time series.

**Solutions:**

- Use different job labels for historic data (e.g., `job="historic_data"`)
- Enable out-of-order ingestion in Prometheus (experimental)
- Ensure backfills go from oldest to newest

### Dashboard not appearing in Grafana

```bash
# Check the ConfigMap exists
kubectl get configmap -n monitoring | grep epimetheus

# Check the labels
kubectl get configmap epimetheus-dashboard -n monitoring -o yaml | grep "grafana_dashboard"

# Restart Grafana to force a reload
kubectl rollout restart deployment/prometheus-grafana -n monitoring
```

## Architecture Summary

```
┌─────────────────┐
│    Go Binary    │
│  (epimetheus)   │──Push realtime──┐
└─────────────────┘                 │
        │                           ▼
        │                ┌──────────────────┐
        │                │   Pushgateway    │◄──Scrape──┐
        │                │   (Port 9091)    │           │
        │                └──────────────────┘           │
        │                                               │
        └──Push historic────────────┐                   │
                                    ▼                   │
                          ┌─────────────────┐           │
                          │   Prometheus    │◄──────────┘
                          │   (Port 9090)   │
                          │ Remote Write API│
                          └─────────────────┘
                                    │
                                    │ Datasource
                                    ▼
                          ┌─────────────────┐
                          │     Grafana     │
                          │   (Port 3000)   │
                          │   Dashboards    │
                          └─────────────────┘
```

## Best Practices

### When to Use Pushgateway vs. Remote Write

**Use Pushgateway (realtime mode):**

- Short-lived batch jobs
- Service-level metrics
- Jobs behind firewalls
- Current/recent data (< 5 minutes old)

**Use Remote Write (historic mode):**

- Historic data import
- Backfilling gaps
- Data migration
- Data older than 5 minutes

**Use Auto Mode:**

- Mixed current and historic data
- Importing from files
- Unknown timestamp ages
- General-purpose ingestion

### Metric Design

- **Use appropriate metric types:**
  - Counter for cumulative values (requests, errors)
  - Gauge for point-in-time values (temperature, connections)
  - Histogram for distributions (latency, sizes)
- **Label cardinality:**
  - Include meaningful labels
  - Avoid high-cardinality labels (user IDs, timestamps)
  - Keep label combinations reasonable (< 1000 per metric)
- **Naming conventions:**
  - Use descriptive names
  - Include units in gauge names (`_celsius`, `_bytes`)
  - Use the `_total` suffix for counters

## Cleanup

### Cleaning Up Benchmark Data from Prometheus

To clean up benchmark metrics from Prometheus, use the provided cleanup script:

```bash
# Port-forward to Prometheus
kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-prometheus 9090:9090 &

# Run the cleanup script
./cleanup-benchmark-data.sh
```

The script will:

1. Delete all `epimetheus_benchmark_*` metrics using the Prometheus Admin API
2. Clean up tombstones to free disk space
3. Provide clear success/error feedback

**Manual cleanup** (if you prefer):

```bash
# Delete a specific metric
curl -X POST -g 'http://localhost:9090/api/v1/admin/tsdb/delete_series?match[]=epimetheus_benchmark_cpu_usage'

# Clean up tombstones
curl -X POST 'http://localhost:9090/api/v1/admin/tsdb/clean_tombstones'
```

### Other Cleanup Tasks

```bash
# Stop port-forwards
pkill -f "port-forward.*9091"
pkill -f "port-forward.*9090"
pkill -f "port-forward.*3000"

# Delete test metrics from Pushgateway
curl -X DELETE http://localhost:9091/metrics/job/example_metrics_pusher

# Uninstall Pushgateway (if needed)
helm uninstall pushgateway -n monitoring
```

## macOS Setup

### Basic Installation

```bash
brew install prometheus
brew install grafana
go install github.com/prometheus/pushgateway@latest
brew services start grafana
brew services start prometheus
~/go/bin/pushgateway &
```

Once done, log in to http://localhost:3000 as admin:admin; you will be prompted to change the password. Afterwards, add http://localhost:9090 as a Prometheus datasource.

### Enable Remote Write Receiver (Required for Watch Mode)

⚠️ **Important**: Watch mode, historic mode, backfill mode, and auto mode require the Prometheus Remote Write receiver to be enabled.

#### Option 1: Permanent Configuration (Recommended)

Edit the Prometheus arguments file:

```bash
# Edit the arguments file
nano /opt/homebrew/etc/prometheus.args
```

Add this line at the end:

```
--web.enable-remote-write-receiver
```

The complete file should look like:

```
--config.file /opt/homebrew/etc/prometheus.yml
--web.listen-address=127.0.0.1:9090
--storage.tsdb.path /opt/homebrew/var/prometheus
--web.enable-remote-write-receiver
--web.enable-admin-api
```

**Note:** `--web.enable-admin-api` is optional but recommended for easier data management (allows deleting old metrics).

Restart Prometheus:

```bash
brew services restart prometheus
```

Verify it's working:

```bash
# Check Prometheus is healthy
curl http://localhost:9090/-/healthy

# Test the Remote Write endpoint (should return 400, not 404)
curl -X POST http://localhost:9090/api/v1/write
```

#### Option 2: Temporary (For Testing)

Stop the service and start manually:

```bash
# Stop the brew service
brew services stop prometheus

# Start with Remote Write enabled
prometheus --web.enable-remote-write-receiver
```

Keep this terminal open and run your epimetheus commands in another terminal.

**Note**: This only lasts until you close the terminal. Use Option 1 for a permanent setup.

### Clearing Old Metrics (Optional)

If you need to delete old metrics and start fresh:

```bash
# Delete specific metrics (e.g., blockstore)
curl -X POST -g 'http://localhost:9090/api/v1/admin/tsdb/delete_series?match[]={__name__=~"blockstore_.*"}'

# Clean up the deleted data
curl -X POST http://localhost:9090/api/v1/admin/tsdb/clean_tombstones

# Wait a moment for cleanup
sleep 2
```

**Note:** The Admin API must be enabled (add `--web.enable-admin-api` to prometheus.args).

### Verify Setup

Once Remote Write is enabled, test watch mode:

```bash
# Create a test CSV
cat > /tmp/test.csv << EOF
status,count,method
200,100,GET
404,50,POST
EOF

# Watch the file
./epimetheus -mode=watch \
  -file=/tmp/test.csv \
  -metric-name=test \
  -prometheus=http://localhost:9090/api/v1/write
```

You should see:

```
✅ Successfully pushed X samples to Prometheus
```

Query in Prometheus (http://localhost:9090):

```promql
{__name__=~"test_.*"}
```

## Additional Resources

- [Prometheus Documentation](https://prometheus.io/docs/)
- [Pushgateway Documentation](https://github.com/prometheus/pushgateway)
- [Prometheus Remote Write Spec](https://prometheus.io/docs/concepts/remote_write_spec/)
- [Grafana Documentation](https://grafana.com/docs/)

## Version

Current version: 0.0.0

## License

See LICENSE file for details.