Diffstat (limited to 'README.md'): `README.md | 999` (1 file changed, 44 insertions, 955 deletions)
# Epimetheus

A versatile Go tool for pushing metrics to Prometheus (and Prometheus-compatible backends such as VictoriaMetrics) and ClickHouse, with support for realtime and historic data ingestion.

## Why "Epimetheus"?

In Greek mythology, [Epimetheus](https://en.wikipedia.org/wiki/Epimetheus_(mythology)) is Prometheus's brother; his name means "afterthought" or "hindsight", while Prometheus means "forethought". This tool brings data to Prometheus **after** collection: historic data from hours or days ago, or realtime data pushed on demand. It's never too late to bring your metrics home.

## Overview

Epimetheus is a standalone binary that:

- Pushes metrics via **Pushgateway** (realtime) or the **Remote Write API** (historic, watch)
- Optionally ingests into **ClickHouse** in watch mode
- Supports **Prometheus-compatible backends** (e.g. VictoriaMetrics) via their Remote Write URL
- Offers five modes: realtime, historic, backfill, auto, and watch (CSV file monitoring)
- Accepts CSV and JSON input and provides a Grafana dashboard for test metrics
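As a quick taste of watch mode (see the modes guide for details): numeric CSV columns become metrics and string columns become labels, with the base name taken from `-metric-name`. For example, watching this file with `-metric-name=web`:

```csv
avg(response_time),p99(latency),endpoint,method
45.2,120.5,/api/users,GET
```

produces samples like:

```promql
web_avg_response_time{endpoint="/api/users",method="GET"} 45.2
web_p99_latency{endpoint="/api/users",method="GET"} 120.5
```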
## Quick Start

1. **Build:** `mage build` or `go build -o epimetheus cmd/epimetheus/main.go`
2. **Realtime (Pushgateway):** Deploy Pushgateway and Prometheus, then run:

   ```bash
   ./epimetheus -mode=realtime -continuous
   ```

3. **Watch (Remote Write):** Enable the [Remote Write receiver](docs/operations/setup-prometheus.md), then:

   ```bash
   ./epimetheus -mode=watch -file=mydata.csv -metric-name=myapp \
     -prometheus=http://localhost:9090/api/v1/write
   ```

4. **View:** Open Prometheus at http://localhost:9090 (after a port-forward, if needed). For the full walkthrough, see the [Quick Start guide](docs/guides/quickstart.md).

## Documentation

Full documentation lives in the [docs](docs/README.md) directory:

| Section | Description |
|---------|-------------|
| [Guides](docs/guides/quickstart.md) | [Quick Start](docs/guides/quickstart.md), [Modes](docs/guides/modes.md), [Data Formats](docs/guides/data-formats.md), [CSV flexibility](docs/guides/csv-format-flexibility.md), [DNS resolution](docs/guides/dns-resolution.md), [Dtail example](docs/guides/dtail-metrics-example.md) |
| [Backends](docs/backends/prometheus.md) | [Prometheus / VictoriaMetrics](docs/backends/prometheus.md), [ClickHouse](docs/backends/clickhouse.md) |
| [Operations](docs/operations/setup-prometheus.md) | [Setup Prometheus](docs/operations/setup-prometheus.md), [Setup ClickHouse](docs/operations/setup-clickhouse.md), [Troubleshooting](docs/operations/troubleshooting.md), [Cleanup](docs/operations/cleanup.md), [macOS](docs/operations/macos-setup.md), [Kubernetes](docs/operations/kubernetes.md) |
| [Reference](docs/reference/cli.md) | [CLI](docs/reference/cli.md), [Test metrics](docs/reference/test-metrics.md), [Grafana dashboard](docs/reference/grafana-dashboard.md), [Example queries](docs/reference/example-queries.md), [Magefile](docs/reference/magefile.md) |
| [Design](docs/design/architecture.md) | [Architecture](docs/design/architecture.md) |

See the [documentation index](docs/README.md) for the complete list with one-line descriptions.

## Building

**Using Mage (recommended):**

```bash
go install github.com/magefile/mage@latest
mage build
mage test
mage run   # realtime mode
```

See the [Magefile reference](docs/reference/magefile.md) for all targets.

**Using Go directly:**

```bash
go build -o epimetheus cmd/epimetheus/main.go
go test ./...
```
## Project Structure

```
epimetheus/
├── cmd/epimetheus/   # Main entry point
├── internal/         # config, ingester, metrics, parser, resolver, watcher
├── docs/             # Documentation
├── scripts/          # Helper shell scripts (verify-clickhouse, generate-test-data, etc.)
├── test-data/        # Test CSVs
├── Magefile.go       # Build and run targets
└── README.md
```

## Version

Current version: 0.0.0
