Diffstat (limited to 'README.md')
-rw-r--r--  README.md  999
1 file changed, 44 insertions(+), 955 deletions(-)
diff --git a/README.md b/README.md
index ba10a76..10d7b97 100644
--- a/README.md
+++ b/README.md
@@ -4,993 +4,82 @@
# Epimetheus
-A versatile Go tool for pushing metrics to Prometheus with support for both realtime and historic data ingestion.
+A versatile Go tool for pushing metrics to Prometheus (and Prometheus-compatible backends like VictoriaMetrics) and ClickHouse, with support for realtime and historic data ingestion.
## Why "Epimetheus"?
-In Greek mythology, [Epimetheus](https://en.wikipedia.org/wiki/Epimetheus_(mythology)) is Prometheus's brother, whose name means "afterthought" or "hindsight" (while Prometheus means "forethought"). This name cleverly captures the tool's purpose: bringing data to Prometheus **after** collection, whether it's historic data from hours, days, or weeks ago, or realtime data pushed on-demand.
-
-While Epimetheus is sometimes depicted as foolish in myths (he accepted the gift of Pandora despite Prometheus's warnings), this tool embraces the "afterthought" aspect productively - it's never too late to bring your metrics home to Prometheus!
-
-## Architecture
-
-```
-┌─────────────────────────────────────────────────────────────────────────┐
-│ Epimetheus │
-│ (Metrics Ingestion Tool) │
-│ │
-│ Modes: │
-│ • Realtime - Current metrics (< 5 min old) │
-│ • Historic - Historic metrics (≥ 5 min old) │
-│ • Backfill - Range of historic data │
-│ • Auto - Automatic routing based on timestamp age │
-└─────────────────────────────────────────────────────────────────────────┘
- │ │
- │ Realtime Data │ Historic Data
- │ (via HTTP POST) │ (via Remote Write API)
- │ Uses "now" timestamp │ Preserves timestamps
- ▼ ▼
-┌─────────────────────┐ ┌─────────────────────┐
-│ Pushgateway │ │ Prometheus │
-│ (Port 9091) │ │ (Port 9090) │
-│ │ │ │
-│ • Buffers metrics │ │ Remote Write API: │
-│ • Scraped by │──── Scraped ─────▶ │ /api/v1/write │
-│ Prometheus │ every 15-30s │ │
-│ • No timestamp │ │ Feature Required: │
-│ preservation │ │ --enable-feature= │
-│ │ │ remote-write- │
-│ │ │ receiver │
-└─────────────────────┘ └─────────────────────┘
- │
- │ Prometheus Query API
- │ /api/v1/query
- ▼
- ┌─────────────────────┐
- │ Grafana │
- │ (Port 3000) │
- │ │
- │ • Prometheus as │
- │ datasource │
- │ • Dashboards: │
- │ - Epimetheus │
- │ Test Metrics │
- │ • Auto-refresh │
- └─────────────────────┘
-```
-
-### Data Flow
-
-1. **Realtime Path** (for current data):
- - Epimetheus → Pushgateway (HTTP POST)
- - Prometheus scrapes Pushgateway periodically
- - Timestamp = "now" when Prometheus scrapes
-
-2. **Historic Path** (for old data):
- - Epimetheus → Prometheus Remote Write API (HTTP POST)
- - Direct write to Prometheus TSDB
- - Timestamp preserved from original data
-
-3. **Visualization**:
- - Grafana queries Prometheus
- - Displays metrics in dashboards
- - Auto-refresh every 10 seconds
+In Greek mythology, [Epimetheus](https://en.wikipedia.org/wiki/Epimetheus_(mythology)) is Prometheus's brother, whose name means "afterthought" or "hindsight" (while Prometheus means "forethought"). This tool brings data to Prometheus **after** collection: historic data from hours or days ago, or realtime data pushed on-demand. It's never too late to bring your metrics home.
## Overview
-**epimetheus** is a standalone binary that:
-- **Generates** realistic example metrics simulating production applications
-- **Pushes** metrics via Pushgateway (realtime) or Remote Write API (historic)
-- **Automatically detects** timestamp age and chooses the optimal ingestion method
-- **Supports** multiple data formats (CSV, JSON) and all Prometheus metric types
-- **Provides** Grafana dashboard for visualizing test metrics
-
-## Quick Start
-
-### 1. Deploy Pushgateway (one-time setup)
-
-The Pushgateway Helm chart is available in the [conf repository](https://codeberg.org/snonux/conf) at `f3s/pushgateway/helm-chart`.
-
-```bash
-# Clone the conf repository if you haven't already
-git clone https://codeberg.org/snonux/conf.git
-cd conf/f3s/pushgateway/helm-chart
-
-# Deploy Pushgateway
-helm upgrade --install pushgateway . -n monitoring --create-namespace
-```
-
-Alternatively, deploy Pushgateway using the official chart:
-
-```bash
-helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
-helm install pushgateway prometheus-community/prometheus-pushgateway -n monitoring --create-namespace
-```
-
-### 2. Run in Realtime Mode
-
-```bash
-# Port-forward Pushgateway
-kubectl port-forward -n monitoring svc/pushgateway 9091:9091 &
-
-# Push test metrics continuously
-cd /home/paul/git/conf/f3s/epimetheus
-./epimetheus -mode=realtime -continuous
-```
-
-The binary pushes metrics every 15 seconds. Press Ctrl+C to stop.
-
-### 3. View Metrics
-
-```bash
-# Pushgateway UI
-open http://localhost:9091
-
-# Prometheus UI
-kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-prometheus 9090:9090 &
-open http://localhost:9090
-```
-
-## Operating Modes
-
-### 👁️ Watch Mode
-Monitor CSV files for changes and push metrics to Prometheus with file modification timestamps.
-
-**Works with ANY CSV format** - automatically detects numeric vs string columns and sanitizes names.
-
-**NEW: Automatic DNS Resolution** - IP addresses are automatically resolved to hostnames for better observability in Grafana.
-
-```bash
-./epimetheus -mode=watch \
- -file=mydata.csv \
- -metric-name=myapp \
- -prometheus=http://localhost:9090/api/v1/write
-```
-
-**Features:**
-- 🔍 **Format-agnostic**: Works with any tabular CSV structure
-- 📊 **Automatic detection**: Numeric columns → metrics, String columns → labels
-- 🏷️ **Name sanitization**: `min(potatoes)`, `avg(time)`, `p99(latency)` → valid metric names
-- 🌐 **DNS Resolution**: IP addresses → hostnames (e.g., `10.50.52.61` → `foo.example.lan`)
-- 💾 **Smart Caching**: In-memory cache prevents redundant DNS lookups
-- ⏱️ **Timestamp preservation**: Uses file modification time
-- 🔄 **Continuous monitoring**: Polls file every 1 second
-- 💪 **Error resilient**: Continues watching despite failures
-- 🎯 **Remote Write**: Pushes to Prometheus (preserves timestamps)
-
-**CSV Format:**
-Works with any tabular CSV:
-- First row: column headers (automatically sanitized)
-- Subsequent rows: data values
-- Column names can be anything: `min(x)`, `avg(y)`, `p99(latency)`, etc.
-
-**Example 1** - Web metrics:
-```csv
-avg(response_time),p99(latency),endpoint,method
-45.2,120.5,/api/users,GET
-52.1,135.8,/api/orders,POST
-```
-
-Generates:
-```promql
-web_avg_response_time{endpoint="/api/users",method="GET"} 45.2
-web_p99_latency{endpoint="/api/users",method="GET"} 120.5
-web_avg_response_time{endpoint="/api/orders",method="POST"} 52.1
-web_p99_latency{endpoint="/api/orders",method="POST"} 135.8
-```
-
-**Example 2** - Food metrics:
-```csv
-min(potatoes),last(coke),avg(price),country,store_type
-5.2,10.5,12.99,USA,grocery
-3.8,8.2,9.99,Canada,convenience
-```
-
-Generates:
-```promql
-food_min_potatoes{country="USA",store_type="grocery"} 5.2
-food_last_coke{country="USA",store_type="grocery"} 10.5
-food_avg_price{country="USA",store_type="grocery"} 12.99
-# ... etc
-```
-
-Each row generates N samples (N = number of numeric columns).
-
-See [CSV-FORMAT-FLEXIBILITY.md](CSV-FORMAT-FLEXIBILITY.md) for more examples.
-
-**Options:**
-- `-file` - CSV file to watch (required)
-- `-metric-name` - Base metric name (required, e.g., `food`, `network`, `database`)
-- `-prometheus` - Prometheus Remote Write URL (default: http://localhost:9090/api/v1/write)
-- `-clickhouse` - ClickHouse HTTP URL (e.g. http://localhost:8123) to also ingest metrics
-- `-clickhouse-table` - ClickHouse table name (default: epimetheus_metrics)
-- `-job` - Job name for metrics (default: example_metrics_pusher)
-- `-resolve-ip-labels` - Additional IP labels to resolve via DNS (default: ip is always resolved)
-
-**ClickHouse Support:**
-Watch mode can ingest to ClickHouse in addition to (or instead of) Prometheus:
-
-```bash
-# Ingest to both Prometheus and ClickHouse
-./epimetheus -mode=watch -file=data.csv -metric-name=myapp \
- -prometheus=http://localhost:9090/api/v1/write \
- -clickhouse=http://localhost:8123
-
-# ClickHouse only (use -prometheus= to disable Prometheus)
-./epimetheus -mode=watch -file=test-data/watch-clickhouse-test.csv \
- -metric-name=watch_test -clickhouse=http://localhost:8123 -prometheus=
-
-# Verify data in ClickHouse
-./verify-clickhouse.sh
-```
-
-**DNS Resolution:**
-By default, the `ip` label is automatically resolved to a hostname. To resolve additional IP labels:
-
-```bash
-./epimetheus -mode=watch \
- -file=network.csv \
- -metric-name=network \
- -resolve-ip-labels=source_ip,dest_ip
-```
-
-This will resolve: `ip` (default) + `source_ip` + `dest_ip`
-
-**Example:**
-- Input: `ip="10.50.52.61"`
-- Output: `ip="foo.example.lan"`
-- Failed lookups: IP remains unchanged
-
-**Documentation:**
-- [DNS-RESOLUTION-FEATURE.md](DNS-RESOLUTION-FEATURE.md) - Complete DNS resolution guide
-- [CSV-FORMAT-FLEXIBILITY.md](CSV-FORMAT-FLEXIBILITY.md) - Works with ANY CSV format
-- [DTAIL-METRICS-EXAMPLE.md](DTAIL-METRICS-EXAMPLE.md) - Detailed dtail.csv example
-
-### 🔄 Realtime Mode (Default)
-Push current metrics to Pushgateway with "now" timestamp.
-
-```bash
-./epimetheus -mode=realtime -continuous
-```
-
-**Options:**
-- `-pushgateway` - Pushgateway URL (default: http://localhost:9091)
-- `-job` - Job name (default: example_metrics_pusher)
-- `-continuous` - Keep pushing every 15 seconds
-
-### ⏰ Historic Mode
-Push a single datapoint from the past using Remote Write API.
-
-```bash
-# Port-forward Prometheus
-kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-prometheus 9090:9090 &
-
-# Push data from 24 hours ago
-./epimetheus -mode=historic -hours-ago=24
-```
-
-**Options:**
-- `-prometheus` - Prometheus URL (default: http://localhost:9090/api/v1/write)
-- `-hours-ago` - Hours in the past (default: 24)
-
-### 📦 Backfill Mode
-Import a range of historic data points.
-
-```bash
-# Backfill last 48 hours with 1-hour intervals
-./epimetheus -mode=backfill -start-hours=48 -end-hours=0 -interval=1
-
-# Backfill last week with 6-hour intervals
-./epimetheus -mode=backfill -start-hours=168 -end-hours=0 -interval=6
-```
-
-**Options:**
-- `-start-hours` - Start time in hours ago
-- `-end-hours` - End time in hours ago (0 = now)
-- `-interval` - Interval between points in hours
-
-### 🤖 Auto Mode (Recommended!)
-Automatically detect timestamp age and route to the correct ingestion method.
-
-```bash
-# Generate test data
-./generate-test-data.sh
-
-# Import mixed current and historic data
-./epimetheus -mode=auto -file=test-all-ages.csv
-```
-
-**Detection Logic:**
-- Data < 5 minutes old → Pushgateway (realtime)
-- Data ≥ 5 minutes old → Remote Write (historic)
-
-**Options:**
-- `-file` - Input file path
-- `-format` - Data format: csv or json (default: csv)
-- `-pushgateway` - Pushgateway URL
-- `-prometheus` - Prometheus Remote Write URL
-
-## Data Formats
-
-### CSV Format
-
-```csv
-# Format: metric_name,labels,value,timestamp_ms
-# Labels: key1=value1;key2=value2
-epimetheus_test_requests_total,instance=web1;env=prod,100,1767125148000
-epimetheus_test_temperature_celsius,instance=web2,22.5,1767038748000
-
-# Timestamp is optional (uses "now" if omitted)
-epimetheus_test_active_connections,instance=web3,42,
-```
-
-### JSON Format
-
-```json
-[
- {
- "metric": "epimetheus_test_requests_total",
- "labels": {"instance": "web1", "env": "prod"},
- "value": 100,
- "timestamp_ms": 1767125148000
- },
- {
- "metric": "epimetheus_test_temperature_celsius",
- "labels": {"instance": "web2"},
- "value": 22.5,
- "timestamp_ms": 1767038748000
- }
-]
-```
-
-## Test Metrics
-
-All generated metrics use the `epimetheus_test_` prefix to clearly identify them as test data.
+Epimetheus is a standalone binary that:
-### Counter: `epimetheus_test_requests_total`
-- **Type:** Counter (monotonically increasing)
-- **Description:** Total number of requests processed
-- **Use case:** Counting total events, requests, errors
+- Pushes metrics via **Pushgateway** (realtime) or **Remote Write API** (historic, watch)
+- Optionally ingests to **ClickHouse** in watch mode
+- Supports **Prometheus-compatible backends** (e.g. VictoriaMetrics) by using their Remote Write URL
+- Offers modes: realtime, historic, backfill, auto, and watch (CSV file monitoring)
+- Accepts CSV and JSON input and provides a Grafana dashboard for test metrics
-### Gauge: `epimetheus_test_active_connections`
-- **Type:** Gauge (can increase or decrease)
-- **Description:** Current number of active connections (0-100)
-- **Use case:** Current state measurements, capacity
-
-### Gauge: `epimetheus_test_temperature_celsius`
-- **Type:** Gauge
-- **Description:** Current temperature in Celsius (0-50°C)
-- **Use case:** Environmental monitoring
-
-### Histogram: `epimetheus_test_request_duration_seconds`
-- **Type:** Histogram (distribution)
-- **Description:** Request duration distribution
-- **Buckets:** 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10 seconds
-- **Use case:** Latency measurements, SLO tracking
-
-### Labeled Counter: `epimetheus_test_jobs_processed_total`
-- **Type:** Counter with labels
-- **Description:** Jobs processed by type and status
-- **Labels:**
- - `job_type`: email, report, backup
- - `status`: success, failed
-- **Use case:** Categorized counting, multi-dimensional metrics
-
-## Grafana Dashboard
-
-A comprehensive dashboard is available showcasing all test metrics.
-
-### Dashboard Features
-
-- **8 Panels:**
- 1. Request Rate (line graph)
- 2. Total Requests (stat panel)
- 3. Active Connections (gauge with thresholds)
- 4. Temperature (gauge with thresholds)
- 5. Request Duration Histogram (p50, p90, p99)
- 6. Average Request Duration (stat)
- 7. Jobs Processed by Type (bar gauge)
- 8. Jobs Status Breakdown (table)
-
-- **Auto-refresh:** Every 10 seconds
-- **Time range:** Last 15 minutes (customizable)
-- **Dark theme optimized**
-
-### Deploy Dashboard
-
-#### Option 1: Helm/Kubernetes ConfigMap (Recommended)
-
-```bash
-# Deploy via Kubernetes ConfigMap
-kubectl apply -f ../prometheus/epimetheus-dashboard.yaml
-```
-
-The dashboard will be automatically discovered by Grafana.
-
-#### Option 2: Manual Import
-
-```bash
-# Port-forward Grafana
-kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80
-
-# Open Grafana
-open http://localhost:3000
-
-# Go to Dashboards → Import → Upload grafana-dashboard.json
-```
-
-#### Option 3: Automated Script
-
-```bash
-# Deploy via API
-./deploy-dashboard.sh
-
-# Or with custom credentials
-GRAFANA_URL="http://localhost:3000" \
-GRAFANA_USER="admin" \
-GRAFANA_PASSWORD="yourpassword" \
-./deploy-dashboard.sh
-```
-
-## Example Queries
-
-### Basic Queries
-
-```promql
-# View total requests
-epimetheus_test_requests_total
-
-# View request rate over last 5 minutes
-rate(epimetheus_test_requests_total[5m])
-
-# View current active connections
-epimetheus_test_active_connections
-
-# View current temperature
-epimetheus_test_temperature_celsius
-```
-
-### Histogram Queries
-
-```promql
-# 95th percentile request duration
-histogram_quantile(0.95, rate(epimetheus_test_request_duration_seconds_bucket[5m]))
-
-# 50th percentile (median)
-histogram_quantile(0.50, rate(epimetheus_test_request_duration_seconds_bucket[5m]))
-
-# Average request duration
-rate(epimetheus_test_request_duration_seconds_sum[5m]) /
-rate(epimetheus_test_request_duration_seconds_count[5m])
-```
-
-### Labeled Counter Queries
-
-```promql
-# Failed jobs by type
-epimetheus_test_jobs_processed_total{status="failed"}
-
-# Job success rate
-rate(epimetheus_test_jobs_processed_total{status="success"}[5m]) /
-rate(epimetheus_test_jobs_processed_total[5m])
-
-# Total jobs by type
-sum by (job_type) (epimetheus_test_jobs_processed_total)
-```
-
-### Curl Examples
-
-```bash
-# Port-forward Prometheus
-kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-prometheus 9090:9090 &
-
-# Query total requests
-curl -s "http://localhost:9090/api/v1/query?query=epimetheus_test_requests_total" | jq .
-
-# Query temperature
-curl -s "http://localhost:9090/api/v1/query?query=epimetheus_test_temperature_celsius" | jq .
-
-# Query request rate
-curl -s "http://localhost:9090/api/v1/query?query=rate(epimetheus_test_requests_total[5m])" | jq .
-
-# Query histogram p95
-curl -s "http://localhost:9090/api/v1/query?query=histogram_quantile(0.95,rate(epimetheus_test_request_duration_seconds_bucket[5m]))" | jq .
-```
-
-## Time Range Limitations
-
-### ✅ Supported Time Ranges
-
-| Time Range | Status | Method |
-|------------|--------|--------|
-| Current (< 5 min) | ✅ Works | Pushgateway |
-| 1 hour old | ✅ Works | Remote Write |
-| 1 day old | ✅ Works | Remote Write |
-| 1 week old | ✅ Works | Remote Write |
-| 1 month old | ✅ Works | Remote Write |
-
-### ⚠️ Potential Issues
-
-- **Future timestamps:** Rejected (> 5 minutes in future)
-- **Very old data (6+ months):** May be rejected depending on Prometheus retention
-- **Years old:** Likely rejected - use `promtool tsdb create-blocks-from` instead
-- **Out-of-order samples:** Can't insert older data into existing time series (use different labels)
-
-### Prometheus Configuration
-
-Check your retention settings:
-
-```bash
-# View retention
-kubectl get prometheus -n monitoring prometheus-kube-prometheus-prometheus \
- -o jsonpath='{.spec.retention}'
-
-# Default is typically 15 days
-```
-
-For very old data:
-- Increase retention in Prometheus config
-- Enable out-of-order ingestion (experimental)
-- Use `promtool` for direct TSDB block creation
-
-## Project Structure
-
-```
-epimetheus/
-├── cmd/
-│ └── epimetheus/
-│ └── main.go # Main entry point
-├── internal/
-│ ├── config/ # Configuration
-│ ├── metrics/ # Metric generators
-│ ├── parser/ # CSV/JSON parsers (includes tabular CSV)
-│ ├── ingester/ # Pushgateway & Remote Write ingesters
-│ └── watcher/ # File watcher for watch mode
-├── epimetheus # Compiled binary
-├── grafana-dashboard.json # Grafana dashboard definition
-├── deploy-dashboard.sh # Dashboard deployment script
-├── generate-test-data.sh # Test data generator
-├── run.sh # Helper script
-└── README.md # This file
-```
-
-## Setup Requirements
-
-### 1. Enable Prometheus Remote Write Receiver ⚠️ **REQUIRED for Historic Data**
-
-**IMPORTANT**: To use historic mode, backfill mode, or auto mode with old data, you **must** enable the Prometheus Remote Write receiver. Without this feature, Epimetheus can only push realtime data via Pushgateway.
-
-The Remote Write receiver is configured in the [conf repository](https://codeberg.org/snonux/conf) at `f3s/prometheus/persistence-values.yaml`:
-
-```yaml
-# In prometheus/persistence-values.yaml (from conf repository)
-prometheus:
- prometheusSpec:
- # Enable Remote Write receiver endpoint and Admin API (Prometheus 3.x syntax)
- additionalArgs:
- - name: web.enable-remote-write-receiver
- value: ""
- - name: web.enable-admin-api
- value: ""
-
- # Enable out-of-order ingestion for backfilling
- # Allows writing data points older than existing data for the same time series
- enableFeatures:
- - exemplar-storage
- - otlp-write-receiver
-
- # Allow backfilling up to 31 days in the past (provides 1-day buffer for 30-day datasets)
- tsdb:
- outOfOrderTimeWindow: 744h # 31 days
-```
-
-**What This Enables:**
-- **Remote Write API**: HTTP endpoint at `/api/v1/write` for ingesting metrics with custom timestamps
-- **Admin API**: HTTP endpoints at `/api/v1/admin/tsdb/*` for data deletion and management
-- **Out-of-Order Ingestion**: Allows writing data points older than existing data for the same time series
-- **31-Day Window**: Can backfill data up to 31 days in the past (provides 1-day buffer for 30-day datasets)
-
-After updating the configuration, upgrade your Prometheus installation:
-
-```bash
-cd conf/f3s/prometheus
-just upgrade # Or manually:
-# helm upgrade prometheus prometheus-community/kube-prometheus-stack \
-# -n monitoring -f persistence-values.yaml
-```
-
-Verify the features are enabled:
-
-```bash
-# Check Remote Write receiver flag
-kubectl get pod -n monitoring prometheus-prometheus-kube-prometheus-prometheus-0 \
- -o jsonpath='{.spec.containers[0].args}' | grep -o "web.enable-remote-write-receiver"
-
-# Check out-of-order time window
-kubectl get prometheus -n monitoring prometheus-kube-prometheus-prometheus \
- -o jsonpath='{.spec.tsdb.outOfOrderTimeWindow}'
-# Should output: 744h
-
-# Check admin API flag
-kubectl get pod -n monitoring prometheus-prometheus-kube-prometheus-prometheus-0 \
- -o jsonpath='{.spec.containers[0].args}' | grep -o "web.enable-admin-api"
-```
-
-**Performance Considerations:**
-
-This configuration is designed for ad-hoc troubleshooting and development, **NOT production use**. Enabling these features has trade-offs:
-
-- **Increased Memory Usage**: Out-of-order ingestion requires additional memory for buffering and sorting time series
-- **Higher TSDB Overhead**: Prometheus TSDB needs to handle non-sequential writes, increasing disk I/O
-- **Query Performance**: Queries may be slower due to fragmented data blocks
-- **Storage Amplification**: Out-of-order samples can trigger additional compactions, increasing storage usage
-
-**Recommendation for Production:**
-- Keep `outOfOrderTimeWindow` as small as possible (or disabled)
-- Monitor Prometheus memory and disk usage closely
-- Use Remote Write only when necessary
-- Consider using dedicated testing/development Prometheus instances
-
-**Note**: The syntax changed in Prometheus 3.x - use `additionalArgs` with `web.enable-remote-write-receiver` instead of the deprecated `enableFeatures: [remote-write-receiver]`.
-
-### 2. Update Prometheus Scrape Config
+## Quick Start
-Ensure Pushgateway is in scrape targets:
+1. **Build:** `mage build` or `go build -o epimetheus cmd/epimetheus/main.go`
+2. **Realtime (Pushgateway):** Deploy Pushgateway and Prometheus, then run:
+ ```bash
+ ./epimetheus -mode=realtime -continuous
+ ```
+3. **Watch (Remote Write):** Enable [Remote Write receiver](docs/operations/setup-prometheus.md), then:
+ ```bash
+ ./epimetheus -mode=watch -file=mydata.csv -metric-name=myapp -prometheus=http://localhost:9090/api/v1/write
+ ```
+4. **View:** Prometheus at http://localhost:9090 (after port-forward if needed). For full steps see [Quick Start](docs/guides/quickstart.md).
-```yaml
-# additional-scrape-configs.yaml
-- job_name: 'pushgateway'
- honor_labels: true
- static_configs:
- - targets:
- - 'pushgateway.monitoring.svc.cluster.local:9091'
-```
+## Documentation
-Apply the configuration:
+Full documentation is in the [docs](docs/README.md) directory:
-```bash
-kubectl create secret generic additional-scrape-configs \
- --from-file=/home/paul/git/conf/f3s/prometheus/additional-scrape-configs.yaml \
- --dry-run=client -o yaml -n monitoring | kubectl apply -f -
-```
+| Section | Description |
+|---------|-------------|
+| [Guides](docs/guides/quickstart.md) | [Quick Start](docs/guides/quickstart.md), [Modes](docs/guides/modes.md), [Data Formats](docs/guides/data-formats.md), [CSV flexibility](docs/guides/csv-format-flexibility.md), [DNS resolution](docs/guides/dns-resolution.md), [Dtail example](docs/guides/dtail-metrics-example.md) |
+| [Backends](docs/backends/prometheus.md) | [Prometheus / VictoriaMetrics](docs/backends/prometheus.md), [ClickHouse](docs/backends/clickhouse.md) |
+| [Operations](docs/operations/setup-prometheus.md) | [Setup Prometheus](docs/operations/setup-prometheus.md), [Setup ClickHouse](docs/operations/setup-clickhouse.md), [Troubleshooting](docs/operations/troubleshooting.md), [Cleanup](docs/operations/cleanup.md), [macOS](docs/operations/macos-setup.md), [Kubernetes](docs/operations/kubernetes.md) |
+| [Reference](docs/reference/cli.md) | [CLI](docs/reference/cli.md), [Test metrics](docs/reference/test-metrics.md), [Grafana dashboard](docs/reference/grafana-dashboard.md), [Example queries](docs/reference/example-queries.md), [Magefile](docs/reference/magefile.md) |
+| [Design](docs/design/architecture.md) | [Architecture](docs/design/architecture.md) |
-## Building from Source
+[Documentation index](docs/README.md) — complete list with one-line descriptions.
-### Using Mage (Recommended)
+## Building
-This project includes a [Magefile](./MAGEFILE.md) for easy building, testing, and running:
+**Using Mage (recommended):**
```bash
-# Install Mage (one-time setup)
go install github.com/magefile/mage@latest
-
-# Build binary
mage build
-
-# Run tests
mage test
-
-# Run with coverage report
-mage testCoverage
-
-# Run in realtime mode
-mage run
-
-# See all available targets
-mage -l
+mage run # realtime mode
```
-See [MAGEFILE.md](./MAGEFILE.md) for complete documentation.
+See [Magefile reference](docs/reference/magefile.md) for all targets.
-### Using Go directly
+**Using Go:**
```bash
-# Build binary
go build -o epimetheus cmd/epimetheus/main.go
-
-# Run tests
-go test ./... -v
-
-# Check test coverage
-go test ./... -cover
-```
-
-## Troubleshooting
-
-### Binary can't connect to Pushgateway
-
-```bash
-# Check port-forward is running
-ps aux | grep "port-forward.*9091"
-
-# Restart port-forward
-kubectl port-forward -n monitoring svc/pushgateway 9091:9091
-```
-
-### Metrics not appearing in Prometheus
-
-```bash
-# Check Pushgateway has metrics
-curl http://localhost:9091/metrics | grep "epimetheus_test"
-
-# Check Prometheus scrape targets
-# Open http://localhost:9090/targets - look for "pushgateway" job
-
-# Check Prometheus logs
-kubectl logs -n monitoring -l app.kubernetes.io/name=prometheus
+go test ./...
```
-### "Remote write receiver not enabled" error
-
-```bash
-# Verify feature is enabled
-kubectl logs -n monitoring prometheus-prometheus-kube-prometheus-prometheus-0 | grep "remote-write-receiver"
-
-# Should see: msg="Experimental features enabled" features=[remote-write-receiver]
-```
-
-### "Out of order sample" error
-
-This occurs when trying to insert data older than existing data for the same time series.
-
-**Solutions:**
-- Use different job labels for historic data (e.g., `job="historic_data"`)
-- Enable out-of-order ingestion in Prometheus (experimental)
-- Ensure backfill goes from oldest to newest
-
-### Dashboard not appearing in Grafana
-
-```bash
-# Check ConfigMap exists
-kubectl get configmap -n monitoring | grep epimetheus
-
-# Check labels
-kubectl get configmap epimetheus-dashboard -n monitoring -o yaml | grep "grafana_dashboard"
-
-# Restart Grafana to force reload
-kubectl rollout restart deployment/prometheus-grafana -n monitoring
-```
-
-## Architecture
-
-```
-┌─────────────────┐
-│ Go Binary │
-│ (prometheus- │──Push realtime──┐
-│ pusher) │ │
-└─────────────────┘ ▼
- │ ┌──────────────────┐
- │ │ Pushgateway │◄──Scrape──┐
- │ │ (Port 9091) │ │
- │ └──────────────────┘ │
- │ │
- └──Push historic──────────────────┐ │
- ▼ │
- ┌─────────────────┐ │
- │ Prometheus │◄────┘
- │ (Port 9090) │
- │ Remote Write API│
- └─────────────────┘
- │
- │ Datasource
- ▼
- ┌─────────────────┐
- │ Grafana │
- │ (Port 3000) │
- │ Dashboards │
- └─────────────────┘
-```
-
-## Best Practices
-
-### When to Use Pushgateway vs. Remote Write
-
-**Use Pushgateway (realtime mode):**
-- Short-lived batch jobs
-- Service-level metrics
-- Jobs behind firewalls
-- Current/recent data (< 5 minutes old)
-
-**Use Remote Write (historic mode):**
-- Historic data import
-- Backfilling gaps
-- Data migration
-- Data older than 5 minutes
-
-**Use Auto Mode:**
-- Mixed current and historic data
-- Importing from files
-- Unknown timestamp ages
-- General-purpose ingestion
-
-### Metric Design
-
-- **Use appropriate metric types:**
- - Counter for cumulative values (requests, errors)
- - Gauge for point-in-time values (temperature, connections)
- - Histogram for distributions (latency, sizes)
-
-- **Label cardinality:**
- - Include meaningful labels
- - Avoid high-cardinality labels (user IDs, timestamps)
- - Keep label combinations reasonable (< 1000 per metric)
-
-- **Naming conventions:**
- - Use descriptive names
- - Include units in gauge names (\_celsius, \_bytes)
- - Use \_total suffix for counters
-
-## Cleanup
-
-### Cleaning Up Benchmark Data from Prometheus
-
-For cleaning up benchmark metrics from Prometheus, use the provided cleanup script:
-
-```bash
-# Port-forward to Prometheus
-kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-prometheus 9090:9090 &
-
-# Run the cleanup script
-./cleanup-benchmark-data.sh
-```
-
-The script will:
-1. Delete all `epimetheus_benchmark_*` metrics using the Prometheus Admin API
-2. Clean up tombstones to free disk space
-3. Provide clear success/error feedback
-
-**Manual cleanup** (if you prefer):
-
-```bash
-# Delete specific metric
-curl -X POST 'http://localhost:9090/api/v1/admin/tsdb/delete_series?match[]=epimetheus_benchmark_cpu_usage'
-
-# Clean up tombstones
-curl -X POST 'http://localhost:9090/api/v1/admin/tsdb/clean_tombstones'
-```
-
-### Other Cleanup Tasks
-
-```bash
-# Stop port-forwards
-pkill -f "port-forward.*9091"
-pkill -f "port-forward.*9090"
-pkill -f "port-forward.*3000"
-
-# Delete test metrics from Pushgateway
-curl -X DELETE http://localhost:9091/metrics/job/example_metrics_pusher
-
-# Uninstall Pushgateway (if needed)
-helm uninstall pushgateway -n monitoring
-```
-
-## macOS Setup
-
-### Basic Installation
-
-```bash
-brew install prometheus
-brew install grafana
-go install github.com/prometheus/pushgateway@latest
-brew services start grafana
-brew services start prometheus
-~/go/bin/pushgateway &
-```
-
-Once done, log in to http://localhost:3000 as admin:admin; you will be prompted to change the password. Afterwards, add http://localhost:9090 as a Prometheus datasource.
-
-### Enable Remote Write Receiver (Required for Watch Mode)
-
-⚠️ **Important**: Watch mode, historic mode, backfill mode, and auto mode require the Prometheus Remote Write receiver to be enabled.
-
-#### Option 1: Permanent Configuration (Recommended)
-
-Edit the Prometheus arguments file:
-
-```bash
-# Edit the arguments file
-nano /opt/homebrew/etc/prometheus.args
-```
-
-Add this line at the end:
-```
---web.enable-remote-write-receiver
-```
-
-The complete file should look like:
-```
---config.file /opt/homebrew/etc/prometheus.yml
---web.listen-address=127.0.0.1:9090
---storage.tsdb.path /opt/homebrew/var/prometheus
---web.enable-remote-write-receiver
---web.enable-admin-api
-```
-
-**Note:** `--web.enable-admin-api` is optional but recommended for easier data management (allows deleting old metrics).
-
-Restart Prometheus:
-```bash
-brew services restart prometheus
-```
-
-Verify it's working:
-```bash
-# Check Prometheus is healthy
-curl http://localhost:9090/-/healthy
-
-# Test Remote Write endpoint (should return 400, not 404)
-curl -X POST http://localhost:9090/api/v1/write
-```
-
-#### Option 2: Temporary (For Testing)
-
-Stop the service and start manually:
-
-```bash
-# Stop brew service
-brew services stop prometheus
-
-# Start with Remote Write enabled
-prometheus --web.enable-remote-write-receiver
-```
-
-Keep this terminal open. In another terminal, run your epimetheus commands.
-
-**Note**: This only lasts until you stop the terminal. Use Option 1 for permanent setup.
-
-### Clearing Old Metrics (Optional)
-
-If you need to delete old metrics and start fresh:
-
-```bash
-# Delete specific metrics (e.g., blockstore)
-curl -X POST -g 'http://localhost:9090/api/v1/admin/tsdb/delete_series?match[]={__name__=~"blockstore_.*"}'
-
-# Clean up deleted data
-curl -X POST http://localhost:9090/api/v1/admin/tsdb/clean_tombstones
-
-# Wait a moment for cleanup
-sleep 2
-```
-
-**Note:** Admin API must be enabled (add `--web.enable-admin-api` to prometheus.args).
-
-### Verify Setup
-
-Once Remote Write is enabled, test watch mode:
-
-```bash
-# Create a test CSV
-cat > /tmp/test.csv << EOF
-status,count,method
-200,100,GET
-404,50,POST
-EOF
-
-# Watch the file
-./epimetheus -mode=watch \
- -file=/tmp/test.csv \
- -metric-name=test \
- -prometheus=http://localhost:9090/api/v1/write
-```
+## Project Structure
-You should see:
-```
-✅ Successfully pushed X samples to Prometheus
```
-
-Query in Prometheus (http://localhost:9090):
-```promql
-{__name__=~"test_.*"}
+epimetheus/
+├── cmd/epimetheus/ # Main entry point
+├── internal/ # config, ingester, metrics, parser, resolver, watcher
+├── docs/ # Documentation
+├── scripts/ # Helper shell scripts (verify-clickhouse, generate-test-data, etc.)
+├── test-data/ # Test CSVs
+├── Magefile.go # Build and run targets
+└── README.md
```
-## Additional Resources
-
-- [Prometheus Documentation](https://prometheus.io/docs/)
-- [Pushgateway Documentation](https://github.com/prometheus/pushgateway)
-- [Prometheus Remote Write Spec](https://prometheus.io/docs/concepts/remote_write_spec/)
-- [Grafana Documentation](https://grafana.com/docs/)
-
## Version
Current version: 0.0.0