Diffstat (limited to 'docs/DOCS-RESTRUCTURE-PLAN.md')
-rw-r--r-- docs/DOCS-RESTRUCTURE-PLAN.md | 235
1 files changed, 235 insertions, 0 deletions
diff --git a/docs/DOCS-RESTRUCTURE-PLAN.md b/docs/DOCS-RESTRUCTURE-PLAN.md
new file mode 100644
index 0000000..c688993
--- /dev/null
+++ b/docs/DOCS-RESTRUCTURE-PLAN.md
@@ -0,0 +1,235 @@
+# Documentation Restructure Plan
+
+This plan addresses the current documentation sprawl and clarifies the **multiple ingestion backends** (Prometheus, ClickHouse, and future backends such as VictoriaMetrics) and **modes** (realtime, historic, backfill, auto, watch).
+
+---
+
+## 1. Current State Summary
+
+### 1.1 Existing Markdown Files
+
+| File | Purpose | Issues |
+|------------|-----------------------------------|--------|
+| `README.md` | Single ~995-line doc: intro, modes, backends, setup, troubleshooting, macOS, cleanup | Too long; mixes audiences and backends; hard to maintain |
+| `AGENT.md` | Agent rules (Grafana dashboard guidelines + ref to `~/git/conf/snippets/go/go-projects.md`) | Fine as-is; not user docs |
+| `CLAUDE.md` | One-line pointer to AGENT.md | Fine as-is |
+
+### 1.2 Broken or Missing References in README
+
+- `CSV-FORMAT-FLEXIBILITY.md` – linked, **does not exist**
+- `DNS-RESOLUTION-FEATURE.md` – linked, **does not exist**
+- `DTAIL-METRICS-EXAMPLE.md` – linked, **does not exist**
+- `MAGEFILE.md` – linked, **does not exist** (build logic lives in `Magefile.go`)
+
+### 1.3 Ingestion Backends (from codebase)
+
+| Backend | Modes | Notes |
+|-----------|---------------------------|--------|
+| **Prometheus** | realtime (Pushgateway), historic/backfill/auto (Remote Write), watch (Remote Write) | Primary; Remote Write requires the receiver to be enabled on the Prometheus server via a server-side flag |
+| **ClickHouse** | watch only | Optional; can run with Prometheus or alone |
+
+*VictoriaDB / VictoriaMetrics:* Not present in code today. Plan leaves room for a dedicated backend doc when added.
+
+---
+
+## 2. Goals
+
+1. **Separate by ingestion backend** so Prometheus vs ClickHouse (and future backends) have clear, non-redundant docs.
+2. **Split by audience and topic**: quick start vs reference vs operations (setup, troubleshooting, cleanup).
+3. **Fix broken links**: either add the missing docs or replace links with in-README sections / new doc paths.
+4. **Single source of truth** for each concept (e.g. “how watch mode works” and “how to configure Prometheus” in one place each).
+5. **Easier maintenance**: smaller, focused files; clear naming; one `docs/` tree.
+
+---
+
+## 3. Proposed Directory Layout
+
+```
+epimetheus/
+├── README.md # Short overview + quick start + doc index (slimmed)
+├── AGENT.md # Unchanged
+├── CLAUDE.md # Unchanged
+├── docs/
+│ ├── README.md # Documentation index (nav + short descriptions)
+│ │
+│ ├── guides/ # How-to and concepts
+│ │ ├── quickstart.md # Minimal path to first push (Prometheus or ClickHouse)
+│ │ ├── modes.md # All modes: realtime, historic, backfill, auto, watch
+│ │ ├── data-formats.md # CSV (epimetheus + tabular) and JSON
+│ │ ├── csv-format-flexibility.md # “Any CSV” + examples (replaces missing file)
+│ │ ├── dns-resolution.md # IP → hostname resolution (replaces missing file)
+│ │ └── dtail-metrics-example.md # Optional: dtail.csv walkthrough (replaces missing file)
+│ │
+│ ├── backends/ # One doc per ingestion backend
+│ │ ├── prometheus.md # Pushgateway + Remote Write, config, limits
+│ │ ├── clickhouse.md # Watch-only; schema; verify script
+│ │ └── (future) victoriametrics.md # When/if added
+│ │
+│ ├── operations/ # Setup, runbooks, platform-specific
+│ │ ├── setup-prometheus.md # Remote Write receiver, scrape config, retention
+│ │ ├── setup-clickhouse.md # Table creation, verify-clickhouse.sh
+│ │ ├── troubleshooting.md # Connection issues, “no metrics”, out-of-order, etc.
+│ │ ├── cleanup.md # Benchmark cleanup, Pushgateway delete, port-forwards
+│ │ ├── macos-setup.md # Brew, Prometheus args, Remote Write on macOS
+│ │ └── kubernetes.md # Port-forwards, Helm, ConfigMaps (from current README)
+│ │
+│ ├── reference/ # Reference material
+│ │ ├── cli.md # All flags by mode
+│ │ ├── test-metrics.md # epimetheus_test_* metrics and types
+│ │ ├── grafana-dashboard.md # Panels, deploy options, datasource
+│ │ ├── example-queries.md # PromQL and curl examples
+│ │ └── magefile.md # Mage targets (replaces missing MAGEFILE.md)
+│ │
+│ └── design/ # Optional, for contributors
+│ └── architecture.md # High-level data flow (current ASCII diagrams)
+```
+
+---
+
+## 4. File-by-File Plan
+
+### 4.1 Root `README.md` (slimmed)
+
+- **Keep:** Project name, tagline, “Why Epimetheus”, **one** high-level architecture diagram (simplified).
+- **Keep:** Very short “Overview” (1 paragraph) and **Quick Start** (3–5 steps pointing at `docs/guides/quickstart.md` for details).
+- **Add:** **Documentation index** – bullet list with links to:
+ - `docs/README.md`
+ - `docs/guides/quickstart.md`, `docs/guides/modes.md`
+ - `docs/backends/prometheus.md`, `docs/backends/clickhouse.md`
+ - `docs/operations/setup-prometheus.md`, `docs/operations/troubleshooting.md`
+ - `docs/reference/cli.md`, `docs/reference/magefile.md`
+- **Move out of README into `docs/`:**
+ - All mode details → `docs/guides/modes.md`
+ - Backend-specific behaviour → `docs/backends/*.md`
+ - Setup (Prometheus, ClickHouse, k8s, macOS) → `docs/operations/*.md`
+ - Data formats → `docs/guides/data-formats.md` (+ csv-format-flexibility, dns-resolution, dtail example)
+ - Test metrics, Grafana, example queries → `docs/reference/*.md`
+ - Troubleshooting, cleanup → `docs/operations/*.md`
+ - Time range / retention → `docs/backends/prometheus.md` and `docs/operations/setup-prometheus.md`
+- **Fix links:** Remove links to `CSV-FORMAT-FLEXIBILITY.md`, `DNS-RESOLUTION-FEATURE.md`, `DTAIL-METRICS-EXAMPLE.md`, `MAGEFILE.md` from README; point to `docs/guides/...` and `docs/reference/magefile.md` instead.
+
+**Target:** README of roughly 150–200 lines.
+
+---
+
+### 4.2 `docs/README.md` (new)
+
+- Title: “Epimetheus Documentation”.
+- Short intro (2–3 sentences).
+- **Structured index** with sections:
+ - **Guides:** quickstart, modes, data formats, CSV flexibility, DNS resolution, dtail example.
+ - **Ingestion backends:** Prometheus, ClickHouse (and placeholder for Victoria* if desired).
+ - **Operations:** setup (Prometheus, ClickHouse), troubleshooting, cleanup, macOS, Kubernetes.
+ - **Reference:** CLI, test metrics, Grafana, example queries, Mage.
+- Each entry: link + one-line description.
+
+---
+
+### 4.3 Guides
+
+| Doc | Content | Source |
+|-----|--------|--------|
+| `guides/quickstart.md` | Minimal steps: build/run, push to Prometheus or ClickHouse, view (Prometheus UI or verify-clickhouse.sh). | Current README “Quick Start” + “Run in Realtime Mode” + one watch example. |
+| `guides/modes.md` | Table: mode name, purpose, which backends, main flags. Then one subsection per mode (realtime, historic, backfill, auto, watch) with short description and example command. | Current README “Operating Modes”. |
+| `guides/data-formats.md` | Epimetheus CSV (metric_name, labels, value, timestamp_ms), JSON format, optional timestamp. Link to csv-format-flexibility for tabular CSV. | Current README “Data Formats”. |
+| `guides/csv-format-flexibility.md` | “Works with any CSV”: headers → metric names/labels, numeric vs string columns, sanitization, examples (web, food). | New content; replaces missing `CSV-FORMAT-FLEXIBILITY.md`. |
+| `guides/dns-resolution.md` | Default `ip` resolution; `-resolve-ip-labels`; behaviour on failure. | New content; replaces missing `DNS-RESOLUTION-FEATURE.md`. |
+| `guides/dtail-metrics-example.md` | Optional: step-by-step dtail.csv example. | New content; replaces missing `DTAIL-METRICS-EXAMPLE.md`; can be short. |
+
+---
+
+### 4.4 Backends
+
+| Doc | Content | Source |
+|-----|--------|--------|
+| `backends/prometheus.md` | Pushgateway (realtime) vs Remote Write (historic/backfill/auto/watch); URLs; time range and retention limits; out-of-order; link to setup-prometheus. | README Prometheus bits + “Time Range Limitations” + “Setup Requirements” (Remote Write). |
+| `backends/clickhouse.md` | Watch-only; `-clickhouse`, `-clickhouse-table`; table schema (from code/comments); `verify-clickhouse.sh`; Prometheus + ClickHouse together. | README “ClickHouse Support” + verify-clickhouse.sh + internal/ingester/clickhouse.go. |
+
+---
+
+### 4.5 Operations
+
+| Doc | Content | Source |
+|-----|--------|--------|
+| `operations/setup-prometheus.md` | Enable Remote Write receiver (and Admin API); scrape config for Pushgateway; retention; Prometheus 3.x syntax; verify commands. | Current README “Setup Requirements” (Prometheus). |
+| `operations/setup-clickhouse.md` | Ensure table exists (e.g. from ingester); run verify script; optional Docker/systemd. | From README + scripts + code. |
+| `operations/troubleshooting.md` | Pushgateway connection; metrics not in Prometheus; “Remote write receiver not enabled”; out-of-order errors; dashboard not in Grafana; ClickHouse connection. | Current README “Troubleshooting”. |
+| `operations/cleanup.md` | Cleanup benchmark data script; manual Prometheus delete/tombstones; Pushgateway delete; stop port-forwards; uninstall Pushgateway. | Current README “Cleanup”. |
+| `operations/macos-setup.md` | Brew install; prometheus.args (Remote Write, Admin API); verify; optional “temporary” run. | Current README “MacOS Setup”. |
+| `operations/kubernetes.md` | Port-forwards (Pushgateway, Prometheus, Grafana); Helm/ConfigMap for dashboard; namespace. | Extracted from README examples. |
+
+---
+
+### 4.6 Reference
+
+| Doc | Content | Source |
+|-----|--------|--------|
+| `reference/cli.md` | Table or list of all flags by mode (realtime, historic, backfill, auto, watch); default values. | From README + `cmd/epimetheus/main.go`. |
+| `reference/test-metrics.md` | Each `epimetheus_test_*` metric: type, description, labels, use case. | Current README “Test Metrics”. |
+| `reference/grafana-dashboard.md` | Panels list; deploy (ConfigMap, manual import, script); datasource; link to AGENT.md for panel guidelines. | Current README “Grafana Dashboard”. |
+| `reference/example-queries.md` | PromQL and curl examples (basic, histogram, labeled counter). | Current README “Example Queries”. |
+| `reference/magefile.md` | List of Mage targets (build, test, run, RunWatchClickHouse, cleanup, etc.) with one-line description and example. | From `Magefile.go`; replaces missing `MAGEFILE.md`. |
+
+---
+
+### 4.7 Design (optional)
+
+| Doc | Content | Source |
+|-----|--------|--------|
+| `design/architecture.md` | High-level data flow; ASCII diagrams (current README); “when to use Pushgateway vs Remote Write” and “when to use which backend”. | Current README “Architecture” and “Best Practices”. |
+
+---
+
+## 5. Implementation Order
+
+1. **Create `docs/` and index**
+ - Create `docs/README.md` with the full index (links can target paths that don’t exist yet).
+2. **Fix broken links and add missing content**
+ - Add `docs/guides/csv-format-flexibility.md`, `docs/guides/dns-resolution.md`, `docs/guides/dtail-metrics-example.md`, `docs/reference/magefile.md`, and repoint the four broken README links at them so every current README link resolves.
+3. **Backend-centric docs**
+ - Add `docs/backends/prometheus.md` and `docs/backends/clickhouse.md`; move content from README (any duplication is temporary, until the README is slimmed in step 7).
+4. **Operations**
+ - Add `docs/operations/setup-prometheus.md`, `setup-clickhouse.md`, `troubleshooting.md`, `cleanup.md`, `macos-setup.md`, `kubernetes.md`; move content from README.
+5. **Guides**
+ - Add `docs/guides/quickstart.md`, `modes.md`, `data-formats.md`; move content from README.
+6. **Reference**
+ - Add `docs/reference/cli.md`, `test-metrics.md`, `grafana-dashboard.md`, `example-queries.md`; move content from README.
+7. **Slim README**
+ - Cut README down to overview, quick start, and doc index; replace old links with `docs/...` links.
+8. **Optional**
+ - Add `docs/design/architecture.md` and link from `docs/README.md`.
+
+---
+
+## 6. Cross-Cutting Conventions
+
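+- **Links:** Use relative links, written relative to the file that contains them: from the root README, `[Modes](docs/guides/modes.md)`; from `docs/README.md`, `[Prometheus](backends/prometheus.md)`; from a file in a `docs/` subdirectory, `[Prometheus](../backends/prometheus.md)`.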
+- **Backend mentions:** In mode/CLI docs, use a short table or sentence: “Supported backends: Prometheus (all modes), ClickHouse (watch only).”
+- **One diagram:** Keep one high-level diagram in README or `design/architecture.md`; avoid duplicating large ASCII art in multiple files.
+- **CLI and defaults:** Single source of truth in `reference/cli.md`; guides and backend docs can quote the relevant subset.
+- **Version/legal:** Keep “Version” and “License” in root README (or CONTRIBUTING.md if you add one).
+
+---
+
+## 7. Future: VictoriaMetrics / VictoriaDB
+
+When adding a new backend (e.g. VictoriaMetrics, which speaks Prometheus Remote Write):
+
+- Add `docs/backends/victoriametrics.md` (or `victoriadb.md`) with URL format, any extra flags, and differences from Prometheus.
+- In `docs/README.md` and root README, add one line to the “Ingestion backends” section.
+- In `docs/guides/modes.md` and `reference/cli.md`, extend the “which backends support which mode” table and flags.
+- No need to duplicate full setup/troubleshooting if it matches Prometheus; link to `backends/prometheus.md` and note compatibility where relevant.
+
+---
+
+## 8. Checklist Before Calling Done
+
+- [ ] All current README links resolve (no 404s).
+- [ ] README is under ~200 lines and ends with doc index.
+- [ ] `docs/README.md` lists every new doc with link and one-line description.
+- [ ] Prometheus vs ClickHouse (and modes) are clearly separated in backends and guides.
+- [ ] Setup, troubleshooting, and cleanup live under `docs/operations/`.
+- [ ] Mage is documented in `docs/reference/magefile.md` and linked from root README.
+- [ ] Optional: `docs/design/architecture.md` exists and is linked from index.
+
+This plan gives you a single place to extend when you add VictoriaDB/VictoriaMetrics or another backend, and keeps the root README short while all detailed docs live under `docs/` with a clear structure by topic and backend.
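
The first checklist item ("All current README links resolve") can be checked mechanically rather than by eye. The following is a minimal Go sketch, not part of the commit: `brokenLinks`, `onDisk`, and the sample text in `main` are hypothetical names and data chosen for illustration. It assumes inline Markdown links of the form `[text](target)` without link titles, and treats anything containing `://`, starting with `mailto:`, or starting with `#` as out of scope.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"regexp"
	"strings"
)

// linkRe matches inline Markdown links such as [Modes](docs/guides/modes.md)
// and captures the target inside the parentheses. Links that carry a title
// after the target are not matched by this simple pattern.
var linkRe = regexp.MustCompile(`\[[^\]]*\]\(([^)\s]+)\)`)

// brokenLinks returns the relative link targets found in markdown for which
// exists reports false. External URLs and in-page anchors are skipped.
func brokenLinks(markdown string, exists func(target string) bool) []string {
	var broken []string
	for _, m := range linkRe.FindAllStringSubmatch(markdown, -1) {
		target := m[1]
		if strings.Contains(target, "://") ||
			strings.HasPrefix(target, "mailto:") ||
			strings.HasPrefix(target, "#") {
			continue
		}
		// Drop a trailing "#section" fragment before checking the file.
		if i := strings.Index(target, "#"); i >= 0 {
			target = target[:i]
		}
		if !exists(target) {
			broken = append(broken, target)
		}
	}
	return broken
}

// onDisk returns an existence check that resolves targets relative to base,
// the directory of the Markdown file being checked.
func onDisk(base string) func(string) bool {
	return func(target string) bool {
		_, err := os.Stat(filepath.Join(base, target))
		return err == nil
	}
}

func main() {
	if len(os.Args) > 1 {
		// Check a real file, e.g. README.md in a checkout.
		data, err := os.ReadFile(os.Args[1])
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		broken := brokenLinks(string(data), onDisk(filepath.Dir(os.Args[1])))
		for _, target := range broken {
			fmt.Println("broken link:", target)
		}
		if len(broken) > 0 {
			os.Exit(1)
		}
		return
	}

	// Without arguments, run on a made-up excerpt and a made-up file set.
	sample := "See [CSV](CSV-FORMAT-FLEXIBILITY.md), " +
		"[Modes](docs/guides/modes.md) and [Prometheus](https://prometheus.io/)."
	known := map[string]bool{"docs/guides/modes.md": true}
	for _, target := range brokenLinks(sample, func(t string) bool { return known[t] }) {
		fmt.Println("broken link:", target) // prints: broken link: CSV-FORMAT-FLEXIBILITY.md
	}
}
```

Passing a path (for example `go run . README.md`) checks that file against the working tree and exits non-zero if any relative target is missing, which makes the sketch usable as a CI step. Keeping the existence check as a function parameter is a deliberate choice: it lets the same logic run against a planned file list before the new `docs/` files exist.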