diff options
Diffstat (limited to 'docs/DOCS-RESTRUCTURE-PLAN.md')
| -rw-r--r-- | docs/DOCS-RESTRUCTURE-PLAN.md | 235 |
1 files changed, 235 insertions, 0 deletions
diff --git a/docs/DOCS-RESTRUCTURE-PLAN.md b/docs/DOCS-RESTRUCTURE-PLAN.md new file mode 100644 index 0000000..c688993 --- /dev/null +++ b/docs/DOCS-RESTRUCTURE-PLAN.md @@ -0,0 +1,235 @@ +# Documentation Restructure Plan + +This plan addresses the current documentation sprawl and clarifies the **multiple ingestion backends** (Prometheus, ClickHouse, and future backends such as VictoriaMetrics) and **modes** (realtime, historic, backfill, auto, watch). + +--- + +## 1. Current State Summary + +### 1.1 Existing Markdown Files + +| File | Purpose | Issues | +|------------|-----------------------------------|--------| +| `README.md` | Single ~995-line doc: intro, modes, backends, setup, troubleshooting, macOS, cleanup | Too long; mixes audiences and backends; hard to maintain | +| `AGENT.md` | Agent rules (Grafana dashboard guidelines + ref to `~/git/conf/snippets/go/go-projects.md`) | Fine as-is; not user docs | +| `CLAUDE.md` | One-line pointer to AGENT.md | Fine as-is | + +### 1.2 Broken or Missing References in README + +- `CSV-FORMAT-FLEXIBILITY.md` – linked, **does not exist** +- `DNS-RESOLUTION-FEATURE.md` – linked, **does not exist** +- `DTAIL-METRICS-EXAMPLE.md` – linked, **does not exist** +- `MAGEFILE.md` – linked, **does not exist** (build logic lives in `Magefile.go`) + +### 1.3 Ingestion Backends (from codebase) + +| Backend | Modes | Notes | +|-----------|---------------------------|--------| +| **Prometheus** | realtime (Pushgateway), historic/backfill/auto (Remote Write), watch (Remote Write) | Primary; Remote Write requires feature flag | +| **ClickHouse** | watch only | Optional; can run with Prometheus or alone | + +*VictoriaDB / VictoriaMetrics:* Not present in code today. Plan leaves room for a dedicated backend doc when added. + +--- + +## 2. Goals + +1. **Separate by ingestion backend** so Prometheus vs ClickHouse (and future backends) have clear, non-redundant docs. +2. **Split by audience and topic**: quick start vs reference vs operations (setup, troubleshooting, cleanup). +3. **Fix broken links**: either add the missing docs or replace links with in-README sections / new doc paths. +4. **Single source of truth** for each concept (e.g. “how watch mode works” and “how to configure Prometheus” in one place each). +5. **Easier maintenance**: smaller, focused files; clear naming; one `docs/` tree. + +--- + +## 3. Proposed Directory Layout + +``` +epimetheus/ +├── README.md # Short overview + quick start + doc index (slimmed) +├── AGENT.md # Unchanged +├── CLAUDE.md # Unchanged +├── docs/ +│ ├── README.md # Documentation index (nav + short descriptions) +│ │ +│ ├── guides/ # How-to and concepts +│ │ ├── quickstart.md # Minimal path to first push (Prometheus or ClickHouse) +│ │ ├── modes.md # All modes: realtime, historic, backfill, auto, watch +│ │ ├── data-formats.md # CSV (epimetheus + tabular) and JSON +│ │ ├── csv-format-flexibility.md # “Any CSV” + examples (replaces missing file) +│ │ ├── dns-resolution.md # IP → hostname resolution (replaces missing file) +│ │ └── dtail-metrics-example.md # Optional: dtail.csv walkthrough (replaces missing file) +│ │ +│ ├── backends/ # One doc per ingestion backend +│ │ ├── prometheus.md # Pushgateway + Remote Write, config, limits +│ │ ├── clickhouse.md # Watch-only; schema; verify script +│ │ └── (future) victoriametrics.md # When/if added +│ │ +│ ├── operations/ # Setup, runbooks, platform-specific +│ │ ├── setup-prometheus.md # Remote Write receiver, scrape config, retention +│ │ ├── setup-clickhouse.md # Table creation, verify-clickhouse.sh +│ │ ├── troubleshooting.md # Connection issues, “no metrics”, out-of-order, etc. +│ │ ├── cleanup.md # Benchmark cleanup, Pushgateway delete, port-forwards +│ │ ├── macos-setup.md # Brew, Prometheus args, Remote Write on macOS +│ │ └── kubernetes.md # Port-forwards, Helm, ConfigMaps (from current README) +│ │ +│ ├── reference/ # Reference material +│ │ ├── cli.md # All flags by mode +│ │ ├── test-metrics.md # epimetheus_test_* metrics and types +│ │ ├── grafana-dashboard.md # Panels, deploy options, datasource +│ │ ├── example-queries.md # PromQL and curl examples +│ │ └── magefile.md # Mage targets (replaces missing MAGEFILE.md) +│ │ +│ └── design/ # Optional, for contributors +│ └── architecture.md # High-level data flow (current ASCII diagrams) +``` + +--- + +## 4. File-by-File Plan + +### 4.1 Root `README.md` (slimmed) + +- **Keep:** Project name, tagline, “Why Epimetheus”, **one** high-level architecture diagram (simplified). +- **Keep:** Very short “Overview” (1 paragraph) and **Quick Start** (3–5 steps pointing at `docs/guides/quickstart.md` for details). +- **Add:** **Documentation index** – bullet list with links to: + - `docs/README.md` + - `docs/guides/quickstart.md`, `docs/guides/modes.md` + - `docs/backends/prometheus.md`, `docs/backends/clickhouse.md` + - `docs/operations/setup-prometheus.md`, `docs/operations/troubleshooting.md` + - `docs/reference/cli.md`, `docs/reference/magefile.md` +- **Move out of README into `docs/`:** + - All mode details → `docs/guides/modes.md` + - Backend-specific behaviour → `docs/backends/*.md` + - Setup (Prometheus, ClickHouse, k8s, macOS) → `docs/operations/*.md` + - Data formats → `docs/guides/data-formats.md` (+ csv-format-flexibility, dns-resolution, dtail example) + - Test metrics, Grafana, example queries → `docs/reference/*.md` + - Troubleshooting, cleanup → `docs/operations/*.md` + - Time range / retention → `docs/backends/prometheus.md` and `docs/operations/setup-prometheus.md` +- **Fix links:** Remove links to `CSV-FORMAT-FLEXIBILITY.md`, `DNS-RESOLUTION-FEATURE.md`, `DTAIL-METRICS-EXAMPLE.md`, `MAGEFILE.md` from README; point to `docs/guides/...` and `docs/reference/magefile.md` instead. + +**Target:** README under ~150–200 lines. + +--- + +### 4.2 `docs/README.md` (new) + +- Title: “Epimetheus Documentation”. +- Short intro (2–3 sentences). +- **Structured index** with sections: + - **Guides:** quickstart, modes, data formats, CSV flexibility, DNS resolution, dtail example. + - **Ingestion backends:** Prometheus, ClickHouse (and placeholder for Victoria* if desired). + - **Operations:** setup (Prometheus, ClickHouse), troubleshooting, cleanup, macOS, Kubernetes. + - **Reference:** CLI, test metrics, Grafana, example queries, Mage. +- Each entry: link + one-line description. + +--- + +### 4.3 Guides + +| Doc | Content | Source | +|-----|--------|--------| +| `guides/quickstart.md` | Minimal steps: build/run, push to Prometheus or ClickHouse, view (Prometheus UI or verify-clickhouse.sh). | Current README “Quick Start” + “Run in Realtime Mode” + one watch example. | +| `guides/modes.md` | Table: mode name, purpose, which backends, main flags. Then one subsection per mode (realtime, historic, backfill, auto, watch) with short description and example command. | Current README “Operating Modes”. | +| `guides/data-formats.md` | Epimetheus CSV (metric_name, labels, value, timestamp_ms), JSON format, optional timestamp. Link to csv-format-flexibility for tabular CSV. | Current README “Data Formats”. | +| `guides/csv-format-flexibility.md` | “Works with any CSV”: headers → metric names/labels, numeric vs string columns, sanitization, examples (web, food). | New content; replaces missing `CSV-FORMAT-FLEXIBILITY.md`. | +| `guides/dns-resolution.md` | Default `ip` resolution; `-resolve-ip-labels`; behaviour on failure. | New content; replaces missing `DNS-RESOLUTION-FEATURE.md`. | +| `guides/dtail-metrics-example.md` | Optional: step-by-step dtail.csv example. | New content; replaces missing `DTAIL-METRICS-EXAMPLE.md`; can be short. | + +--- + +### 4.4 Backends + +| Doc | Content | Source | +|-----|--------|--------| +| `backends/prometheus.md` | Pushgateway (realtime) vs Remote Write (historic/watch); URLs; time range and retention limits; out-of-order; link to setup-prometheus. | README Prometheus bits + “Time Range Limitations” + “Setup Requirements” (Remote Write). | +| `backends/clickhouse.md` | Watch-only; `-clickhouse`, `-clickhouse-table`; table schema (from code/comments); `verify-clickhouse.sh`; Prometheus + ClickHouse together. | README “ClickHouse Support” + verify-clickhouse.sh + internal/ingester/clickhouse.go. | + +--- + +### 4.5 Operations + +| Doc | Content | Source | +|-----|--------|--------| +| `operations/setup-prometheus.md` | Enable Remote Write receiver (and Admin API); scrape config for Pushgateway; retention; Prometheus 3.x syntax; verify commands. | Current README “Setup Requirements” (Prometheus). | +| `operations/setup-clickhouse.md` | Ensure table exists (e.g. from ingester); run verify script; optional Docker/systemd. | From README + scripts + code. | +| `operations/troubleshooting.md` | Pushgateway connection; metrics not in Prometheus; “Remote write receiver not enabled”; out-of-order errors; dashboard not in Grafana; ClickHouse connection. | Current README “Troubleshooting”. | +| `operations/cleanup.md` | Cleanup benchmark data script; manual Prometheus delete/tombstones; Pushgateway delete; stop port-forwards; uninstall Pushgateway. | Current README “Cleanup”. | +| `operations/macos-setup.md` | Brew install; prometheus.args (Remote Write, Admin API); verify; optional “temporary” run. | Current README “MacOS Setup”. | +| `operations/kubernetes.md` | Port-forwards (Pushgateway, Prometheus, Grafana); Helm/ConfigMap for dashboard; namespace. | Extracted from README examples. | + +--- + +### 4.6 Reference + +| Doc | Content | Source | +|-----|--------|--------| +| `reference/cli.md` | Table or list of all flags by mode (realtime, historic, backfill, auto, watch); default values. | From README + `cmd/epimetheus/main.go`. | +| `reference/test-metrics.md` | Each `epimetheus_test_*` metric: type, description, labels, use case. | Current README “Test Metrics”. | +| `reference/grafana-dashboard.md` | Panels list; deploy (ConfigMap, manual import, script); datasource; link to AGENT.md for panel guidelines. | Current README “Grafana Dashboard”. | +| `reference/example-queries.md` | PromQL and curl examples (basic, histogram, labeled counter). | Current README “Example Queries”. | +| `reference/magefile.md` | List of Mage targets (build, test, run, RunWatchClickHouse, cleanup, etc.) with one-line description and example. | From `Magefile.go`; replaces missing `MAGEFILE.md`. | + +--- + +### 4.7 Design (optional) + +| Doc | Content | Source | +|-----|--------|--------| +| `design/architecture.md` | High-level data flow; ASCII diagrams (current README); “when to use Pushgateway vs Remote Write” and “when to use which backend”. | Current README “Architecture” and “Best Practices”. | + +--- + +## 5. Implementation Order + +1. **Create `docs/` and index** + - Create `docs/README.md` with the full index (links can target paths that don’t exist yet). +2. **Fix broken links and add missing content** + - Add `docs/guides/csv-format-flexibility.md`, `docs/guides/dns-resolution.md`, `docs/guides/dtail-metrics-example.md`, `docs/reference/magefile.md` so all current README links resolve. +3. **Backend-centric docs** + - Add `docs/backends/prometheus.md` and `docs/backends/clickhouse.md`; move/duplicate content from README. +4. **Operations** + - Add `docs/operations/setup-prometheus.md`, `setup-clickhouse.md`, `troubleshooting.md`, `cleanup.md`, `macos-setup.md`, `kubernetes.md`; move content from README. +5. **Guides** + - Add `docs/guides/quickstart.md`, `modes.md`, `data-formats.md`; move content from README. +6. **Reference** + - Add `docs/reference/cli.md`, `test-metrics.md`, `grafana-dashboard.md`, `example-queries.md`; move content from README. +7. **Slim README** + - Cut README down to overview, quick start, and doc index; replace old links with `docs/...` links. +8. **Optional** + - Add `docs/design/architecture.md` and link from `docs/README.md`. + +--- + +## 6. Cross-Cutting Conventions + +- **Links:** Prefer relative links from repo root (e.g. `[Modes](docs/guides/modes.md)`) or from `docs/` (e.g. `[Prometheus](backends/prometheus.md)` inside docs). +- **Backend mentions:** In mode/CLI docs, use a short table or sentence: “Supported backends: Prometheus (all modes), ClickHouse (watch only).” +- **One diagram:** Keep one high-level diagram in README or `design/architecture.md`; avoid duplicating large ASCII art in multiple files. +- **CLI and defaults:** Single source of truth in `reference/cli.md`; guides and backend docs can quote the relevant subset. +- **Version/legal:** Keep “Version” and “License” in root README (or CONTRIBUTING.md if you add one). + +--- + +## 7. Future: VictoriaMetrics / VictoriaDB + +When adding a new backend (e.g. VictoriaMetrics, which speaks Prometheus Remote Write): + +- Add `docs/backends/victoriametrics.md` (or `victoriadb.md`) with URL format, any extra flags, and differences from Prometheus. +- In `docs/README.md` and root README, add one line to the “Ingestion backends” section. +- In `docs/guides/modes.md` and `reference/cli.md`, extend the “which backends support which mode” table and flags. +- No need to duplicate full setup/troubleshooting if it matches Prometheus; link to `backends/prometheus.md` and note compatibility where relevant. + +--- + +## 8. Checklist Before Calling Done + +- [ ] All current README links resolve (no 404s). +- [ ] README is under ~200 lines and ends with doc index. +- [ ] `docs/README.md` lists every new doc with link and one-line description. +- [ ] Prometheus vs ClickHouse (and modes) are clearly separated in backends and guides. +- [ ] Setup, troubleshooting, and cleanup live under `docs/operations/`. +- [ ] Mage is documented in `docs/reference/magefile.md` and linked from root README. +- [ ] Optional: `docs/design/architecture.md` exists and is linked from index. + +This plan gives you a single place to extend when you add VictoriaDB/VictoriaMetrics or another backend, and keeps the root README short while all detailed docs live under `docs/` with a clear structure by topic and backend. |
