| author | Paul Buetow <paul@buetow.org> | 2026-04-05 23:37:17 +0300 |
|---|---|---|
| committer | Paul Buetow <paul@buetow.org> | 2026-04-05 23:37:17 +0300 |
| commit | fddb90ca65cc49a6ee343dd242965aae6b7a594a (patch) | |
| tree | 9a7d76c0d0d827e71c575890f805cee3f4ebf902 | |
| parent | 2b64fcebf8ef0425575eafa452caae2450a8c7d3 (diff) | |
extract scanned bills and receipts: split PDFs, name by date/type/address/amount
| Mode | File | Lines changed |
|---|---|---|
| -rw-r--r-- | gitsyncer/config.json | 4 |
| -rw-r--r-- | prompts/skills/extracting-scanned-bills/SKILL.md | 78 |
| -rw-r--r-- | prompts/skills/extracting-scanned-receipts/SKILL.md | 79 |
| -rw-r--r-- | prompts/skills/f3s/SKILL.md | 1 |
| -rw-r--r-- | prompts/skills/f3s/references/immich.md | 49 |
5 files changed, 211 insertions, 0 deletions
diff --git a/gitsyncer/config.json b/gitsyncer/config.json
index a995e81..127c9a5 100644
--- a/gitsyncer/config.json
+++ b/gitsyncer/config.json
@@ -13,6 +13,10 @@
       "backupLocation": true,
       "descriptionSyncHost": "root@r0",
       "descriptionSyncRoot": "/data/nfs/k3svolumes/git-server/repos"
+    },
+    {
+      "host": "paul@t450:git",
+      "backupLocation": true
     }
   ],
   "repositories": [],
diff --git a/prompts/skills/extracting-scanned-bills/SKILL.md b/prompts/skills/extracting-scanned-bills/SKILL.md
new file mode 100644
index 0000000..fdf8439
--- /dev/null
+++ b/prompts/skills/extracting-scanned-bills/SKILL.md
@@ -0,0 +1,78 @@
+---
+name: extracting-scanned-bills
+description: "Extracts individual pages from scanned multi-page PDF bills and receipts, naming each file by date, type, address, and amount. Use when asked to extract, split, or organize scanned bills, utility receipts, or payment documents from PDFs. Triggers on: extract bills, split scans, organize receipts, scanned documents."
+---
+
+# Extracting Scanned Bills
+
+Split multi-page scanned PDFs into individually named single-page files using content-based naming.
+
+## Prerequisites
+
+- `pdftk` or `qpdf` must be installed for PDF splitting.
+- Use `qpdf --show-npages <file>` to get page counts.
+- Use `pdftk <file> cat <page> output <dest>` to extract single pages.
+
+## Workflow
+
+### 1. Inventory
+
+Count pages in each PDF:
+
+```sh
+for f in *.pdf; do echo "$f: $(qpdf --show-npages "$f") pages"; done
+```
+
+### 2. Analyze content
+
+Use `look_at` on each PDF with this objective:
+
+> For each page, identify: exact transaction date, what the payment is for (utility type, tax, fine, insurance, etc.), amount with currency, any client/account numbers, addresses, and person names. List every detail page by page.
+
+Analyze all PDFs in parallel when there are multiple files.
+
+### 3. Name each page
+
+Apply this naming convention:
+
+```
+YYYY-MM-DD-<type>-<address-short>-<amount><currency>.pdf
+```
+
+**Rules:**
+- **Date**: Transaction date in `YYYY-MM-DD` format.
+- **Type**: Lowercase, hyphen-separated description of what the bill is for (e.g., `electricity`, `water`, `heating`, `property-tax-and-waste`, `health-insurance`, `speeding-fine`, `parking-fine`).
+- **Address**: Short form of the service address (e.g., `sofia-zapaden-park-bl100`, `vidin-himik-bl25`, `podgore-zdravkov14`). Use `and` to join when a single receipt covers multiple addresses.
+- **Amount**: Numeric amount with currency suffix (`bgn`, `eur`). Omit if the page is an annex/appendix without its own total.
+- **Suffixes**: Use `-card-slip`, `-payment-summary`, `-annex-p1`, `-annex-p2`, etc. for supporting pages.
+- All lowercase, no spaces, hyphens as separators.
+
+**Examples:**
+```
+2026-02-26-electricity-heating-sofia-zapaden-park-bl100-81.02eur.pdf
+2025-08-21-speeding-fine-sofia-alek-konstantinov38-cb9625xp-50bgn.pdf
+2025-10-09-health-insurance-egn7411271790-receipt-51.84bgn.pdf
+2026-02-02-property-tax-and-waste-annex-p2-sofia-zapaden-park-bl100-and-alek-konstantinov38.pdf
+2025-12-01-easypay-payment-summary-72.94bgn.pdf
+```
+
+### 4. Extract pages
+
+Create the destination directory if needed, then extract:
+
+```sh
+mkdir -p <dest>
+pdftk <source>.pdf cat <page> output <dest>/<named-file>.pdf
+```
+
+### 5. Report
+
+After extraction, list all files in the destination and report the total count plus a brief summary of what was extracted (date range, types of documents, addresses covered).
+
+## Notes
+
+- When a single page contains multiple receipts for different addresses, include all addresses in the filename joined with `and`.
+- For fines, include the vehicle plate number in the filename (e.g., `cb9625xp`).
+- For health insurance, include the EGN identifier.
+- For property tax / waste fees that span multiple pages (receipt + annexes), keep them as separate files but use consistent naming with annex suffixes.
+- Always preserve the original scanned PDFs — never modify or delete them.
diff --git a/prompts/skills/extracting-scanned-receipts/SKILL.md b/prompts/skills/extracting-scanned-receipts/SKILL.md
new file mode 100644
index 0000000..44047d3
--- /dev/null
+++ b/prompts/skills/extracting-scanned-receipts/SKILL.md
@@ -0,0 +1,79 @@
+---
+name: extracting-scanned-receipts
+description: "Extracts individual pages from scanned multi-page PDF purchase receipts and invoices, naming each file by date, store, item description, and amount. Use when asked to extract, split, or organize scanned purchase receipts, shop receipts, or invoices from PDFs. Triggers on: extract receipts, split receipts, organize invoices, scanned receipts, purchase receipts."
+---
+
+# Extracting Scanned Receipts
+
+Split multi-page scanned PDFs of purchase receipts and invoices into individually named single-page files using content-based naming.
+
+## Prerequisites
+
+- `pdftk` or `qpdf` must be installed for PDF splitting.
+- Use `qpdf --show-npages <file>` to get page counts.
+- Use `pdftk <file> cat <page> output <dest>` to extract single pages.
+
+## Workflow
+
+### 1. Inventory
+
+Count pages in each PDF:
+
+```sh
+for f in *.pdf; do echo "$f: $(qpdf --show-npages "$f") pages"; done
+```
+
+### 2. Analyze content
+
+Use `look_at` on each PDF with this objective:
+
+> For each page, identify: exact purchase date, what was purchased (item names, product descriptions), amount with currency, store/merchant name and location, any order/receipt/invoice numbers, and person names. List every detail page by page.
+
+Analyze all PDFs in parallel when there are multiple files.
+
+### 3. Name each page
+
+Apply this naming convention:
+
+```
+YYYY-MM-DD-receipt-<store>-<item-description>-<amount><currency>.pdf
+```
+
+**Rules:**
+- **Date**: Purchase/transaction date in `YYYY-MM-DD` format.
+- **Prefix**: Always `receipt` (or `invoice` if the document is an invoice/Rechnung).
+- **Store**: Lowercase short name of the merchant (e.g., `technopolis`, `ikea`, `amazon`).
+- **Item description**: Brief lowercase hyphen-separated description of what was purchased (e.g., `krups-coffee-machine`, `samsung-tablet`, `office-chair`). Keep it short but identifiable.
+- **Amount**: Numeric amount with currency suffix (`bgn`, `eur`, `gbp`, `usd`). Omit if the page is a packing list or supplementary page without a total.
+- **Suffixes**: Use `-warranty`, `-packing-list`, `-delivery-note`, etc. for supporting pages.
+- All lowercase, no spaces, hyphens as separators.
+
+**Examples:**
+```
+2025-01-24-receipt-technopolis-mall-serdika-krups-coffee-machine-132.40eur.pdf
+2023-05-27-receipt-technopolis-24inch-lg-tv-zapaden-park.pdf
+2024-09-01-receipt-ikea-mol-sofia.pdf
+2022-09-08-invoice-chair-pro.pdf
+2024-05-17-packing-list-macbook-pro.pdf
+```
+
+### 4. Extract pages
+
+Create the destination directory if needed, then extract:
+
+```sh
+mkdir -p <dest>
+pdftk <source>.pdf cat <page> output <dest>/<named-file>.pdf
+```
+
+### 5. Report
+
+After extraction, list all files in the destination and report the total count plus a brief summary of what was extracted (date range, stores, items).
+
+## Notes
+
+- When a single page contains multiple items from the same store, name after the most significant/expensive item or use a combined description (e.g., `lg-tv-and-philips-purifier`).
+- Include store location in the name only when it adds useful context (e.g., `mall-serdika`, `mol-sofia`).
+- For warranty extension documents, include `-warranty` suffix.
+- For delivery/packing lists, use `packing-list` or `delivery-note` as the prefix instead of `receipt`.
+- Always preserve the original scanned PDFs — never modify or delete them.
diff --git a/prompts/skills/f3s/SKILL.md b/prompts/skills/f3s/SKILL.md
index c067010..130523a 100644
--- a/prompts/skills/f3s/SKILL.md
+++ b/prompts/skills/f3s/SKILL.md
@@ -26,6 +26,7 @@ Detailed reference documentation is in the `references/` subfolder:
 - [k3s Setup](references/k3s-setup.md) — HA k3s cluster, etcd, node IPs, kubeconfig, ArgoCD
 - [Observability](references/observability.md) — Prometheus, Grafana, Loki, Alloy, Tempo
 - [Package Repos](references/package-repos.md) — Custom FreeBSD/OpenBSD pkg repo served from k3s nginx pod
+- [Immich](references/immich.md) — Photo server deployment, job queue stats, troubleshooting
 
 ## Quick Reference: Host IPs
 
diff --git a/prompts/skills/f3s/references/immich.md b/prompts/skills/f3s/references/immich.md
new file mode 100644
index 0000000..fac27e7
--- /dev/null
+++ b/prompts/skills/f3s/references/immich.md
@@ -0,0 +1,49 @@
+# Immich
+
+Immich runs in the `services` namespace. Config is in `f3s/immich/`.
+
+## Components
+
+- `immich-server` — main API and web UI (port 2283)
+- `immich-machine-learning` — ML inference for face detection, smart search, OCR (port 3003)
+- `immich-postgres` — PostgreSQL 16 with pgvecto-rs extension
+- `immich-valkey` — Redis-compatible queue backend (BullMQ)
+
+## Gathering Job Queue Stats
+
+Immich uses BullMQ via Valkey. To snapshot current queue counters:
+
+```sh
+kubectl exec -n services deploy/immich-valkey -- sh -c '
+for queue in thumbnailGeneration metadataExtraction videoConversion faceDetection smartSearch duplicateDetection backgroundTask storageTemplateMigration search sidecar library notification ocr migration; do
+  waiting=$(valkey-cli LLEN "immich_bull:${queue}:wait" 2>/dev/null)
+  active=$(valkey-cli LLEN "immich_bull:${queue}:active" 2>/dev/null)
+  delayed=$(valkey-cli ZCARD "immich_bull:${queue}:delayed" 2>/dev/null)
+  completed=$(valkey-cli ZCARD "immich_bull:${queue}:completed" 2>/dev/null)
+  failed=$(valkey-cli ZCARD "immich_bull:${queue}:failed" 2>/dev/null)
+  echo "${queue}: waiting=${waiting} active=${active} delayed=${delayed} completed=${completed} failed=${failed}"
+done
+'
+```
+
+## Saving and Comparing Snapshots
+
+Save a snapshot to `/tmp/immich-queues-<timestamp>.txt`:
+
+```sh
+kubectl exec -n services deploy/immich-valkey -- sh -c '...' > /tmp/immich-queues-$(date +%Y%m%d-%H%M%S).txt
+```
+
+To compare a previous snapshot with current state, re-run the command and diff:
+
+```sh
+diff /tmp/immich-queues-<old>.txt /tmp/immich-queues-<new>.txt
+```
+
+Decreasing `waiting` and stable/zero `failed` means healthy progress.
+
+## Troubleshooting
+
+- **Postgres crash loop**: Usually caused by liveness probe killing postgres during WAL recovery. Check `kubectl describe pod` for probe failures and postgres logs for "database system was interrupted while in recovery". Fix by relaxing probe timeouts/thresholds and adding resource limits.
+- **Server crash loop**: Often caused by postgres being unavailable. Fix postgres first.
+- **ML errors**: "Machine learning repository not been setup" is transient — resolves once the ML pod health check passes.
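
Review note: the naming convention that `extracting-scanned-bills/SKILL.md` describes (`YYYY-MM-DD-<type>-<address-short>-<amount><currency>.pdf`) could be sketched as a small shell helper. This is not part of the commit. The function names `slug` and `bill_name` are hypothetical, and the sketch assumes ASCII input (non-ASCII characters would collapse into hyphens).

```sh
#!/bin/sh
# Hypothetical helpers, not part of the commit.

# Lowercase the input, collapse every run of non-alphanumerics into one
# hyphen, and trim a leading or trailing hyphen.
slug() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9]\{1,\}/-/g' -e 's/^-//' -e 's/-$//'
}

# Usage: bill_name DATE TYPE ADDRESS [AMOUNT CURRENCY]
# AMOUNT and CURRENCY are omitted for annex pages without their own total.
bill_name() {
  day=$1
  kind=$(slug "$2")
  addr=$(slug "$3")
  name="$day-$kind-$addr"
  if [ -n "${4:-}" ]; then
    # The amount keeps its decimal point; only the currency is lowercased.
    name="$name-$4$(slug "$5")"
  fi
  printf '%s.pdf\n' "$name"
}

bill_name 2026-02-26 "Electricity Heating" "Sofia Zapaden Park bl100" 81.02 EUR
```

With the arguments shown, the helper reproduces the first example filename from the skill file. Suffixes such as `-annex-p2` can be passed as part of the type argument.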

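
Review note: `references/immich.md` compares queue snapshots with a plain `diff`. A per-queue view of how `waiting` changed could look roughly like the sketch below. It is not part of the commit, the function name `waiting_delta` is hypothetical, and it assumes both snapshot files use the `queue: waiting=N active=N ...` line format that the snapshot command prints.

```sh
#!/bin/sh
# Hypothetical helper, not part of the commit.
# Usage: waiting_delta OLD_SNAPSHOT NEW_SNAPSHOT
waiting_delta() {
  awk '
    # Return the number after "waiting=" on the current line (0 if absent).
    function waiting(   i, kv) {
      for (i = 2; i <= NF; i++) {
        split($i, kv, "=")
        if (kv[1] == "waiting") return kv[2] + 0
      }
      return 0
    }
    # The first field is the queue name followed by a colon.
    { queue = $1; sub(/:$/, "", queue) }
    # First file: remember the old count for each queue.
    NR == FNR { old[queue] = waiting(); next }
    # Second file: print the signed change since the old snapshot.
    { printf "%s: waiting %+d\n", queue, waiting() - old[queue] }
  ' "$1" "$2"
}
```

A negative number means the queue is draining, which matches the "decreasing `waiting`" criterion in the reference. The `failed` counter would need the same treatment to cover the second half of that criterion.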