Add vLLM watch dashboard, side-by-side layout, and insert-mode default

- hyperstack.rb: add VllmWatcher class and `watch` subcommand: a live terminal dashboard that polls all active VMs every 5 s via SSH; shows GPU util/VRAM/temp/power bars and vLLM throughput/requests/KV-cache/cache-hit bars aligned in a shared column layout
- draw(): render two or more VM panels side-by-side (horizontal) with a │ separator, padded to equal visible width; a single VM falls back to the vertical layout
- pi/agent/extensions/modal-editor: start in INSERT mode instead of NORMAL
- README: document the `watch` command and update for the fish script rename

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
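
The side-by-side layout described for draw() can be sketched roughly as below. This is a minimal illustration, not the code from hyperstack.rb: the helper names (`visible_width`, `pad_visible`, `side_by_side`) are hypothetical, and it assumes that "visible width" means the string length after stripping ANSI colour escapes.

```ruby
# Hypothetical sketch; names and details are assumptions, not hyperstack.rb code.
ANSI_RE = /\e\[[0-9;]*m/

# Width of a string as shown on screen, ignoring ANSI colour escapes.
# Counts characters, so double-width glyphs are not handled here.
def visible_width(str)
  str.gsub(ANSI_RE, '').length
end

# Append spaces until the visible width reaches `width`.
def pad_visible(str, width)
  str + ' ' * [width - visible_width(str), 0].max
end

# Join panels (arrays of lines) horizontally with a separator.
# A single panel is returned as-is, i.e. the vertical fallback.
def side_by_side(panels, sep: ' │ ')
  return panels.first.dup if panels.size < 2

  width  = panels.flatten.map { |line| visible_width(line) }.max || 0
  height = panels.map(&:size).max
  (0...height).map do |row|
    panels.map { |panel| pad_visible(panel[row] || '', width) }.join(sep)
  end
end

vm1 = ["\e[32mGPU 0\e[0m", 'util 87%']
vm2 = ['GPU 0']
puts side_by_side([vm1, vm2])
```

Padding by visible width rather than raw string length keeps the separator column aligned even when some lines carry colour codes and others do not.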
author: Paul Buetow <paul@buetow.org> 2026-03-23 23:04:29 +0200
committer: Paul Buetow <paul@buetow.org> 2026-03-23 23:04:29 +0200
commit: 5c6cf70d84f288dc80cfb94af53bd6107ae15835 (patch)
tree: f53fb7139c548235017132fda7b572405d9fd825 /README.md
parent: 8606cc1c7113f752a6de5945e689286a8f33d494 (diff)
1 files changed, 21 insertions, 2 deletions
diff --git a/README.md b/README.md
index 13951a1..72ba208 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-# hyperstack
+# hypr
 
 <img src="logo.svg" alt="Hyperstack · Pi · FreeBSD · AI · tmux logo" width="600"/>
 
@@ -238,7 +238,7 @@ Custom extensions live in `pi/agent/extensions/` and are loaded automatically vi
 | `fresh-subagent` | Spawns a sub-agent in a clean context for isolated tasks |
 | `reload-runtime` | `/reload-runtime` command — hot-reloads extensions without restarting Pi |
 | `nemotron-tool-repair` | Repairs malformed tool calls from Nemotron models |
-| `taskwarrior-plan-mode` | Integrates Taskwarrior task management into Pi sessions |
+| `agent-plan-mode` | Integrates task management into Pi sessions |
 
 ### Web search
 
@@ -316,6 +316,7 @@ Commands:
   delete       Destroy the tracked VM
   delete-both  Destroy both VM1 and VM2
   status       Show VM and WireGuard status
+  watch        Live dashboard: vLLM + GPU stats for all active VMs (refreshes every 5 s)
   test         Run end-to-end inference tests (vLLM)
   model switch <preset>  Hot-switch the running vLLM model
 
@@ -527,6 +528,24 @@ docker run -d \
 
 ## Monitoring vLLM
 
+The `watch` command provides a built-in terminal dashboard that polls all active VMs every 5 seconds:
+
+```bash
+ruby hyperstack.rb watch
+```
+
+It shows per-VM panels with:
+- **GPU** (per device): utilisation bar, temperature, power draw, VRAM %
+- **Requests**: running / waiting / swapped queue depth
+- **KV cache**: GPU fill %
+- **Perf**: decode speed (tok/s), TTFT, e2e latency (means across all completed requests)
+- **Tokens**: cumulative prefill and generation totals
+
+Stats are sourced from the vLLM `/metrics` Prometheus endpoint over the WireGuard tunnel
+and from `nvidia-smi` over SSH.  Press `Ctrl-C` to exit.
+
+For lower-level ad-hoc inspection:
+
 ```bash
 # Live engine stats (throughput, KV cache, prefix cache hit rate)
 ssh ubuntu@<vm-ip> 'docker logs -f vllm_nemotron_super 2>&1 | grep "Engine 000"'