author Paul Buetow <paul@buetow.org> 2026-03-23 23:04:29 +0200
committer Paul Buetow <paul@buetow.org> 2026-03-23 23:04:29 +0200
commit 5c6cf70d84f288dc80cfb94af53bd6107ae15835 (patch)
tree f53fb7139c548235017132fda7b572405d9fd825 /README.md
parent 8606cc1c7113f752a6de5945e689286a8f33d494 (diff)
Add vLLM watch dashboard, side-by-side layout, and insert-mode default

- hyperstack.rb: add VllmWatcher class and `watch` subcommand: a live terminal
  dashboard polling all active VMs every 5 s via SSH; shows GPU
  util/VRAM/temp/power bars and vLLM throughput/requests/KV-cache/cache-hit
  bars aligned in a shared column layout
- draw(): render two or more VM panels side-by-side (horizontal) with a │
  separator, padded to equal visible width; single VM falls back to vertical
  layout
- pi/agent/extensions/modal-editor: start in INSERT mode instead of NORMAL
- README: document watch command and update fish script rename

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
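
The side-by-side layout described for `draw()` can be illustrated with a minimal sketch. This is not the code from hyperstack.rb: the helper names (`visible_width`, `pad_visible`, `side_by_side`) and the ` │ ` separator spacing are assumptions made for illustration, and "visible width" is approximated as string length after stripping ANSI colour escapes.

```ruby
# Hypothetical helpers; names are illustrative, not taken from hyperstack.rb.
ANSI_RE = /\e\[[0-9;]*m/

# Width of a string as displayed, ignoring ANSI colour escapes.
# (Assumes one column per character; wide glyphs are not handled.)
def visible_width(str)
  str.gsub(ANSI_RE, '').length
end

# Append spaces until the visible width reaches `width`.
def pad_visible(str, width)
  str + ' ' * [width - visible_width(str), 0].max
end

# Join panels (arrays of lines) horizontally with a separator.
# A single panel is returned as-is, i.e. the vertical fallback.
def side_by_side(panels, sep: ' │ ')
  return [] if panels.empty?
  return panels.first.dup if panels.size == 1

  # Pad every panel to one common width so the columns line up.
  width  = panels.flatten.map { |line| visible_width(line) }.max || 0
  height = panels.map(&:size).max
  (0...height).map do |row|
    panels.map { |panel| pad_visible(panel[row] || '', width) }.join(sep)
  end
end
```

Calling `puts side_by_side([panel_vm1, panel_vm2])` with two arrays of pre-rendered lines would print them in two aligned columns; padding on visible width rather than raw length keeps coloured bars from shifting the separator.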
Diffstat (limited to 'README.md')
-rw-r--r-- README.md 23
1 files changed, 21 insertions, 2 deletions
diff --git a/README.md b/README.md
index 13951a1..72ba208 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-# hyperstack
+# hypr
<img src="logo.svg" alt="Hyperstack · Pi · FreeBSD · AI · tmux logo" width="600"/>
@@ -238,7 +238,7 @@ Custom extensions live in `pi/agent/extensions/` and are loaded automatically vi
| `fresh-subagent` | Spawns a sub-agent in a clean context for isolated tasks |
| `reload-runtime` | `/reload-runtime` command — hot-reloads extensions without restarting Pi |
| `nemotron-tool-repair` | Repairs malformed tool calls from Nemotron models |
-| `taskwarrior-plan-mode` | Integrates Taskwarrior task management into Pi sessions |
+| `agent-plan-mode` | Integrates task management into Pi sessions |
### Web search
@@ -316,6 +316,7 @@ Commands:
delete Destroy the tracked VM
delete-both Destroy both VM1 and VM2
status Show VM and WireGuard status
+ watch Live dashboard: vLLM + GPU stats for all active VMs (refreshes every 5 s)
test Run end-to-end inference tests (vLLM)
model switch <preset> Hot-switch the running vLLM model
@@ -527,6 +528,24 @@ docker run -d \
## Monitoring vLLM
+The `watch` command provides a built-in terminal dashboard that polls all active VMs every 5 seconds:
+
+```bash
+ruby hyperstack.rb watch
+```
+
+It shows per-VM panels with:
+- **GPU** (per device): utilisation bar, temperature, power draw, VRAM %
+- **Requests**: running / waiting / swapped queue depth
+- **KV cache**: GPU fill %
+- **Perf**: decode speed (tok/s), TTFT, e2e latency (means across all completed requests)
+- **Tokens**: cumulative prefill and generation totals
+
+Stats are sourced from the vLLM `/metrics` Prometheus endpoint over the WireGuard tunnel
+and from `nvidia-smi` over SSH. Press `Ctrl-C` to exit.
+
+For lower-level ad-hoc inspection:
+
```bash
# Live engine stats (throughput, KV cache, prefix cache hit rate)
ssh ubuntu@<vm-ip> 'docker logs -f vllm_nemotron_super 2>&1 | grep "Engine 000"'