| author | Paul Buetow <paul@buetow.org> | 2026-03-23 23:04:29 +0200 |
|---|---|---|
| committer | Paul Buetow <paul@buetow.org> | 2026-03-23 23:04:29 +0200 |
| commit | 5c6cf70d84f288dc80cfb94af53bd6107ae15835 (patch) | |
| tree | f53fb7139c548235017132fda7b572405d9fd825 /README.md | |
| parent | 8606cc1c7113f752a6de5945e689286a8f33d494 (diff) | |
Add vLLM watch dashboard, side-by-side layout, and insert-mode default
- hyperstack.rb: add VllmWatcher class and `watch` subcommand — live
terminal dashboard polling all active VMs every 5 s via SSH; shows
GPU util/VRAM/temp/power bars and vLLM throughput/requests/KV-cache/
cache-hit bars aligned in a shared column layout
- draw(): render two or more VM panels side-by-side (horizontal) with a
│ separator, padded to equal visible width; single VM falls back to
vertical layout
- pi/agent/extensions/modal-editor: start in INSERT mode instead of NORMAL
- README: document watch command and update fish script rename
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
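
The side-by-side layout described for draw() can be sketched as below. This is a minimal illustration, not the code from hyperstack.rb: the helper names (`visible_width`, `pad_visible`, `side_by_side`) are hypothetical, and it assumes panels are arrays of strings that may contain ANSI colour codes but no double-width characters.

```ruby
# Hypothetical helpers for illustration; names are not taken from hyperstack.rb.
ANSI_RE = /\e\[[0-9;]*m/

# Width of a string as it appears on screen, ignoring ANSI colour codes.
def visible_width(str)
  str.gsub(ANSI_RE, "").length
end

# Right-pad a string with spaces up to the given visible width.
def pad_visible(str, width)
  str + " " * [width - visible_width(str), 0].max
end

# Join panels (each an array of lines) horizontally with a │ separator.
# A single panel is returned as-is, which corresponds to the vertical fallback.
def side_by_side(panels, sep: " │ ")
  return [] if panels.empty?
  return panels.first.dup if panels.size == 1

  height = panels.map(&:size).max
  widths = panels.map { |panel| panel.map { |line| visible_width(line) }.max || 0 }

  (0...height).map do |row|
    panels.each_with_index
          .map { |panel, i| pad_visible(panel[row] || "", widths[i]) }
          .join(sep)
  end
end
```

Padding by visible width rather than raw string length keeps the separator aligned when one panel's lines carry colour escapes and another's do not.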
Diffstat (limited to 'README.md')
| -rw-r--r-- | README.md | 23 |
1 files changed, 21 insertions, 2 deletions
@@ -1,4 +1,4 @@
-# hyperstack
+# hypr
 
 <img src="logo.svg" alt="Hyperstack · Pi · FreeBSD · AI · tmux logo" width="600"/>
 
@@ -238,7 +238,7 @@ Custom extensions live in `pi/agent/extensions/` and are loaded automatically vi
 | `fresh-subagent` | Spawns a sub-agent in a clean context for isolated tasks |
 | `reload-runtime` | `/reload-runtime` command — hot-reloads extensions without restarting Pi |
 | `nemotron-tool-repair` | Repairs malformed tool calls from Nemotron models |
-| `taskwarrior-plan-mode` | Integrates Taskwarrior task management into Pi sessions |
+| `agent-plan-mode` | Integrates task management into Pi sessions |
 
 ### Web search
 
@@ -316,6 +316,7 @@ Commands:
   delete                 Destroy the tracked VM
   delete-both            Destroy both VM1 and VM2
   status                 Show VM and WireGuard status
+  watch                  Live dashboard: vLLM + GPU stats for all active VMs (refreshes every 5 s)
   test                   Run end-to-end inference tests (vLLM)
   model switch <preset>  Hot-switch the running vLLM model
 
@@ -527,6 +528,24 @@ docker run -d \
 
 ## Monitoring vLLM
 
+The `watch` command provides a built-in terminal dashboard that polls all active VMs every 5 seconds:
+
+```bash
+ruby hyperstack.rb watch
+```
+
+It shows per-VM panels with:
+- **GPU** (per device): utilisation bar, temperature, power draw, VRAM %
+- **Requests**: running / waiting / swapped queue depth
+- **KV cache**: GPU fill %
+- **Perf**: decode speed (tok/s), TTFT, e2e latency (means across all completed requests)
+- **Tokens**: cumulative prefill and generation totals
+
+Stats are sourced from the vLLM `/metrics` Prometheus endpoint over the WireGuard tunnel
+and from `nvidia-smi` over SSH. Press `Ctrl-C` to exit.
+
+For lower-level ad-hoc inspection:
+
 ```bash
 # Live engine stats (throughput, KV cache, prefix cache hit rate)
 ssh ubuntu@<vm-ip> 'docker logs -f vllm_nemotron_super 2>&1 | grep "Engine 000"'
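
The README text in this diff says the dashboard reads the vLLM `/metrics` Prometheus endpoint. As a rough sketch of what consuming that text format might look like in Ruby (not the actual VllmWatcher code; `parse_prometheus` is a hypothetical name, and the metric names a given vLLM version exposes are not confirmed by this commit):

```ruby
# Parse Prometheus text exposition format into { metric_name => [values] }.
# Hypothetical helper for illustration; not taken from hyperstack.rb.
def parse_prometheus(text)
  text.each_line.with_object({}) do |line, out|
    line = line.strip
    # Skip blank lines and "# HELP" / "# TYPE" comment lines.
    next if line.empty? || line.start_with?("#")

    # Metric name, optional {labels}, then the sample value.
    # A trailing timestamp, if present, is ignored.
    m = line.match(/\A([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(\S+)/)
    next unless m

    # Non-numeric samples such as NaN or +Inf are skipped in this sketch.
    value = Float(m[3], exception: false)
    next unless value

    # One metric name can appear with several label sets; keep every sample.
    (out[m[1]] ||= []) << value
  end
end
```

Each metric name maps to an array because the same name can appear once per label set (for example, per model). A caller would sum or average the array depending on whether the metric is a counter or a gauge.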
