Improve architecture diagram, add WireGuard section, fix Docker setup docs

- Rewrite architecture diagram to clearly show earth as the WireGuard hub with both VMs as peers, including IPs, port, and subnet
- Add WireGuard topology notes below the diagram (IPs, port, firewall)
- Add 'WireGuard setup' section: tunnel design (wg1.conf structure), manual setup commands, tunnel verification, and restart/recovery
- Expand Docker flag table to cover --gpus, --ipc, --network, --restart, -v, and --host (the flags the previous table omitted)
- Fix client config example: show both VM1 (.1) and VM2 (.3) IPs instead of only .1 (which doesn't match the Qwen3-Coder docker run example)
- Rename 'Switching models manually' -> 'Replacing the running container' to distinguish it clearly from the 'Switching models' CLI section
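
Note on the tunnel verification step: the `sudo wg show wg1` check documented in this commit is read by eye. A minimal sketch of a scripted version follows, assuming the tab-separated `<public-key> <epoch-seconds>` output of `wg show wg1 latest-handshakes`. The helper name `stale_peers` and the 180-second threshold are illustrative assumptions and are not part of this repository.

```bash
#!/usr/bin/env bash
# stale_peers NOW MAX_AGE
# Reads "<public-key><TAB><epoch-seconds>" lines on stdin (the format printed
# by `wg show <interface> latest-handshakes`) and prints the public key of
# every peer that has never completed a handshake (timestamp 0) or whose last
# handshake is more than MAX_AGE seconds before NOW.
# Hypothetical helper for illustration only.
stale_peers() {
  local now="$1" max_age="$2"
  awk -v now="$now" -v max="$max_age" \
    '$2 == 0 || now - $2 > max { print $1 }'
}

# Example invocation on earth (requires root and a configured wg1):
#   sudo wg show wg1 latest-handshakes | stale_peers "$(date +%s)" 180
```

An empty result would suggest that both VM peers have handshaked recently. Any key printed is a candidate for the restart/recovery commands this commit adds to the README.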
author: Paul Buetow <paul@buetow.org> 2026-03-21 12:51:34 +0200
committer: Paul Buetow <paul@buetow.org> 2026-03-21 12:51:34 +0200
commit: 16b23189a2bd0c237d73d3a024095fd2c7303695 (patch)
tree: f1cf0d92bec1a3c91137b931650938b5b33ec4e7 /README.md
parent: c5fda425df55f31faa49a74abdd20f9b8f8240ca (diff)
1 files changed, 100 insertions, 24 deletions
diff --git a/README.md b/README.md
index d4e764e..a7a2165 100644
--- a/README.md
+++ b/README.md
@@ -6,31 +6,38 @@ Runs two A100 VMs concurrently — each serving a different model — with [Pi](
 ## Architecture
 
 ```
-                        WireGuard tunnel (wg1, 192.168.3.0/24)
-                        earth = .2 ──────────────────────────────────────────┐
-                                │                                            │
-         ┌──────────────────────┼────────────────────────────────────────────┐│
-         │                      │                                            ││
-         ▼                      ▼                                            ▼▼
-  Hyperstack VM1 (A100 80GB)         Hyperstack VM2 (A100 80GB)
-  192.168.3.1 / hyperstack1.wg1      192.168.3.3 / hyperstack2.wg1
-  ┌──────────────────────────────┐    ┌──────────────────────────────────┐
-  │ vLLM (:11434)                │    │ vLLM (:11434)                    │
-  │   Nemotron-3-Super 120B      │    │   Qwen3-Coder-Next 80B (MoE)    │
-  │   (hybrid Mamba+MoE, AWQ-4b) │    │   (AWQ-4bit)                     │
-  └──────────────────────────────┘    └──────────────────────────────────┘
-         ▲                                     ▲
-         │ OpenAI /v1/chat/completions         │ OpenAI /v1/chat/completions
-         │                                     │
-  ┌──────┴──────┐                       ┌──────┴──────┐
-  │ Pi (local)  │                       │ Pi (local)  │
-  │ ./pi-vm1    │                       │ ./pi-vm2    │
-  │ Nemotron 3  │                       │ Qwen3 Coder │
-  └─────────────┘                       └─────────────┘
+  earth (local machine)
+  192.168.3.2 / wg1
+  ┌──────────────────────────────────────────────────────────────────┐
+  │                                                                  │
+  │  ┌─────────────┐          ┌─────────────┐                        │
+  │  │ Pi (./pi-vm1│          │ Pi (./pi-vm2│                        │
+  │  │ Nemotron 3) │          │ Qwen3 Coder)│                        │
+  │  └──────┬──────┘          └──────┬──────┘                        │
+  │         │ OpenAI API             │ OpenAI API                    │
+  │         │ /v1/chat/completions   │ /v1/chat/completions          │
+  └─────────┼────────────────────────┼───────────────────────────────┘
+            │ WireGuard wg1          │ WireGuard wg1
+            │ 192.168.3.0/24         │ 192.168.3.0/24
+            │ UDP :56710             │ UDP :56710
+            ▼                        ▼
+  ┌──────────────────────┐  ┌──────────────────────┐
+  │ VM1 (A100 80GB)      │  │ VM2 (A100 80GB)      │
+  │ 192.168.3.1          │  │ 192.168.3.3          │
+  │ hyperstack1.wg1      │  │ hyperstack2.wg1      │
+  │                      │  │                      │
+  │ vLLM :11434          │  │ vLLM :11434          │
+  │ Nemotron-3-Super 120B│  │ Qwen3-Coder-Next 80B │
+  │ (Mamba+MoE, AWQ-4b)  │  │ (MoE, AWQ-4bit)      │
+  └──────────────────────┘  └──────────────────────┘
 ```
 
-Both VMs share a single WireGuard interface (`wg1`) on the local machine.
-Each VM runs one vLLM model exposed directly to Pi over the OpenAI-compatible API.
+**WireGuard topology:**
+- Interface `wg1` on earth carries traffic to **both** VMs simultaneously
+- earth is `192.168.3.2`; VM1 is `.1`; VM2 is `.3`; tunnel port is `56710/udp`
+- Adding VM2 to an existing wg1 tunnel: `wg1-setup.sh` adds a second `[Peer]` block without disturbing VM1
+- vLLM on each VM listens on `0.0.0.0:11434`, firewalled to `192.168.3.0/24` (WireGuard subnet only)
+- Pi connects directly to each VM's vLLM over the tunnel — no proxy or load balancer
 
 ## Prerequisites
 
@@ -43,6 +50,63 @@ Each VM runs one vLLM model exposed directly to Pi over the OpenAI-compatible AP
 - Ruby with `toml-rb` gem: `bundle install`
 - [Pi](https://pi.dev) coding agent installed
 
+## WireGuard setup
+
+`hyperstack.rb` runs `wg1-setup.sh` automatically during `create` / `create-both`.
+This section explains the tunnel design for reference and manual troubleshooting.
+
+### Tunnel design
+
+```
+earth (192.168.3.2)
+  /etc/wireguard/wg1.conf
+  [Interface]  Address = 192.168.3.2/24
+  [Peer]  # VM1 — AllowedIPs = 192.168.3.1/32, Endpoint = <vm1-public-ip>:56710
+  [Peer]  # VM2 — AllowedIPs = 192.168.3.3/32, Endpoint = <vm2-public-ip>:56710
+```
+
+A single `wg1` interface on earth carries traffic to both VMs. Each VM is a separate `[Peer]`
+block. Adding VM2 to an existing tunnel with VM1 already running leaves VM1's peer untouched.
+
+### Manual setup
+
+```bash
+# VM1 (first VM — generates fresh keys, writes /etc/wireguard/wg1.conf from scratch)
+./wg1-setup.sh <vm1-public-ip>
+
+# VM2 (additional VM — adds a [Peer] block to the existing wg1.conf)
+./wg1-setup.sh <vm2-public-ip> 192.168.3.3 hyperstack2.wg1
+```
+
+### Verify the tunnel
+
+```bash
+# Show active peers and handshake times (both VMs should appear)
+sudo wg show wg1
+
+# Ping each VM through the tunnel
+ping -c 3 192.168.3.1   # VM1
+ping -c 3 192.168.3.3   # VM2
+
+# Check vLLM is reachable over the tunnel
+curl http://hyperstack1.wg1:11434/v1/models
+curl http://hyperstack2.wg1:11434/v1/models
+```
+
+### Restart / recover
+
+```bash
+# Restart tunnel locally (e.g. after network change)
+sudo systemctl restart wg-quick@wg1
+
+# Restart tunnel on VM after a reboot (ssh via public IP since WireGuard is down)
+ssh ubuntu@<vm-public-ip> 'sudo systemctl start wg-quick@wg1'
+
+# Re-run setup when VM IP changes (e.g. after delete + recreate)
+./wg1-setup.sh <new-vm1-public-ip>
+./wg1-setup.sh <new-vm2-public-ip> 192.168.3.3 hyperstack2.wg1
+```
+
 ## Quickstart (two-VM setup)
 
 ```bash
@@ -233,12 +297,18 @@ Key flags:
 
 | Flag | Purpose |
 |------|---------|
+| `--gpus all` | Expose all GPUs to the container |
+| `--ipc=host` | Share the host's IPC namespace so PyTorch can use the host's shared memory (avoids `/dev/shm` limits) |
+| `--network host` | Host networking so vLLM's port 11434 is directly reachable over WireGuard |
+| `--restart always` | Auto-restart the container on VM reboot |
+| `-v /ephemeral/hug:...` | Model cache on fast ephemeral NVMe |
 | `--tensor-parallel-size 1` | Single GPU (use 2/4 for multi-GPU) |
 | `--enable-auto-tool-choice` | Enable function/tool calling |
 | `--tool-call-parser qwen3_coder` | Parser for Qwen3-Coder tool format |
 | `--enable-prefix-caching` | Block-level KV cache reuse across requests |
 | `--gpu-memory-utilization 0.92` | Use 92% of VRAM; rest for OS/overhead |
 | `--max-model-len 262144` | Full 256k context window |
+| `--host 0.0.0.0` | Bind to all interfaces (WireGuard access requires this) |
 | `--port 11434` | Reuse Ollama port for firewall compatibility |
 
 ### Verify startup
@@ -267,11 +337,17 @@ sudo ufw allow from 192.168.3.0/24 to any port 11434 proto tcp comment 'vLLM via
 
 ### Client configuration
 
+Use the VM's WireGuard IP (`.1` for VM1, `.3` for VM2):
+
 ```bash
+# VM1 (hyperstack1.wg1 = 192.168.3.1)
 OPENAI_BASE_URL=http://192.168.3.1:11434/v1 OPENAI_API_KEY=EMPTY pi
+
+# VM2 (hyperstack2.wg1 = 192.168.3.3)
+OPENAI_BASE_URL=http://192.168.3.3:11434/v1 OPENAI_API_KEY=EMPTY pi
 ```
 
-### Switching models manually
+### Replacing the running container
 
 To serve a different model, stop the current container and start a new one: