summaryrefslogtreecommitdiff
path: root/hyperstack-vm1-gptoss.toml
AgeCommit message (Collapse)Author
2026-03-26hyperstack: tune nemotron-super preset for single A100-80GBPaul Buetow
Model weights occupy ~73.6 GiB leaving ~5.6 GiB for KV cache. Reduce max_model_len to 32768 and raise gpu_memory_utilization to 0.98 to fit. Add --enforce-eager to disable CUDA graph capture, which profiling-phase requires ~2 GiB headroom that simply isn't available on a single A100. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-24gpt-oss-120b: enable reasoning via openai_gptoss parserPaul Buetow
- Add --reasoning-parser openai_gptoss to gpt-oss-120b vLLM config in all three toml files; extracts <|channel|>analysis thinking blocks into reasoning_content in API responses - Mark gpt-oss-120b as reasoning: true in pi/agent/models.json for all three providers (hyperstack, hyperstack1, hyperstack2) - Update vm1 state file Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-24hyperstack: gpt-oss-120b + qwen3-coder-next dual-VM pair on A100x1Paul Buetow
- Add hyperstack-vm1-gptoss.toml: A100x1 config for gpt-oss-120b (VM1) and qwen3-coder-next (VM2) pair, replacing the H100x2 default - Fix pi/agent/models.json: hyperstack provider URL was pointing at hyperstack.wg1 (unresolvable); corrected to hyperstack1.wg1 (192.168.3.1) - Update hyperstack.rb, hypr.fish: reference vm1-gptoss.toml for create-both and pair commands; update fish abbrs for the new pair setup - Update ask-mode/utils.ts: allow read-only 'ask' commands in ask-mode - Update agent-plan-mode/utils.ts: tighten isAskCommand check - Add state files for provisioned vm1/vm2 instances Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>