Model weights occupy ~73.6 GiB, leaving ~5.6 GiB for the KV cache. Reduce
max_model_len to 32768 and raise gpu_memory_utilization to 0.98 to fit.
Add --enforce-eager to disable CUDA graph capture, whose profiling phase
requires ~2 GiB of headroom that simply isn't available on a single A100.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
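The tuning above would correspond to a vLLM launch roughly like the following. This is a sketch, not the repo's actual toml: the `vllm serve` form and the model name are assumptions; only the three flags and their values come from the commit.

```shell
# Hypothetical launch implied by the commit above (model name is a placeholder).
# --max-model-len 32768: cap context length so the KV cache fits in the
#   ~5.6 GiB left over after ~73.6 GiB of weights
# --gpu-memory-utilization 0.98: let vLLM claim nearly all of the A100's memory
# --enforce-eager: skip CUDA graph capture, avoiding its ~2 GiB of
#   profiling-phase headroom
vllm serve openai/gpt-oss-120b \
  --max-model-len 32768 \
  --gpu-memory-utilization 0.98 \
  --enforce-eager
```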
|
- Add --reasoning-parser openai_gptoss to the gpt-oss-120b vLLM config in
  all three toml files; this extracts <|channel|>analysis thinking blocks
  into the reasoning_content field of API responses
- Mark gpt-oss-120b as reasoning: true in pi/agent/models.json for all
three providers (hyperstack, hyperstack1, hyperstack2)
- Update vm1 state file
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
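For reference, the parser flag described above is passed at serve time, roughly as sketched below. The model name and `vllm serve` invocation are assumptions; the flag and parser name are from the commit.

```shell
# Hypothetical sketch; only `--reasoning-parser openai_gptoss` comes from the commit.
# The reasoning parser splits <|channel|>analysis segments out of the raw
# completion so the OpenAI-compatible API returns them in reasoning_content
# rather than mixed into content.
vllm serve openai/gpt-oss-120b \
  --reasoning-parser openai_gptoss
```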
|
- Add hyperstack-vm1-gptoss.toml: A100x1 config for gpt-oss-120b (VM1)
and qwen3-coder-next (VM2) pair, replacing the H100x2 default
- Fix pi/agent/models.json: hyperstack provider URL was pointing at
hyperstack.wg1 (unresolvable); corrected to hyperstack1.wg1 (192.168.3.1)
- Update hyperstack.rb and hypr.fish: reference vm1-gptoss.toml for the
  create-both and pair commands; update fish abbrs for the new pair setup
- Update ask-mode/utils.ts: allow read-only 'ask' commands in ask-mode
- Update agent-plan-mode/utils.ts: tighten isAskCommand check
- Add state files for provisioned vm1/vm2 instances
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>