path: root/hyperstack.rb
author Paul Buetow <paul@buetow.org> 2026-03-21 22:39:00 +0200
committer Paul Buetow <paul@buetow.org> 2026-03-21 22:39:00 +0200
commit f5c2125d1c1cbf3adde917747aba61cbc3a0f228
tree 99e6279a3c9825114d90993adccf7eb177d09a9d /hyperstack.rb
parent 0b4bbe047af8222ba0cc5d8100e2ef60ee8093bd
Fix nemotron-3-super vLLM OOM: cap context and add --enforce-eager
The [vllm] defaults had max_model_len=262144 without --enforce-eager,
causing the vLLM container to OOM on startup (CUDA graph capture costs
~3-4 GB on top of ~60 GB nemotron weights on the A100 80GB).

Also switch flavor to n3-H100x1 since n3-A100x1 is out of stock in
CANADA-1.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
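
For readers who cannot see the diff, the following is a minimal sketch of how settings like these might be turned into vLLM server arguments. It is not the code from hyperstack.rb: the helper name `vllm_args`, the settings keys, and the cap of 32768 are assumptions made for illustration (the commit message does not state the new max_model_len). Only the `--model`, `--max-model-len` and `--enforce-eager` flags are taken from vLLM itself.

```ruby
# Hypothetical helper, not taken from hyperstack.rb: turns a settings hash
# into a list of vLLM server command-line arguments.
def vllm_args(settings)
  args = ["--model", settings.fetch(:model)]
  # Capping the context length shrinks the KV cache vLLM has to reserve.
  args += ["--max-model-len", settings.fetch(:max_model_len).to_s]
  # --enforce-eager disables CUDA graph capture, trading some throughput
  # for the memory the captured graphs would otherwise occupy.
  args << "--enforce-eager" if settings.fetch(:enforce_eager, false)
  args
end

settings = {
  model: "nemotron-3-super",  # placeholder model identifier
  max_model_len: 32_768,      # illustrative cap, not the value from the commit
  enforce_eager: true
}

puts vllm_args(settings).join(" ")
```

Keeping the eager-mode switch as a boolean in the settings, rather than hard-coding the flag, would let a flavor with more headroom turn CUDA graphs back on without touching the argument builder.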
Diffstat (limited to 'hyperstack.rb')
0 files changed, 0 insertions, 0 deletions