| author | Paul Buetow <paul@buetow.org> | 2026-03-21 22:39:00 +0200 |
|---|---|---|
| committer | Paul Buetow <paul@buetow.org> | 2026-03-21 22:39:00 +0200 |
| commit | f5c2125d1c1cbf3adde917747aba61cbc3a0f228 | |
| tree | 99e6279a3c9825114d90993adccf7eb177d09a9d /hyperstack.rb | |
| parent | 0b4bbe047af8222ba0cc5d8100e2ef60ee8093bd | |

Fix nemotron-3-super vLLM OOM: cap context and add --enforce-eager

The [vllm] defaults had max_model_len=262144 without --enforce-eager,
causing the vLLM container to OOM on startup (CUDA graph capture costs
~3-4 GB on top of ~60 GB nemotron weights on the A100 80GB).

Also switch flavor to n3-H100x1 since n3-A100x1 is out of stock in
CANADA-1.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

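
The memory argument in the commit message can be sketched as simple arithmetic. The snippet below is only an illustration built from the approximate figures quoted above (80 GB card, ~60 GB of weights, ~3-4 GB for CUDA graph capture); the constant and method names are hypothetical and are not taken from hyperstack.rb. Whatever headroom remains still has to hold the KV cache and activations, which is why a 262144-token context makes the budget so tight.

```ruby
# Rough GPU memory budget, using only the approximate figures from the
# commit message. All names here are hypothetical, for illustration only.
GPU_TOTAL_GB       = 80.0 # A100 80GB
WEIGHTS_GB         = 60.0 # "~60 GB nemotron weights"
CUDA_GRAPH_MIN_GB  = 3.0  # lower end of "~3-4 GB" for CUDA graph capture
CUDA_GRAPH_MAX_GB  = 4.0  # upper end of "~3-4 GB"

# Memory left for KV cache, activations, and everything else.
def headroom_gb(total_gb, weights_gb, graph_gb)
  total_gb - weights_gb - graph_gb
end

# With --enforce-eager, CUDA graph capture is skipped, so it costs nothing.
eager_headroom = headroom_gb(GPU_TOTAL_GB, WEIGHTS_GB, 0.0)

# Without it, the graph capture overhead comes out of the same budget.
graph_best  = headroom_gb(GPU_TOTAL_GB, WEIGHTS_GB, CUDA_GRAPH_MIN_GB)
graph_worst = headroom_gb(GPU_TOTAL_GB, WEIGHTS_GB, CUDA_GRAPH_MAX_GB)

puts "headroom with --enforce-eager: #{eager_headroom} GB"
puts "headroom with CUDA graphs:     #{graph_worst}-#{graph_best} GB"
```

Under these assumed numbers, eager mode recovers the few gigabytes that graph capture would otherwise take; capping the context length attacks the other side of the budget by shrinking the KV cache that must fit in the remaining headroom.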
Diffstat (limited to 'hyperstack.rb')
0 files changed, 0 insertions, 0 deletions
