hypr - My "local" LLM setup with Hyperstack.


author	Paul Buetow <paul@buetow.org>	2026-03-21 22:39:00 +0200
committer	Paul Buetow <paul@buetow.org>	2026-03-21 22:39:00 +0200
commit	f5c2125d1c1cbf3adde917747aba61cbc3a0f228 (patch)
tree	99e6279a3c9825114d90993adccf7eb177d09a9d /hyperstack.rb
parent	0b4bbe047af8222ba0cc5d8100e2ef60ee8093bd (diff)

Fix nemotron-3-super vLLM OOM: cap context and add --enforce-eager

The [vllm] defaults had max_model_len=262144 without --enforce-eager, causing the vLLM container to OOM on startup (CUDA graph capture costs ~3-4 GB on top of ~60 GB nemotron weights on the A100 80GB).

Also switch flavor to n3-H100x1 since n3-A100x1 is out of stock in CANADA-1.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
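
As a rough illustration of the fix described above, the sketch below builds the two relevant vLLM command-line flags from a settings hash. The helper name `vllm_extra_args`, the hash keys, and the example cap of 32768 are assumptions made for illustration; the commit message does not state the new max_model_len value, and this is not the code in hyperstack.rb. Only `--max-model-len` and `--enforce-eager` are actual vLLM flags.

```ruby
# Hypothetical helper (not from hyperstack.rb): map [vllm]-style settings
# to extra vLLM CLI arguments.
def vllm_extra_args(settings)
  args = []
  # Capping the context length shrinks the KV cache vLLM has to plan for.
  if settings[:max_model_len]
    args += ["--max-model-len", settings[:max_model_len].to_s]
  end
  # --enforce-eager skips CUDA graph capture, avoiding its extra GPU
  # memory cost at startup in exchange for some throughput.
  args << "--enforce-eager" if settings[:enforce_eager]
  args
end

# Example values are assumed; 32768 is a placeholder cap, not the commit's value.
capped = vllm_extra_args(max_model_len: 32_768, enforce_eager: true)
puts capped.join(" ")
```

With the old defaults as described in the message (max_model_len=262144 and no eager flag), the same helper would emit only `--max-model-len 262144`.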

Diffstat (limited to 'hyperstack.rb')

0 files changed, 0 insertions, 0 deletions

