Hacker News
by briansun 310 days ago
Haha, a cute pet dragon. Two knobs that helped me tame VRAM: KV-cache quantization/eviction and sliding-window attention (if your runtime supports them). Which model, runtime, and context length are you running when it tips over? Are you using Ollama?
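
For a rough feel of how much those two knobs buy you, here is a back-of-the-envelope sketch in Python. The model shape (32 layers, 8 KV heads, head dim 128), the 32k context, and the 4096-token window are all made-up illustration values, not anything from your setup, and the 8-bit figure ignores the small per-block scale overhead that real quantized caches carry.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len,
                   bytes_per_elem=2.0, window=None):
    """Rough KV-cache size in bytes for a single sequence."""
    # With sliding-window attention only the most recent `window` tokens stay cached.
    cached_tokens = context_len if window is None else min(context_len, window)
    # Factor 2: each layer stores one key tensor and one value tensor.
    return int(2 * n_layers * n_kv_heads * head_dim * cached_tokens * bytes_per_elem)


GIB = 1024 ** 3

# Hypothetical model shape and context length, chosen only for illustration.
shape = dict(n_layers=32, n_kv_heads=8, head_dim=128, context_len=32768)

fp16 = kv_cache_bytes(**shape)                     # 16-bit cache, full context
q8 = kv_cache_bytes(**shape, bytes_per_elem=1.0)   # ~8-bit cache, scale overhead ignored
swa = kv_cache_bytes(**shape, window=4096)         # 16-bit cache, 4096-token window

for label, size in [("fp16", fp16), ("q8", q8), ("fp16 + window", swa)]:
    print(f"{label:>14}: {size / GIB:.2f} GiB")
```

With those assumed numbers the cache goes from 4 GiB at fp16 to about 2 GiB at 8-bit, and to 0.5 GiB with the window. Weights and activation buffers come on top of that, so treat it as a lower bound on what the runtime actually allocates.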