
Hacker News


	by doppelgunner 311 days ago
	Tried running a local LLM and it felt like adopting a pet dragon. Fun at first, but then it keeps eating all my GPU and still refuses to clean up its own context window.

1 comment

briansun 310 days ago

Haha, a cute pet dragon. Two knobs that helped me tame VRAM: KV‑cache quantization/eviction and sliding‑window attention (if your runtime supports them). What model, runtime, and context length are you running when it tips over? Are you using Ollama?
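To see why those two knobs matter, here is a rough back-of-the-envelope sketch of KV-cache size. It assumes a hypothetical Llama-3-8B-like shape (32 layers, 8 KV heads, head dim 128) and a 32k context. It also assumes every layer uses the sliding window and ignores the small per-block scale overhead of real quantized cache formats. `kv_cache_bytes` and `cached_tokens` are made-up helper names, not any runtime's API.

```python
GIB = 1024 ** 3

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, cached_tokens, bytes_per_elem):
    """Approximate KV-cache size: one K and one V vector per cached token, per KV head, per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * cached_tokens * bytes_per_elem

def cached_tokens(context_len, window=None):
    """With sliding-window attention (assumed on every layer), only the last `window` tokens stay cached."""
    return context_len if window is None else min(context_len, window)

# Hypothetical model shape (an assumption, not the poster's actual model).
N_LAYERS, N_KV_HEADS, HEAD_DIM = 32, 8, 128
CONTEXT = 32_768

# name -> (bytes per cached element, sliding window or None)
scenarios = {
    "fp16": (2.0, None),
    "8-bit KV": (1.0, None),
    "4-bit KV": (0.5, None),
    "fp16 + 4k window": (2.0, 4096),
}

if __name__ == "__main__":
    for name, (bpe, window) in scenarios.items():
        tokens = cached_tokens(CONTEXT, window)
        size = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, tokens, bpe)
        print(f"{name:>18}: {size / GIB:.2f} GiB")
```

Under those assumptions the fp16 cache comes out around 4 GiB at 32k context. It halves with an 8-bit cache and halves again with 4-bit, while a 4k window caps it near 0.5 GiB no matter how long the context grows.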
