by briansun
310 days ago
|
|
Haha, a cute pet dragon. Two knobs that helped me tame VRAM: KV‑cache quantization/eviction and sliding‑window attention (if your runtime supports them). What model, runtime, and context length are you running when it tips over? Are you using Ollama? |
|
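For a rough sense of why those two knobs matter, here is a back-of-the-envelope sketch of KV-cache size. The function name and the model shape in the example (32 layers, 8 KV heads, head dim 128) are made up for illustration, not taken from any particular model or runtime. It also ignores the small per-block scale overhead that real quantized cache formats carry, and assumes every layer uses the sliding window.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len,
                   bytes_per_elem=2.0, window=None):
    """Approximate KV-cache size in bytes for a single sequence.

    Hypothetical helper for estimation only. bytes_per_elem is 2.0 for
    fp16 and roughly 1.0 for an 8-bit cache (scale overhead ignored).
    window, if set, caps how many tokens are kept (sliding-window attention).
    """
    tokens = context_len if window is None else min(context_len, window)
    # Factor of 2: one K tensor and one V tensor per layer.
    return int(2 * n_layers * n_kv_heads * head_dim * tokens * bytes_per_elem)


GIB = 1024 ** 3

# Hypothetical model shape, 32k context.
shape = dict(n_layers=32, n_kv_heads=8, head_dim=128, context_len=32768)

fp16 = kv_cache_bytes(**shape)                      # 4.0 GiB
int8 = kv_cache_bytes(**shape, bytes_per_elem=1.0)  # 2.0 GiB
swa = kv_cache_bytes(**shape, window=4096)          # 0.5 GiB

for name, size in [("fp16", fp16), ("8-bit", int8), ("fp16 + 4k window", swa)]:
    print(f"{name}: {size / GIB:.1f} GiB")
```

Under those assumptions the cache grows linearly with context, so halving the element size halves it, and a window caps it no matter how long the prompt gets.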