| HN Mirror



	by hskalin 426 days ago
	With ollama you could offload a few layers to the CPU if they don't fit in VRAM. This will cost some performance, of course, but it's much better than the alternative (everything on the CPU).

2 comments

senko 425 days ago

I'm doing that with a 12GB card; ollama supports it out of the box.

For some reason, it only uses around 7GB of VRAM, probably because of how the layers are scheduled. Maybe I could tweak something there, but I didn't bother just for testing.

Obviously, perf depends on CPU, GPU and RAM, but on my machine (3060 + i5-13500) it's around 2 t/s.
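
For anyone who does want to tweak the split, here is a rough sketch of how it might look through the ollama HTTP API. As far as I know, the `num_gpu` option controls how many layers are placed on the GPU. The model name, layer count, and sizes are placeholders, and `layers_that_fit` is a hypothetical helper that assumes every layer is the same size, which is only an approximation (it also ignores the KV cache and other overhead).

```python
import json
import urllib.request

# Default address of a locally running ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def layers_that_fit(total_layers: int, model_gb: float, vram_budget_gb: float) -> int:
    """Hypothetical helper: estimate how many layers fit in a VRAM budget.

    Assumes all layers are equally sized, so treat the result as a starting
    point to adjust by hand, not an exact answer.
    """
    estimate = int(total_layers * vram_budget_gb / model_gb)
    return max(0, min(total_layers, estimate))


def build_request(model: str, prompt: str, gpu_layers: int) -> dict:
    """Build a generate request asking for `gpu_layers` layers on the GPU."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_gpu": gpu_layers},
    }


def generate(payload: dict) -> str:
    """Send the request to the local server and return the generated text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    # Placeholder numbers: a 64-layer, 16 GB model with 8 GB of VRAM to spend.
    n = layers_that_fit(total_layers=64, model_gb=16.0, vram_budget_gb=8.0)
    payload = build_request("my-model", "Hello", gpu_layers=n)
    print(json.dumps(payload, indent=2))
    # print(generate(payload))  # uncomment with an ollama server running
```

Leaving `num_gpu` unset lets ollama pick the split itself, which is what I have been doing so far.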


dockerd 425 days ago

Does it work on LM Studio? Loading 27b-it-qat takes up more than 22GB on a 24GB Mac.
