Hacker News new | ask | show | jobs
by senko 426 days ago
I'm doing that with a 12GB card, ollama supports it out of the box.

For some reason, it only uses around 7GB of VRAM, probably due to how the layers are scheduled, maybe I could tweak something there, but didn't bother just for testing.

Obviously, perf depends on CPU, GPU and RAM, but on my machine (3060 + i5-13500) it's around 2 t/s.