Hacker News
by adamjc 926 days ago
Isn't that going to be extremely slow? I can only realistically run 7B 5-bit models on my RTX 3060; anything larger gets offloaded to the CPU, and my responses go from almost instantaneous to 3+ minutes.
2 comments

It seems like it's running at speeds comparable to GPT-4 prior to Turbo. I could be wrong, but what I'm trying to say is, it ain't bad at all.
This is where the Mac world shines.
Would a 32GB M2 Max be able to run a 34B model?
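
For a rough sense of whether a model fits, a back-of-the-envelope estimate of weight memory is parameter count times bits per weight, divided by 8. A minimal sketch, assuming weights only: the function name is made up for illustration, and the estimate ignores the KV cache, activations, and runtime overhead, so real usage will be higher.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in decimal GB.

    Assumption: every weight is stored at exactly `bits_per_weight` bits.
    Real quantization formats add some per-block overhead on top of this.
    """
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Configurations mentioned in the thread, plus 16-bit for comparison.
for label, params, bits in [
    ("7B at 5-bit", 7, 5),
    ("34B at 4-bit", 34, 4),
    ("34B at 5-bit", 34, 5),
    ("34B at 16-bit", 34, 16),
]:
    print(f"{label}: ~{weight_memory_gb(params, bits):.1f} GB for weights")
```

By this estimate a 34B model at 4 or 5 bits needs roughly 17 to 21 GB for weights, which is under 32 GB on paper. Whether it runs comfortably depends on context length and on how much memory the system leaves available for the model.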