Hacker News
by
adamjc
926 days ago
Isn't that going to be extremely slow? I can only realistically run 7B 5-bit models on my RTX 3060; anything larger gets offloaded to the CPU, and my responses go from almost instantaneous to 3+ minutes.
2 comments
_boffin_
926 days ago
It seems to run at speeds comparable to GPT-4 before Turbo. I could be wrong, but what I'm trying to say is that it ain't bad at all.
bdavbdav
926 days ago
This is where the Mac world shines.
lamtung
923 days ago
Would a 32GB M2 Max be able to run a 34B model?
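A rough back-of-the-envelope way to think about this question: the weights alone take roughly (parameter count × bits per weight ÷ 8) bytes. The sketch below only does that arithmetic. The function names and the `headroom_gb` default are made up for illustration, and it ignores the KV cache, runtime overhead, and whatever share of unified memory the OS keeps for itself, so treat a "fits" result as optimistic rather than a guarantee.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (1 GB = 1e9 bytes)."""
    # billions of parameters * bits per parameter / 8 bits per byte = GB
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         memory_gb: float, headroom_gb: float = 2.0) -> bool:
    """Crude check: do the weights plus some headroom fit in memory_gb?

    headroom_gb is an assumed placeholder for context/KV cache and runtime
    overhead; the real figure depends on context length and the runtime.
    """
    return weight_gb(params_billion, bits_per_weight) + headroom_gb <= memory_gb


if __name__ == "__main__":
    # The 7B 5-bit case from the top comment, then 34B at a few bit widths.
    for params, bits in [(7, 5), (34, 4), (34, 5), (34, 16)]:
        size = weight_gb(params, bits)
        print(f"{params}B @ {bits}-bit: ~{size:.2f} GB of weights, "
              f"fits in 32 GB: {fits(params, bits, 32)}")
```

By this estimate a 34B model quantized to 4 or 5 bits needs about 17 to 21 GB for weights, which is under 32 GB on paper, while the same model at 16 bits (about 68 GB) is not. How much of the 32 GB the GPU can actually use, and how fast it runs, is something the arithmetic can't tell you.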
link