

	by adamjc 926 days ago
	Isn't that going to be extremely slow? I can only realistically run 7B models at 5-bit quantization on my RTX 3060; anything larger gets partly offloaded to the CPU, and my response times go from almost instantaneous to 3+ minutes.
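
A rough way to see where the cutoff falls: the quantized weights alone take about parameters × bits per weight / 8 bytes. This is a minimal sketch under that assumption. It ignores the KV cache, runtime overhead, and the per-block scales real quantization formats store (which push the effective bits per weight a little above 5). `weight_gib` is a hypothetical helper, not part of any inference tool. Compare its output against your card's VRAM and leave headroom for context.

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights alone, in GiB.

    Assumption: no KV cache, activations, or quantization scales are counted,
    so actual memory use will be somewhat higher than this.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


# Weight footprint of a few common model sizes at 5 bits per weight.
for size in (7, 13, 34):
    print(f"{size}B @ 5-bit: ~{weight_gib(size, 5):.1f} GiB of weights")
# prints ~4.1, ~7.6 and ~19.8 GiB respectively
```

Once the weights plus context no longer fit in VRAM, the remaining layers run from system RAM, which is where the big slowdown comes from.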


It seems to run at speeds comparable to GPT-4 before Turbo. I could be wrong, but what I'm trying to say is, it ain't bad at all.

This is where the Mac world shines.

Would a 32 GB M2 Max be able to run a 34B model?
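
One way to frame the question is to flip the usual estimate around: given a memory budget, how many bits per weight can a 34B model afford? This is a minimal sketch assuming the weights dominate memory use (no KV cache or overhead counted). It also assumes the GPU can use only part of the unified memory. The 0.75 fraction below is a placeholder, not an Apple-documented figure, and `max_bits_per_weight` is a hypothetical helper.

```python
def max_bits_per_weight(budget_gib: float, params_billion: float) -> float:
    """Largest average bits/weight whose weights alone fit in budget_gib."""
    return budget_gib * 2**30 * 8 / (params_billion * 1e9)


TOTAL_GIB = 32
# Assumption: share of unified memory usable by the GPU (placeholder value).
GPU_FRACTION = 0.75
budget = TOTAL_GIB * GPU_FRACTION  # 24 GiB under this assumption

print(f"34B in {budget:.0f} GiB: up to ~{max_bits_per_weight(budget, 34):.1f} bits/weight")
# prints: 34B in 24 GiB: up to ~6.1 bits/weight
```

Under these assumptions, a 4-bit or 5-bit quantization of a 34B model leaves some room for context, while an 8-bit version would not fit.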