Hacker News
by steve_adams_86 649 days ago
I know it's a fraction of the size, but my 32GB Studio gets wrecked by these types of tasks. My experience is that they're awesome computers in general, but not as good for AI as people expect.

Running Llama 3.1 70B is brutal on this thing. Responses take minutes. From what I've read, someone running the same model on 32GB of GPU memory seems to get far better results.

1 comment

You are probably swapping. On an M3 Max with similar memory bandwidth, the output is around 4 tokens/s, which is roughly on par with most people's reading speed. Try different quants.
I'm on an M2 Max so I shouldn't be too far behind. To be honest, I'm not actually sure how the model I'm using was quantized. I'll give it a try.
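
For anyone wanting to sanity-check the swapping theory, here is a rough back-of-the-envelope sketch in Python. It only counts the weights (no KV cache or runtime overhead), and several numbers are assumptions rather than anything measured in this thread: 70e9 parameters, 400 GB/s of memory bandwidth, and about 70% of unified memory being usable for the model. The function names are made up for illustration.

```python
def weight_footprint_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def fits_in_memory(footprint_gb: float, ram_gb: float,
                   usable_fraction: float = 0.7) -> bool:
    """True if the weights fit in the share of RAM assumed to be usable.

    usable_fraction is a guess; the real limit depends on the OS and on
    whatever else is running.
    """
    return footprint_gb <= ram_gb * usable_fraction


def max_tokens_per_second(footprint_gb: float, bandwidth_gb_s: float) -> float:
    """Upper bound on decode speed if every token reads all weights once."""
    return bandwidth_gb_s / footprint_gb


if __name__ == "__main__":
    N_PARAMS = 70e9        # assumed parameter count for a "70B" model
    RAM_GB = 32            # the machine discussed above
    BANDWIDTH_GB_S = 400   # assumed memory bandwidth, not measured

    for bits in (16, 8, 4, 3, 2):
        size = weight_footprint_gb(N_PARAMS, bits)
        fits = fits_in_memory(size, RAM_GB)
        ceiling = max_tokens_per_second(size, BANDWIDTH_GB_S)
        print(f"{bits:>2}-bit: {size:6.1f} GB  fits={fits}  "
              f"ceiling={ceiling:5.1f} tokens/s")
```

Under these assumptions, a 4-bit quant of a 70B model needs about 35 GB for the weights alone, which is already more than 32GB of total memory, so swapping would be expected. The tokens/s figure is only a bandwidth ceiling for a model that fits entirely in memory; real throughput will be lower.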