Hacker News
by K0balt 900 days ago
If you want to run locally, I’d get an M2 with 64GB of RAM. That will let you run 30B models and Mixtral 8x7B. You need around 50GB to run those at 5/6-bit quant.
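
For a rough sense of where the memory goes, here is a back-of-the-envelope sketch of the weight size alone. The parameter count (about 46.7B total for Mixtral 8x7B) and the effective bits per weight are my assumptions, not measurements, and the helper name is made up for illustration. It counts weights only, so KV cache, context, and runtime overhead come on top, which is presumably why you want headroom beyond these numbers.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return params_billions * bits_per_weight / 8


# Assumed figures, not taken from any benchmark:
MIXTRAL_8X7B_PARAMS_B = 46.7  # total parameters, in billions

# Rough effective bits per weight for 5-bit and 6-bit quants (assumption).
for bits in (5.5, 6.5):
    size = weight_memory_gb(MIXTRAL_8X7B_PARAMS_B, bits)
    print(f"{bits} bits/weight: ~{size:.1f} GB of weights")
# prints:
# 5.5 bits/weight: ~32.1 GB of weights
# 6.5 bits/weight: ~37.9 GB of weights
```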

I’m getting about 20 tokens/second on my 64GB M2 MBP with a Mixtral Q5_K_M GGUF in llama.cpp via text-generation-webui, with 35 (I think) layers offloaded to Metal for acceleration.

I’m really pleased with the performance compared to my dual 3090 desktop rig; the MBP is actually faster.