by tarruda 27 days ago
> 300B models at least fit in a single maxed out Mac Studio or a small stack of DGX Sparks or AMD Strix Halo boxes.

I run a 2.54 BPW GGUF quant of the 397B Qwen 3.5 on a 128 GB Mac Studio at 20 tokens/second generation and 200 tokens/second prompt processing. I'm not suggesting it matches the performance of the full BF16 model, but I did run some benchmarks locally and the results were pretty good:

- MMLU: 87.96%

- GPQA diamond: 86.36%

- IFEval: 91.13%

- GSM8K: 92.57%

So I think we have been at the "frontier capabilities at home" stage for a few months now.
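
For anyone wondering how a 397B model fits in 128 GB at all, here is a rough back-of-the-envelope sketch of the weight footprint at 2.54 bits per weight versus BF16. It assumes exactly 397e9 parameters and counts weights only; KV cache, runtime buffers, and OS overhead are not included, and the real GGUF file size may differ. The helper name `weight_bytes` is just for illustration.

```python
def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate bytes needed to store the weights alone."""
    return n_params * bits_per_weight / 8


# Assumption: "397B" taken as exactly 397e9 parameters.
N_PARAMS = 397e9

quant = weight_bytes(N_PARAMS, 2.54)  # the 2.54 BPW quant
bf16 = weight_bytes(N_PARAMS, 16)     # full BF16, 16 bits per weight

# Decimal gigabytes (1 GB = 1e9 bytes), weights only.
print(f"2.54 BPW: {quant / 1e9:.1f} GB")
print(f"BF16:     {bf16 / 1e9:.1f} GB")
print(f"Ratio:    {bf16 / quant:.1f}x smaller")
```

Under those assumptions the quantized weights come to roughly 126 GB against roughly 794 GB for BF16, which is why the quant is about the largest thing that can squeeze onto a 128 GB machine.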