by tarruda
27 days ago
> 300B models at least fit in a single maxed out Mac Studio or a small stack of DGX Sparks or AMD Strix Halo boxes.

I run a 2.54 BPW GGUF of the 397B Qwen 3.5 on a 128G Mac Studio at 20 tokens/second generation and 200 tokens/second prompt processing.

I'm not suggesting it matches the performance of the full BF16 model, but I did run some benchmarks locally and the results were pretty good:

- MMLU: 87.96%
- GPQA Diamond: 86.36%
- IFEval: 91.13%
- GSM8K: 92.57%

So I think we have been at the "frontier capabilities at home" point for a few months now.
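
As a rough sanity check on why a 2.54 BPW quant of a 397B model fits in 128G, the weight footprint is approximately parameters times bits per weight, divided by 8. A minimal sketch, assuming the weights dominate memory use (KV cache, activations, and runtime overhead are ignored) and using `quantized_weight_bytes` as a purely illustrative helper name:

```python
def quantized_weight_bytes(num_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in bytes.

    Assumption: every parameter is stored at the average bits-per-weight
    figure; metadata and any higher-precision tensors are already folded
    into that average.
    """
    return num_params * bits_per_weight / 8


if __name__ == "__main__":
    size = quantized_weight_bytes(397e9, 2.54)
    # About 126 GB (decimal), or about 117 GiB (binary).
    print(f"{size / 1e9:.1f} GB, {size / 2**30:.1f} GiB")
```

Whether "128G" means GB or GiB, this estimate leaves only a few GB of headroom for the KV cache and the OS, so context length would likely be the practical constraint.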