by busfahrer
43 days ago
I am considering an M5 Pro (18/20C) MacBook with 64GB of RAM, but I'm having a really hard time finding benchmarks of real-world models. Could somebody please provide some tokens-per-second numbers, for example for Qwen 3.6 35B/A3B, specifically for the Q4 and Q6 quants?
The local inference space is leaning toward MoE models, and a lot of them have decent tokens per second during generation, but horrible time to first token (TTFT).
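For anyone benchmarking this themselves, it helps to report the two numbers separately, since a single averaged tokens-per-second figure hides a slow prompt-processing phase. Below is a minimal sketch of how that split might be computed from per-token arrival timestamps. The function name `stream_metrics` and the timings in the usage lines are hypothetical, made up for illustration; they are not measurements from any real machine or model.

```python
from typing import Sequence, Tuple


def stream_metrics(request_time: float, token_times: Sequence[float]) -> Tuple[float, float]:
    """Split a streamed generation into TTFT and decode throughput.

    request_time: timestamp (seconds) when the prompt was submitted.
    token_times:  timestamp (seconds) at which each output token arrived.
    Returns (ttft_seconds, decode_tokens_per_second).
    """
    if not token_times:
        raise ValueError("no tokens were generated")

    # TTFT covers prompt processing (prefill) plus the first decode step.
    ttft = token_times[0] - request_time

    # Decode throughput only counts tokens produced after the first one,
    # so a slow prefill does not drag the number down.
    decode_span = token_times[-1] - token_times[0]
    n_decoded = len(token_times) - 1
    decode_tps = n_decoded / decode_span if decode_span > 0 else float("nan")
    return ttft, decode_tps


# Hypothetical timings (assumption, not a benchmark): the first token arrives
# 8 s after the request, then one token arrives every 20 ms.
times = [8.0 + i * 0.02 for i in range(101)]
ttft, tps = stream_metrics(0.0, times)
print(f"TTFT: {ttft:.2f} s, decode: {tps:.1f} tok/s")
```

With timings shaped like these, the decode rate looks healthy while the wait before the first token dominates the experience for short answers, which is the pattern described above.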