by busfahrer 43 days ago
I am considering an M5 Pro (18/20C) MacBook with 64GB of RAM, but I'm having a really hard time finding benchmarks of real-world models.

Could somebody please provide some tokens-per-second numbers, for example for Qwen 3.6 35B/A3B, specifically for the Q4 and Q6 quants?

2 comments

My advice: don't just look at tokens per second, but also at time to first token (TTFT).

The local inference space is leaning toward MoE models, and a lot of them have decent tokens per second but horrible TTFT.
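If you want to check both numbers yourself, here is a rough sketch of how I'd time a streamed response. It assumes nothing about the runtime: `measure_stream` is a made-up helper name, and it just wraps whatever iterator of tokens your client hands you, so wiring it to a real server is left to you.

```python
import time
from typing import Callable, Iterable, Tuple


def measure_stream(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, float, int]:
    """Return (ttft_seconds, decode_tokens_per_second, token_count).

    `tokens` is any iterable that yields tokens as they arrive.
    The request should be sent as close as possible to this call,
    because the timer starts here.
    """
    start = clock()
    first = None
    last = start
    count = 0
    for _ in tokens:
        now = clock()
        if first is None:
            first = now  # arrival time of the first token
        last = now
        count += 1
    if first is None:
        raise ValueError("stream produced no tokens")
    ttft = first - start
    decode_time = last - first
    # Decode rate covers only the tokens after the first one,
    # so a slow prompt-processing phase does not drag it down.
    if count > 1 and decode_time > 0:
        rate = (count - 1) / decode_time
    else:
        rate = 0.0
    return ttft, rate, count
```

Keeping the two numbers separate is the point: a single "total tokens divided by total time" figure hides a long wait before the first token.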

You can expect around 55-60 t/s with Qwen3.5:35b-a3b or gemma4:26b-a4b at Q4.
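To see why TTFT still matters at that speed, a back-of-the-envelope estimate helps. This is only a sketch: it assumes total wait is roughly TTFT plus output tokens divided by decode rate, and the TTFT and reply length in the example are placeholders I made up, not measurements.

```python
def response_time(ttft_s: float, output_tokens: int, tokens_per_s: float) -> float:
    """Rough end-to-end wait: time to first token plus decode time."""
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    return ttft_s + output_tokens / tokens_per_s


# Hypothetical case: 5 s TTFT, 600-token reply, 60 t/s decode.
# Decoding takes 10 s, so a third of the 15 s wait is spent
# before anything shows up on screen.
total = response_time(5.0, 600, 60.0)
```

With long prompts the TTFT term grows while the decode term stays the same, which is why the t/s number alone can look better than the experience feels.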