by Galanwe 40 days ago
My advice: don't just look at tokens per second, but also at time to first token (TTFT).

The local inference space is leaning toward MoE models, and a lot of them have decent tokens per second but horrible TTFT.
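For anyone who wants to check both numbers themselves, here is a rough sketch of timing a streamed response. It assumes your client hands you a lazy iterator that yields one item per generated token; `measure_stream` and its `clock` parameter are names made up for this example, not part of any particular library. If your client yields multi-token chunks, you would need to count tokens per chunk instead.

```python
import time
from typing import Callable, Iterable, Tuple


def measure_stream(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, float]:
    """Return (ttft_seconds, decode_tokens_per_second) for a token stream.

    Hypothetical helper: assumes `tokens` yields exactly one item per token
    and that generation only starts once iteration begins.
    """
    start = clock()
    first = None
    last = start
    count = 0
    for _ in tokens:
        last = clock()
        if first is None:
            first = last  # arrival time of the first token
        count += 1
    if first is None:
        raise ValueError("stream produced no tokens")

    ttft = first - start
    # Decode speed is measured after the first token, so a slow
    # prompt-processing phase does not get averaged into tokens per second.
    decode_time = last - first
    if count > 1 and decode_time > 0:
        tps = (count - 1) / decode_time
    else:
        tps = 0.0
    return ttft, tps
```

Keeping the two numbers separate is the point: a single "total tokens / total time" figure hides a long wait before the first token.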