| HN Mirror

	by Galanwe 40 days ago
	My advice: don't just look at tokens per second, look at time to first token (TTFT) as well. The local inference space is leaning toward MoE models, and a lot of them have decent tokens per second but horrible TTFT.
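
	A rough sketch of how the two numbers can be measured separately, assuming the backend exposes generated tokens as a Python iterable that yields them as they arrive. The names `measure_stream` and `fake_stream` are hypothetical, not part of any particular inference library, and `fake_stream` only simulates a slow prefill followed by fast decoding.

```python
import time
from typing import Callable, Dict, Iterable, Iterator


def measure_stream(
    stream: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[str, float]:
    """Consume a token stream and report TTFT and decode throughput separately.

    Hypothetical helper: `stream` is assumed to start generating lazily, so
    the timer should be started right before the first token is requested.
    """
    start = clock()
    first = None
    last = start
    count = 0
    for _ in stream:
        last = clock()
        if first is None:
            first = last  # arrival time of the first token
        count += 1
    if first is None:
        raise ValueError("stream produced no tokens")

    decode_time = last - first
    # Throughput over the tokens after the first one, so prefill is excluded.
    if count > 1 and decode_time > 0:
        decode_tps = (count - 1) / decode_time
    else:
        decode_tps = float("nan")
    return {
        "ttft_s": first - start,
        "tokens": float(count),
        "decode_tokens_per_s": decode_tps,
        "total_s": last - start,
    }


def fake_stream(prefill_s: float, per_token_s: float, n: int) -> Iterator[str]:
    """Simulated backend (an assumption for illustration): slow start, fast decode."""
    time.sleep(prefill_s)
    for i in range(n):
        if i:
            time.sleep(per_token_s)
        yield f"tok{i}"


if __name__ == "__main__":
    stats = measure_stream(fake_stream(prefill_s=0.5, per_token_s=0.01, n=50))
    print(stats)
```

	With the simulated numbers above, decode throughput looks fine while the user still waits about half a second before seeing anything, which is the gap a tokens-per-second figure alone hides.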