


	by ani17 113 days ago
	The blog post walks through why your first token is always the slowest, why output tokens cost 5x more than input tokens, and how techniques like speculative decoding and chunked prefill actually work, all from a systems engineer's perspective!