HN Mirror

	by halflings 122 days ago
	LLMs generate their output one token at a time. When you first learn this, it looks like a huge performance bottleneck, since we are used to highly parallelized systems. However, a large part of what makes LLMs feel so magical comes from this very bottleneck.
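
To make "one token at a time" concrete, here is a minimal sketch of a greedy decoding loop. The lookup table `TOY_SCORES` and the helpers `next_token_scores` and `generate` are hypothetical stand-ins invented for illustration. A real LLM scores the next token with a neural network conditioned on the whole prefix, and usually samples rather than always taking the top score.

```python
from typing import Dict, List

# Hypothetical toy "model": maps the last token to scores for the next token.
# This table is an assumption for illustration only; a real LLM computes
# these scores from the entire prefix.
TOY_SCORES: Dict[str, Dict[str, float]] = {
    "<bos>": {"the": 0.9, "a": 0.1},
    "the": {"cat": 0.7, "dog": 0.3},
    "cat": {"sat": 0.8, "<eos>": 0.2},
    "sat": {"<eos>": 1.0},
}


def next_token_scores(prefix: List[str]) -> Dict[str, float]:
    """Return next-token scores; unknown tokens simply end the sequence."""
    return TOY_SCORES.get(prefix[-1], {"<eos>": 1.0})


def generate(prompt: List[str], max_new_tokens: int = 10) -> List[str]:
    """Greedy autoregressive decoding: each step depends on the previous one."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        scores = next_token_scores(tokens)
        best = max(scores, key=scores.get)  # pick the highest-scoring token
        if best == "<eos>":
            break
        # The chosen token becomes input to the next step, which is why the
        # steps cannot be run in parallel.
        tokens.append(best)
    return tokens


print(generate(["<bos>"]))  # ['<bos>', 'the', 'cat', 'sat']
```

The sequential dependency is visible in the loop: step N cannot start until step N-1 has appended its token.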