	by yorwba 279 days ago
	In particular, part of the paper is about dynamically adjusting the number of tokens generated in parallel while maintaining roughly the same output quality as one-token-at-a-time decoding. The other part is about the KV caching strategy they use to speed up parallel decoding further.
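
	To make the first part concrete, here is a minimal sketch of one way the number of tokens committed per step could be adjusted dynamically. It assumes a confidence threshold rule, which is my illustration and not necessarily what the paper does. The names `select_positions`, `decode`, `score_fn` and the `threshold` value are all hypothetical, and the KV caching part is not covered.

```python
def select_positions(confidences, threshold=0.9):
    """Pick which still-undecided positions to commit in this step.

    confidences: dict mapping position -> probability of the model's top token.
    Assumed rule (illustrative only): commit every position at or above the
    threshold; if none qualifies, commit the single most confident position
    so that decoding always makes progress.
    """
    if not confidences:
        return []
    chosen = sorted(p for p, c in confidences.items() if c >= threshold)
    if not chosen:
        chosen = [max(confidences, key=confidences.get)]
    return chosen


def decode(num_positions, score_fn, threshold=0.9):
    """Run the toy loop and return the positions committed at each step.

    score_fn is a hypothetical stand-in for a model call: it takes the list
    of remaining positions and returns a dict of position -> confidence.
    """
    remaining = set(range(num_positions))
    steps = []
    while remaining:
        conf = score_fn(sorted(remaining))
        chosen = select_positions(conf, threshold)
        steps.append(chosen)
        remaining -= set(chosen)
    return steps


# Toy confidences standing in for real model output.
TOY_CONF = {0: 0.95, 1: 0.5, 2: 0.97, 3: 0.6}


def toy_score_fn(positions):
    return {p: TOY_CONF[p] for p in positions}
```

	With the toy scores, the first step commits two positions at once and the later steps fall back to one at a time, so the amount of parallelism follows the model's confidence instead of being fixed in advance.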