Hacker News


	by antirez 541 days ago
	Just to clarify: even storing the rank here would likely produce good results, but not as good as storing the probability, since the probability better exploits arithmetic coding's ability to represent fractional intervals. But the fundamental trick here is that the LLM can turn the "next in sequence" information into a distribution that is much easier to compress than the initial data itself.
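A minimal sketch of the interval narrowing behind this, assuming exact per-step next-token distributions (hand-written `Fraction` dicts standing in for an LLM's output; `narrow_interval` and `ideal_bits` are hypothetical helper names). Each coded token shrinks the interval by a factor equal to its probability, so the final width is the product of the probabilities, and naming a point inside it costs about -log2 of that width, fractional bits included. It leaves out the bit-level output and renormalization a real arithmetic coder needs.

```python
from fractions import Fraction
import math

def narrow_interval(steps):
    """Ideal (infinite-precision) arithmetic-coding interval narrowing.

    `steps` is a list of (distribution, symbol) pairs. Each distribution maps
    symbols to Fraction probabilities summing to 1, standing in for a model's
    next-token distribution at that position. Returns (low, width) of the
    final interval [low, low + width).
    """
    low, width = Fraction(0), Fraction(1)
    for dist, symbol in steps:
        # Cumulative probability of the symbols ordered before `symbol`;
        # encoder and decoder must agree on this order (sorted here).
        cum = Fraction(0)
        for s in sorted(dist):
            if s == symbol:
                break
            cum += dist[s]
        # Zoom into the sub-interval assigned to `symbol`.
        low += width * cum
        width *= dist[symbol]
    return low, width

def ideal_bits(width):
    """About -log2(width) bits are needed to name a point inside the interval."""
    # Computed from numerator/denominator so very small widths don't underflow.
    return math.log2(width.denominator) - math.log2(width.numerator)

# Hypothetical two-step message: a 50/50 choice, then a 90% likely token.
steps = [
    ({"a": Fraction(1, 2), "b": Fraction(1, 2)}, "b"),
    ({"x": Fraction(9, 10), "y": Fraction(1, 10)}, "x"),
]
low, width = narrow_interval(steps)
# low == 1/2, width == 9/20, ideal_bits(width) is about 1.152
```

The two tokens cost 1 bit plus about 0.152 bits together; a scheme that could only spend whole bits per token would pay more for the 90% likely one.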


gliptic 541 days ago

This is especially true when, for instance, two or more tokens are about equally likely, or one token is virtually certain: ranking would obscure both cases.
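To make the two cases concrete, here is a sketch comparing the ideal cost of coding a token with its model probability against coding only its rank. The fixed `RANK_MODEL` distribution over ranks is a made-up assumption (a rank-based scheme would have to choose or learn something like it), and all function names are hypothetical.

```python
import math

def prob_cost(dist, token):
    """Bits an ideal arithmetic coder spends on `token` under the model's distribution."""
    return -math.log2(dist[token])

def rank_of(dist, token):
    """Rank 0 is the most likely token; ties are broken by token text so a
    decoder running the same model reproduces the same order."""
    order = sorted(dist, key=lambda t: (-dist[t], t))
    return order.index(token)

def rank_cost(dist, token, rank_model):
    """Bits spent if only the rank is stored, coded with a fixed distribution over ranks."""
    return -math.log2(rank_model[rank_of(dist, token)])

# Hypothetical fixed distribution over ranks (sums to 1).
RANK_MODEL = {0: 0.6, 1: 0.2, 2: 0.1, 3: 0.1}

certain = {"the": 0.999, "a": 0.001}  # one token is virtually certain
tie = {"cat": 0.5, "dog": 0.5}        # two tokens are equally likely

# Near-certain token: about 0.0014 bits with the probability, about 0.74 with the rank.
print(prob_cost(certain, "the"), rank_cost(certain, "the", RANK_MODEL))
# Tie: exactly 1 bit either way with the probability; about 0.74 or 2.32 with the rank.
for tok in ("cat", "dog"):
    print(tok, prob_cost(tie, tok), rank_cost(tie, tok, RANK_MODEL))
```

With the tie, the rank scheme pays about 1.53 bits on average instead of 1, because the arbitrary tie-break makes one equally likely token look much less likely; with the near-certain token it keeps paying a large fraction of a bit for information the model already had.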


antirez 541 days ago

Indeed.
