| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by jeremysalwen 1199 days ago

Generative models (like LLMs) that assign probabilities to pieces of data are equivalent to compression algorithms.

To convert a generative model into a compression algorithm, you just use arithmetic coding: https://en.wikipedia.org/wiki/Arithmetic_coding.

To convert a compression algorithm into a generative model, you assign a probability to each piece of data according to the size of its compressed representation.

See also the Hutter Prize and associated FAQ: http://prize.hutter1.net/

If you wanted to specifically measure the "useful" information, you would need to have some way of sampling from the set of possible articles that contain the same "useful" information, but vary in the "useless" information, and vice versa. I think you would find that it would be difficult for you to define what the boundary is, but if you made some arbitrary choice, you could measure what you are looking for through the LLM probabilities.