by josephg 787 days ago

You can use LLMs as compressors, and I wonder how it would go with that.

The approach is simple: turn the file into a stream of tokens. For each token, ask a language model for its full set of next-token predictions given the preceding context, and sort them by likelihood. Then look at where the actual token appears in the sorted list. Low-entropy (predictable) tokens will be near the start of the list, and high-entropy (surprising) tokens near the end.
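
A rough sketch of the ranking step, in Python. To keep it self-contained I've swapped the language model for a toy character-level bigram counter, so this only illustrates the idea and is not an actual LLM; the names `train_bigram`, `token_ranks` and `mean_rank`, the training corpus, and the alphabet are all made up for the example.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Toy stand-in for a language model: count which character follows which."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        counts[prev][nxt] += 1
    return counts

def token_ranks(text, counts, alphabet):
    """Rank of each actual character (0 = most likely) among the model's
    predictions, using only the previous character as context."""
    ranks = []
    for prev, actual in zip(text, text[1:]):
        # Sort candidates by descending count; break ties by character
        # so the ordering is deterministic.
        ordered = sorted(alphabet, key=lambda c: (-counts[prev][c], c))
        ranks.append(ordered.index(actual))
    return ranks

def mean_rank(text, counts, alphabet):
    """Average rank over the text; higher means more surprising to the model."""
    ranks = token_ranks(text, counts, alphabet)
    return sum(ranks) / len(ranks)

# Hypothetical training text and alphabet, chosen only for illustration.
corpus = "the quick brown fox jumps over the lazy dog " * 3 + "abcdefghijklmnopqrstuvwxyz "
alphabet = "abcdefghijklmnopqrstuvwxyz0123456789 "
model = train_bigram(corpus)

# Ordinary text should score a lower mean rank than a random-looking string.
print(mean_rank("the lazy dog", model, alphabet))
print(mean_rank("x9q2zk7v0pj4", model, alphabet))
```

A real experiment would replace the bigram counter with an LLM's next-token distribution and tokenizer, but the ranking logic would presumably look much the same.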

I suspect most language models would deal with your alphabet example just fine, while still correctly spotting passwords and API keys. It would be a fun experiment to try!