Hacker News new | ask | show | jobs
by tshaddox 530 days ago
This is obviously relevant to the Hutter Prize, which is intended to incentivize AI research by awarding cash to people who can losslessly compress a large English text corpus:

https://en.wikipedia.org/wiki/Hutter_Prize

From a cursory web search it doesn't appear that LLMs have been useful for this particular challenge, presumably because the challenge imposes rather strict size, CPU, and memory constraints.

3 comments

This compressor is by a certain Fabrice Bellard, an overactive overachiving powerhouse of a programmer who happens to be leading the Large Text Compression Benchmark https://www.mattmahoney.net/dc/text.html, which is maintained by Matt Mahoney who happens to run the Hutter Prize :)

Fabrice also makes some programs you might use, like FFMEG and QEMU

https://bellard.org/

Fabrice hasn't worked on ffmpeg for a very long time now, the main developer is Michael Niedermayer. (Or was last time I checked.)
> FFMEG

For those unaware, it's a typo. willvarfar meant FFMPEG.

> Fabrice also makes some programs you might use, like FFMEG and QEMU

This would be the one sentence that wouldn't cause me to look down on somoene, if used as a third-person humble-brag.

it's simpler: the hutter prize imposes a 110M constraint on the sum of the sizes of your program (including any data it needs to run) and the compressed data

llms are generally large

This could be circumvented by _training_ the LLM on the fly on the previously observed file data. This is what Bellard's other NN compressor, nncp, does [1], which is currently #1 on Mahoney's benchmark [2]. Unfortunately this is too slow, especially running on the CPU as Hutter's challenge stipulates IIRC.

[1] https://bellard.org/nncp/

[2] http://mattmahoney.net/dc/text.html

In fact, pretty much every adaptive compression algorithm does. The eventual compression ratio would thus be determined by the algorithm (nncp, cmix, ...; also includes smaller tweaks like those typically made by the Hutter Prize winners) and its hyperparameters.
Yes, the only exception is dictionaries used in preprocessing, but I think that's mostly a tradeoff to reduce the runtime.
More because lossy compression is what's been analogized to intelligence and this prize is doing a sleight of hand to insert lossless compression as if that doesn't make a difference. That's more why LLMs aren't really all that useful.
I wouldn't call that a sleight of hand. Surely better lossy compression can be trivially used to implement better lossless compression, and the latter is just much easier to quantify for a benchmark.
Not a single lossless compression technique I’m aware of starts of in lossy compression.

They have different goals and utilize completely different techniques.

At most lossy techniques leverage lossless techniques (eg to compress non-perceptual binary headers) not the other way round.

Here's the submission that won the Hutter Prize in 2021: https://github.com/amargaritov/starlit It uses a LSTM to predict the next token lossily, then uses https://en.wikipedia.org/wiki/Arithmetic_coding to convert that to lossless compression. Lossless compression can definitely leverage a lossy compressor, such as via arithmetic coding. Also see: https://en.wikipedia.org/wiki/Context-adaptive_binary_arithm... which has a simple "Example" section - imagine if the top prediction made by your neural network was correct, you emit "0", if the 2nd was correct, you emit "10", if the 3rd, "110", if the 4th, "1110". As you can see, this is lossless, but the fundamental prediction is lossy, and the better that prediction is, the better the compression. (In actuality, you wouldn't waste your 1 bits like this, you'd use arithmetic coding instead).
Yes, one way to think about arithmetic compression is encoding the difference between the prediction and reality.

This isn't normally what people mean by lossy compression, though. In lossy compression (e.g. mainstream media compression like JPEG) you work out what the user doesn't value and throw it away.

That’s a stretch to call it lossy. To my eye the purpose of the LSTM seems indistinguishable from a traditional compression dictionary.

And that still doesn’t show how lossless compression is tied to intelligence. The example I always like to give is, “What’s more intelligent? Reciting the US war of independence Wikipedia page verbatim every time or being able to synthesize a useful summary in your own words and provide relevant contextual information such as it’s role in the French Revolution?”

These lossless compression algorithms don’t just recall a fixed string of text. Obviously that can be accomplished trivially by simply storing the text directly.

These lossless compression algorithms compress a large corpus of English text from an encyclopedia. The idea is that you can compress this text more if you know more about English grammar, the subject matter of the text, logic, etc.

I think you’re distracted by the lossless part. The only difference here between lossy and lossless compression is that the lossy algorithm also needs to generate the diff between its initial output and the real target text. Clearly a lossy algorithm with lower error needs to waste fewer bits representing that error.

This is standard lossless compression. None of the concepts particular to lossy compression (like rate-distortion theory) are used.