by dTal 906 days ago
>since decompression must be lossless, it's unclear to me if this approach could really help here.

All lossy compression can be made lossless by storing a residual. Modern transformer language models are well suited to compression because they output a probability for every candidate next token, which is straightforward to feed into an entropy encoder.[0]
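
To make the entropy-coding point concrete, here is a rough sketch of how predicted probabilities translate into compressed size. It is not a real compressor: an ideal entropy coder spends about -log2(p) bits on a symbol the model assigned probability p, so the sketch only adds up that cost. The adaptive character-count model is a toy stand-in for a transformer, and the names `ideal_code_length_bits` and `alphabet_size` are made up for illustration.

```python
import math
from collections import Counter


def ideal_code_length_bits(text: str, alphabet_size: int = 256) -> float:
    """Total bits an ideal entropy coder would spend on `text` when driven by
    a toy adaptive order-0 model (a stand-in for a real language model)."""
    counts = Counter()
    seen = 0
    total_bits = 0.0
    for ch in text:
        # Laplace-smoothed probability of this symbol given the counts so far.
        # A decoder can rebuild the same counts, so it sees the same p.
        p = (counts[ch] + 1) / (seen + alphabet_size)
        total_bits += -math.log2(p)  # cost of coding a symbol of probability p
        counts[ch] += 1
        seen += 1
    return total_bits


text = "abababababababab" * 8
model_bits = ideal_code_length_bits(text)
raw_bits = 8 * len(text)  # one byte per character, no modelling at all
print(f"raw: {raw_bits} bits, under the toy model: {model_bits:.1f} bits")
```

The better the model predicts the next symbol, the closer p gets to 1 and the fewer bits each symbol costs; swapping the counting model for a language model's next-token distribution changes the probabilities but not the accounting.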

On the other hand, LLMs expose the flaw in the Hutter prize - Wikipedia is just too damn small, and the ratio of "relevant knowledge" to "incidental syntactic noise" too poor, for the overhead of putting "all human knowledge" in the encoder to be worth it. A very simple language model can achieve very good compression already, predicting the next word almost as well as a human can in absolute terms. The difference between "phone autocomplete" and "plausible AGI" is far into the long tail of text prediction accuracy.

Probably a state-of-the-art Hutter prize winning strategy would simply be to train a language model on Wikipedia only, with a carefully optimized parameter count. The resulting model would be useless for anything except compressing Wikipedia though, due to overfitting.

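[0] A practical difficulty is that large language models often struggle with determinism, even disregarding the random sampling used for ChatGPT etc. Floating-point addition is not associative, so when large numbers of floating-point calculations are sharded across a GPU and performed in an arbitrary order, the output can differ slightly from run to run, and the decoder needs exactly the probabilities the encoder used. But this can easily be overcome at the expense of efficiency.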
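
A small illustration of the ordering problem, using ordinary Python floats (IEEE 754 doubles) rather than anything GPU-specific. The helper names `sum_in_order` and `order_independent_sum` are made up for this sketch; `math.fsum` is the standard-library exactly-rounded sum, used here as one example of trading speed for reproducibility.

```python
import math


def sum_in_order(values):
    """Plain left-to-right accumulation; the result depends on the order."""
    total = 0.0
    for v in values:
        total += v
    return total


def order_independent_sum(values):
    """Exactly rounded sum: slower, but the same for any ordering."""
    return math.fsum(values)


a, b, c = 1e16, -1e16, 1.0

left = (a + b) + c   # the big terms cancel first, so the 1.0 survives
right = a + (b + c)  # the 1.0 is rounded away inside (b + c)
print(left, right)   # the two groupings disagree

# The same values in two different orders, as a reduction split across
# workers might see them.
print(sum_in_order([a, b, c]), sum_in_order([b, c, a]))
print(order_independent_sum([a, b, c]), order_independent_sum([b, c, a]))
```

Fixing the reduction order would restore determinism just as well; either way the cost is giving up some freedom in how the work is parallelised.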

1 comment

> On the other hand, LLMs expose the flaw in the Hutter prize

I'm not sure why it would be a flaw; isn't this more a sign of how universally interesting the rules are?

Out of curiosity: I didn't dive deeply into the previous winners' implementations, but are there NNs (+ an error correction matrix, which is why I mentioned the FAQ in my other comment) among them?