by gliptic 530 days ago
This could be circumvented by _training_ the LLM on the fly on the previously observed file data. This is what Bellard's other NN compressor, nncp, does [1]; it is currently #1 on Mahoney's benchmark [2]. Unfortunately, this is too slow, especially when running on the CPU, as the Hutter Prize rules stipulate, IIRC.

[1] https://bellard.org/nncp/

[2] http://mattmahoney.net/dc/text.html
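To make "training on the fly" concrete, here is a toy sketch, not nncp's actual method: a byte-level order-k count model stands in for the neural network, and the function only reports the ideal arithmetic-coding cost instead of producing a bitstream. The name `adaptive_code_length_bits` and the `order` parameter are made up for illustration. The point is that the model starts empty and is updated only after each symbol is coded, so a decoder can replay the same updates and no trained weights need to be shipped.

```python
import math
from collections import defaultdict


def adaptive_code_length_bits(data: bytes, order: int = 2) -> float:
    """Ideal coding cost (in bits) of `data` under an adaptive order-`order` model.

    Hypothetical helper for illustration; a real compressor would feed these
    probabilities to an arithmetic coder instead of summing log-probabilities.
    """
    # counts[context][symbol]: the model starts empty, nothing is pretrained.
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    bits = 0.0
    for i, sym in enumerate(data):
        # Context = the previous `order` bytes (shorter at the start of the file).
        ctx = data[max(0, i - order):i]
        # Add-one smoothing over the 256 possible byte values.
        p = (counts[ctx][sym] + 1) / (totals[ctx] + 256)
        bits += -math.log2(p)
        # Learn from the symbol only after it has been coded, so the decoder,
        # which has just decoded the same symbol, can make the identical update.
        counts[ctx][sym] += 1
        totals[ctx] += 1
    return bits


if __name__ == "__main__":
    sample = b"abab" * 200
    for k in (0, 1, 2):
        print(k, round(adaptive_code_length_bits(sample, order=k)), "bits")
```

Changing `order` changes the resulting size on the same input, which is the sense in which the final ratio depends on the algorithm and its hyperparameters rather than on any shipped model.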


In fact, pretty much every adaptive compression algorithm does. The eventual compression ratio is thus determined by the algorithm (nncp, cmix, ...; this also includes smaller tweaks like those typically made by Hutter Prize winners) and its hyperparameters.
Yes, the only exception is dictionaries used in preprocessing, but I think that's mostly a tradeoff to reduce the runtime.
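For anyone unfamiliar with dictionary preprocessing, a rough sketch of the idea, assuming ASCII-only input and a dictionary of at most 128 words. This is a toy word-replacement transform, not the scheme any particular entry uses, and `build_dictionary`, `encode`, and `decode` are hypothetical names. Frequent words are swapped for single bytes in the unused 0x80-0xFF range before the real compressor runs, so the model sees a shorter input; the price is that the dictionary is fixed data the decoder must already have.

```python
import re
from collections import Counter

WORD = re.compile(r"[A-Za-z]+")


def build_dictionary(text: str, size: int = 128) -> list[str]:
    """Pick the `size` most frequent words of length >= 2 (illustrative heuristic)."""
    words = (w for w in WORD.findall(text) if len(w) >= 2)
    return [w for w, _ in Counter(words).most_common(size)]


def encode(text: str, dictionary: list[str]) -> bytes:
    """Replace dictionary words with single bytes in the range 0x80-0xFF."""
    if not text.isascii():
        raise ValueError("this sketch assumes ASCII input so 0x80-0xFF is free")
    if len(dictionary) > 128:
        raise ValueError("only 128 token bytes are available")
    index = {w: i for i, w in enumerate(dictionary)}

    def substitute(match: re.Match) -> str:
        word = match.group(0)
        return chr(0x80 + index[word]) if word in index else word

    return WORD.sub(substitute, text).encode("latin-1")


def decode(blob: bytes, dictionary: list[str]) -> str:
    """Invert `encode`: token bytes expand to words, ASCII bytes pass through."""
    return "".join(dictionary[b - 0x80] if b >= 0x80 else chr(b) for b in blob)


if __name__ == "__main__":
    text = "the cat and the dog and the bird"
    d = build_dictionary(text)
    blob = encode(text, d)
    print(len(text), "->", len(blob), "bytes; round trip ok:", decode(blob, d) == text)
```

The transform is lossless and cheap, so it mostly buys runtime: the slow adaptive model has fewer symbols to process.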