Hacker News
by _game_of_life 1687 days ago
I'm far from an expert in this subject, but doesn't this ranking of large text compression algorithms, with NNCP coming first, suggest that neural nets are pretty great at compression?

http://mattmahoney.net/dc/text.html

https://bellard.org/nncp/

I don't see examples of high-performing compression algorithms based on symbolic AI anywhere, but again, I am very ignorant. Do you have examples?

1 comment

The ranking criteria of this list make it very unrepresentative of compressors used in the real world. For example, the benchmark's score is the size of the compressed file plus the size of the decompressor program, which penalizes memorizing the evaluation text in the binary itself. But in the real world, you would have no concern at all that your compressor is “cheating” by working too well on your particular data: having useful priors that model real-world data for more compact representations is the whole point. Many of these algorithms are also impractical due to speed or memory use. Ask yourself: how many of the top-10 algorithms do you have installed right now, or even recognize? The winners aren’t dominating outside the arena of this list.
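
To make the scoring point concrete, here is a minimal sketch using Python's standard `zlib` with a preset dictionary (`zdict`) as a stand-in for a "prior". The helper names (`compress`, `decompress`, `benchmark_score`) and the sample text are made up for illustration, and this is not the benchmark's actual tooling. It only shows why counting the shipped program in the score makes memorization unprofitable, even though the same prior would be a pure win in everyday use.

```python
import zlib

def compress(data: bytes, prior: bytes = b"") -> bytes:
    # zdict seeds DEFLATE with a preset dictionary, i.e. a built-in "prior".
    c = zlib.compressobj(level=9, zdict=prior) if prior else zlib.compressobj(level=9)
    return c.compress(data) + c.flush()

def decompress(blob: bytes, prior: bytes = b"") -> bytes:
    # The same prior must be available on the decoding side.
    d = zlib.decompressobj(zdict=prior) if prior else zlib.decompressobj()
    return d.decompress(blob) + d.flush()

def benchmark_score(compressed: bytes, shipped_prior: bytes) -> int:
    # Hypothetical simplification: compressed size plus whatever data
    # has to ship inside the decoding program.
    return len(compressed) + len(shipped_prior)

# Illustrative sample text (an assumption, not benchmark data).
text = b"The quick brown fox jumps over the lazy dog near the quiet river bank."

plain = compress(text)
primed = compress(text, prior=text)  # extreme case: the prior memorizes the text

print("plain :", len(plain), "score", benchmark_score(plain, b""))
print("primed:", len(primed), "score", benchmark_score(primed, text))
```

The primed output is much smaller on its own, but once the memorized text is counted as part of the program, the total is worse than plain compression. Outside a benchmark, nobody charges you for the prior, which is the asymmetry described above.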

I’m also not an expert in symbolic AI. My comment above is more about neural vs. pre-neural NLP methods than about symbolic AI, which I admit drifts a bit from the parent. A compressor that replaces word tokens with dictionary indices is definitely symbolic, but it’s not especially “AI”.
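
For what "replacing word tokens with dictionary indices" could look like, here is a toy sketch. The function names and the sample sentence are hypothetical, and a real compressor would also have to store the vocabulary and pack the indices into bits, which this skips.

```python
def build_vocab(text):
    # Distinct whitespace-separated tokens, in first-seen order.
    return list(dict.fromkeys(text.split()))

def encode(text, vocab):
    # Replace each token with its position in the vocabulary.
    index = {word: i for i, word in enumerate(vocab)}
    return [index[word] for word in text.split()]

def decode(ids, vocab):
    # Exact round trip only if the input used single spaces between tokens.
    return " ".join(vocab[i] for i in ids)

sample = "the cat saw the dog and the dog saw the cat"
vocab = build_vocab(sample)
ids = encode(sample, vocab)

print(vocab)
print(ids)
```

Everything here is a lookup table and a substitution rule, so it is symbolic in the plain sense, but nothing is learned or inferred beyond counting distinct words.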