Hacker News new | ask | show | jobs
by peter-ebert 719 days ago
Thanks I'll work on that. One thing though > The linked example can encode _any sequence_, not just ones that have an equal number of ones and zeros.

The compression I made can also encode any sequence of bytes, see the testinput folder (caveats in the FAQ are because of the implementation specifics not the algorithm), it just also has to store the freq table which I'm working on compressing too. It has no requirement for equal number of 1s and 0s, that was just an example in the description that I guess did more bad than good.

1 comments

I mean without storing the freq table, which the last example you linked doesn't. Having to store a frequency table limits the encoder to do no better than the 0-th order entropy. Arithmetic coding can do arbitrarily complex prediction per symbol, see e.g. http://mattmahoney.net/dc/text.html#1072 which reaches the world record by training a neural network to predict the next symbol while it's encoding a file.

Not to say a 0-th order entropy coder could not be useful for certain applications if it was e.g. faster than, say, a range coder while maintaining similar ratios. But I doubt that is possible with this method.

I've added basic bit packing to the frequencies table, so long as the dictionary doesn't dominate like in a pangram it's slightly smaller than two arithmetic (static and adaptive) examples I could find online. I understand this isn't the best that arithmetic can do, next I'll look into that.

Never was trying to compare to NNs or PPM (or even simple LZ), since all this algorithm does is look at the frequency counts.

Agree it's not very practical and very expensive calculations, just fun to work on.