Hacker News new | ask | show | jobs
by thesz 720 days ago
This can be seen as encoding ranges that span each character. E.g., "asdoasd" needs to encode just 3 for 'o', remove 'o' from "asdoasd" and encode 0 and 2 for 'a', then 0 and 1 for 's' and then only "dd" remains - encode 'd', encode stoppage bit and you are done.

So I implemented that transformed idea in Haskell using rude approximation of some encoding of the integers and got, approximately, 5432177 bits for first 1e6 bytes of enwik9. This translates to 5.43 bits per byte, which is not that bad - order0 encoding gets about 5.56 bits per byte there. My approach can be improved, of course, and improved result can be even smaller.

So, in my not important opinion, it is a good approach. The only drawback I see is that I cannot fathom how to extend that encoder to higher order contexts.

PS

My variant adds counts of runs (count of character appearance), so it does add frequency table implicitly.