Hacker News new | ask | show | jobs
by AnotherGoodName 721 days ago
A better way to do this is plain old arithmetic encoding. It also effectively stores a real number of bits for each symbol which is where the savings here is coming from as you stated.

Ie. you'll get an even better result if you limit plain old arithetic coding to 1/70th probablility of each symbol (70 was stated as the non binary aligning example here) and just encode through using arithmetic encoding. Arithmetic encoding is blisteringly fast too.

https://en.wikipedia.org/wiki/Arithmetic_coding

I'd also say it's not really cheating Shannon limits. It's just that a binary channel likes to align on base two boundaries. Arithmetic coding is the way to store optimally without needing to align on the boundaries. With arithmetic coding at worst you waste 7 bits at the end of the file as you align to 8bits to store.