Hacker News
by AnotherGoodName 720 days ago
That's just regular data compression.

If I had a text that was always 'aaaa', 'bbbb' or 'cccc' with equal probability, I'd feed probability 1/3 for 'aaaa', 1/3 for 'bbbb' and 1/3 for 'cccc' into an encoder. In this case, since the probabilities are not powers of 1/2, I'd use an arithmetic encoder to compress this down to the optimal log2(3) ≈ 1.58 bits per symbol. So 'aaaa' would take about 1.58 bits to store, as would 'bbbb' and 'cccc', averaged over a long message (a single symbol on its own still rounds up to 2 whole bits).
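To make the 1.58 figure concrete, here's a rough Python sketch. It is not a real arithmetic coder: because the three symbols are equally likely, packing them as base-3 digits into one big integer reaches the same size an arithmetic coder would. The names `entropy_bits`, `pack` and `unpack` are made up for illustration.

```python
import math

# The three equally likely "symbols" from the example above.
SYMBOLS = ["aaaa", "bbbb", "cccc"]

def entropy_bits(probs):
    """Shannon entropy in bits per symbol for the given probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def pack(messages):
    """Pack a list of symbols into one integer, one base-3 digit each."""
    n = 0
    for m in messages:
        n = n * 3 + SYMBOLS.index(m)
    return n

def unpack(n, count):
    """Recover `count` symbols from the packed integer."""
    out = []
    for _ in range(count):
        n, digit = divmod(n, 3)
        out.append(SYMBOLS[digit])
    return out[::-1]

# Worst case for size: every symbol is the highest digit.
msgs = ["cccc"] * 1000
packed = pack(msgs)
bits_per_symbol = packed.bit_length() / len(msgs)

print(f"entropy: {entropy_bits([1/3, 1/3, 1/3]):.3f} bits/symbol")
print(f"packed:  {bits_per_symbol:.3f} bits/symbol")
```

The decoder needs the symbol count because leading 'aaaa' symbols are zero digits and vanish from the integer. A real arithmetic coder handles unequal probabilities too, which this shortcut does not.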