| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by gliptic 767 days ago
	I don't see why arithmetic coding wouldn't match this if the probabilities are suitably updated after every symbol. If you encode 8 bits, 4:4 ratio, after the first bit is encoded, either 1 or 0 has a remaining count of 3, etc. And when the count of either 0 or 1 reaches 0, the rest of the bits are known.

1 comments

peter-ebert 767 days ago

Here's a simpler example, 2 symbols 1:1 ratio, Shannon would say the entropy is 1 bit per symbol, so this needs 2 bits: 01 or 10 encode both permutations.

However I can also just store 1 or 0 to indicate what's stored in the first position, using only a single bit, and the next value is inferred.

link

gliptic 767 days ago

With arithmetic coding this requires 1 bit to store: the value of the first bit. The other is known with 100% probability. This is very simple to implement, especially with bit-wise arithmetic coding.

The problem is that the Shannon limit doesn't apply to your example. A fixed 1:1 ratio bit string is not I.I.D.

link

ajb 767 days ago

This. Optimal use of an arithmetic encoder is with P(symbol | all previous symbols), not just P(symbol). Obviously there are practical problems with storing all possible contingency tables, so for something like HEVC there is quite a lot of effort in identifying where this matters.

link

foobarqux 767 days ago

You are saying that conditional on some additional information (or structure) you can get better compression. But that is saying something tautological: conditional on the entropy being lower the entropy is lower. In your case you are assuming you already know the symbol frequencies and you don’t count that as information. However you are comparing against a model where that is not assumed.

link

AnotherGoodName 767 days ago

That's just regular data compression.

If i had a text that was 100% 'aaaa' or 'bbbb' or 'cccc' with equal probability I'd feed 1/3rd probability aaaa, 1/3rd probability bbbb, 1/3rd probability cccc into an encoder. In this case since the probabilities are not binary numbers i'd use an arithmetic encoder to optimally compress this down to ~1.58 bits per symbol. So 'aaaa' would take 1.58bits to store as would 'bbbb' and 'cccc'

link

quag 767 days ago

Wouldn’t two symbols with a 1:1 ratio have four possible bit patterns for two bits? (00,01,10,11) With the ratio only happening on average over a large number of bits?

Id there really could only be 01 or 10, then those are the two symbols in the alphabet, and you only need one bit to pick the next symbol (two bits of output).

link

minitech 767 days ago

Those two values aren’t independent at all.

link