|
|
|
|
|
by codex
4981 days ago
|
|
While I believe the author's approach is the correct one, I am very curious to see a proof that an arithmetic encoding will always fall within the 1MB limit. The pigeonhole principle guarantees an average of slightly less than seven bits per number, but it doesn't follow that a particular encoding will always consume nearly the same number of bits--especially for the worst case input. Prefix codes are useful, but not that efficient when the probability density function is uniform and there are a lot of potential numbers to encode. Golomb coding works well on an exponential distribution, not a flat one. Does an adaptive encoding save the day here? |
|
A quick and dirty Python program computing the logarithm in base 2 of N in Python gives me 8093729.48175.
So, one can encode a number in the range [0, N) in 8093730 bits, or 1011717 bytes (possibly a few more due to rounding, but that is far enough from 1MB=1048756 to not worry about that)
Arithmetic coding replaces that [0,N) integer by a floating point number in [0,1) by dividing by 2^(8MB), but that does not make a difference.
So, there is plenty of room to store one such number. That leaves the problem of replacing one such number for K eight-digit numbers by one for K+1 eight-digit numbers. I'll leave that as an exercise :-) because you did not ask for it and because I cannot think of an easy proof.