|
|
|
|
|
by recon517
4986 days ago
|
|
Store each number as a delta from a previous one, first number as a delta from 0. Each number starts with a code describing its length: 0 (duplicate), 7, 9 or 27 bits. For storing a code use values '1', '01', '001' and '0001'. Calculate statistics for different code types and when space is tight replace most common codes with shortest ones. |
|
If you try that encoding scheme with some random data you'll find it won't always fit.
I took 1,000,000 random numbers between 0 and 99,999,999 and sorted them and calculated the deltas. Here's the distribution of the sizes of the deltas:-
That's the total number of bits required to store just the deltas, but doesn't include the length encoding.810241024 - 6408685 = 1979923 spare bits.
Each of the 999,999 deltas will take at least one bit to determine the size of the delta, leaving you with 979,924 bits for extra length encoding and wastage (if you pack multiple lengths under one encoding).
There's no way I can see (after trying lots of permutations) to be able to encode all of those bunches of diffs using those few remaining bits.
What's even more difficult is that you don't have the space to 'calculate statistics for different code types' because you can't fit anywhere near all of the deltas into that 1MB of memory without having them encoded perfectly anyway. Calculating them on partial data is all that you can do, and that's not going to be accurate enough because you've no idea what the unseen data has to hold (it could all be duplicates or 11 bit differences...).
I'm keeping going on looking at this (and trying not to look at any of the posted solutions).