Hacker News new | ask | show | jobs
by quicktwo 1575 days ago
I was thinking that it's probably not quite sparse enough to benefit from RLE as-is, since the number of bits you'd need for your lengths would outstrip the length of your run. If any run can be more than 128 words, then you'd need at least 8 bits for the run, making it only beneficial for runs of longer than that.

An alternative would be to make 0 mean five zeros (or some other N) and then if you hit a 1, it means the next 5 bits are to be interpreted as-is. This reduces all 5 length 0s to 1 bit, while only adding 1 bit whenever there is a bit. At worst this introduces 1 extra bit per answer. The answer to non-answer ratio is about 5 to 1, so this should definitely save space while also having a trivial decoding algorithm.

1 comments

Probably not suitable for direct RLE as you say, but if you looked over the data I suspect you might find that a stepped RLE, where you interleave two or more RLEs could provide a savings. You do of course need a cache to decompress parts of the intervleaved RLE'd data into.