Hacker News new | ask | show | jobs
by bryondowd 1810 days ago
Interesting concept, but seems like it only works for fairly small arrays, right?

You could fit 40 base-3 values into a 64 bit integer, but in order to read/write index 39, you have to perform 39 multiplications/divisions. So no efficient random access. And is there any way to encode more than 40 base-3 values if your largest integer is 64 bits? If you need 45 elements, I guess you could use an array of 5 8-bit integers where each 8-bit element represents 5 base-3 elements. Of course, now you're limited to increasing the size of your array to increments of 5 (or more generally, smallest efficient integer size / log base 2 of the number of representable values)

1 comments

> Interesting concept, but seems like it only works for fairly small arrays, right?

Any size array will work.

> And is there any way to encode more than 40 base-3 values if your largest integer is 64 bits?

Yes, absolutely. And with ease. Bignums make it seamless. Some languages like Perl and Python and Mathematica support them out of the box.

You can still do random access, it just becomes: divide bignum by 3^n, take result mod 3. It will be costly though, like you mentioned. There is a way to bound the cost.

You can chunk your elements to a number of bits that is a multiple of 8 bits, a byte. This allows you to do random access with minimal overhead, while giving you the savings, bounded by your choice of chunk size.

For example, 100 of your elements can be neatly fit into 20 bytes.

101 elements would need more than 20 bytes or 160 bits, but 100 elements only needs 158.5 bits, we only lose 1.5 bits per chunk.

https://www.wolframalpha.com/input/?i=log2%283%29*100.+vs+lo...

Consider that we do this chunking. Let us say we have 10000 elements and want to look at the value of element 6000. We can immediately skip to byte (6000/100)*20=1200 and simply take the value there mod 3. This is possible to do because each chunk is no longer related to eachother. So we get random access with a small procedure that gives us a huge space savings.

If we choose a byte-sized chunk size, we can graph (log2(3)*n) % 8 to look for which chunk sizes would lose the least bits of storage.

https://www.wolframalpha.com/input/?i=discrete+plot+%28log2%...

We want to choose chunk sizes where the mod value is close to 8, since we have to round up.

Hope this helps.