Hacker News new | ask | show | jobs
by card_zero 491 days ago
Yet these aren't offsets from an address, that is, array indexes?
1 comments

But the idea is still the same? You can make "smart pointers" that are attached to arenas or whatever you want to call them. On those, the addressable size of the pointer is confined to how large the arena is.
Eh, you underestimate my naivety. To rephrase: are these just array indexes, then?

I mean I presume it's something cleverer than just "let's bit-pack our indexes to save space".

..."and make it dynamic" (arenas)

Oh wait, variable-size tiny pointers. Yes. Now it's getting satisfyingly complicated. Maybe like how UTF-8 works?

> To rephrase: are these just array indexes, then? I mean I presume it's something cleverer than just "let's bit-pack our indexes to save space".

The pointers can be dereferenced into an array index (when given the key), but they are smaller in size than a "bit-packed index", although I'm not quite sure what you mean by this. If you want to to represent an index into an array of size `n` there's no representation where all indexes take less than log(n). You can make some of them smaller, but then you make other bigger (like in UTF-8). The Tiny Pointers are all smaller than log(n) (and some of them would be exactly equal, despite representing different indexes!), but the only way of determining the actual index is to dereference it together with original key that was used to create it.

I mean, in a way, a pointer is just an array index into heap space? Such that, I think this is accurate? I'd drop the "just", though. They can be thought of that way, but the idea is how to keep the programmer from having to do that manually.

To rephrase, this is how to mechanically bit-pack indexes to save space. Doing it in a pointer means you are likely wanting to share these pointers with other parts of the code, so you also need to have them be self contained. Naive way would be for each pointer to be a tuple identifying arena and having its offset. But there is still a lot of effort that has to be done for that to work well. This paper looks at that effort and gives a pass at how worthwhile it is.

How is this novel?

This is like using a fixed sharding scheme plus delta-compression for indices relative to shard addresses plus varint encoding the deltas.

After reading more, it seems the novelty is in the two specific schemes that use these techniques to store data with high probability into tables of extremely high load factors with bounded pointer sizes, which are more complex than simple array indices due to having to resolve which fallback table is storing the key-value pair.

One scheme proves tiny pointer size bounds for fixed length tiny pointers. The other proves bounds for variable length pointers.