| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by ant6n 575 days ago

You kind of lost me towards the end, so I’m not sure whether you attempted it, but I was wondering whether it would be possible for the internal nodes to store only the upper 8/16 bits (that are nonzero for the current subtree). This implies that one 64 byte cache line stores 32 or 64 entries instead of 16 (or better: 31 or 62/63, since u may need some bookkeeping dat).

The next level would be to keep track of how many prefix bits are implied by the parents, so leaf nodes could perhaps also only use 8/12/16 bits, if the the higher bits are implied by the parent. or instead of bit masks, use offsets (i.e. leaves store k bits offset from parent value).

That may screw around with the branch predictor, and may work very good with evenly distributed data vs uneven data.

1 comments

curiouscoding 575 days ago

Ohh yes some good ideas there! I've thought about pretty much exactly this at some point but then didn't end up including it. But I do think it's quite promising, and most of the bookkeeping can be made to be branchless.

On the other hand, most of the latency is in the last few layers, and probably there isn't as much to be saved there.

The biggest problem might be that the bottom layer will anyway have to store full-width numbers, since we must be sure to have the low-order bits somewhere in case they weren't covered yet in earlier layers. Or we could have variable width encoding per node maybe (instead of per layer) but that does sound a bit iffy on the branch predictor.

In the end I guess I kinda like the simplicity and hence reliability of the 'just do the full tree and the first layers are cheap anyway' approach. Probably another factor 2 speedup is possible on specific datasets, but anything beyond this may not be reliably good on worst case inputs (like is the issue for the prefix partitioning methods).

ant6n 574 days ago

The lowest level could be encoded with whatever number of bits is required for the numeric distance between the leafs. With proper book keeping, the full 32 bit values don’t need to be stored. The value for each node should be something like (parent << shift + node), where node has ideally 8 Bits, or maybe 16 bits.

It kind of comes down to how to deal with bad distribution of values, like large differences between adjacent values. One could for example deal with this by deliberately leaving „gaps“ in the tree, like using a 59-ary node on places (make the last values in the array large, so they will never get visited). With dynamic programming, perhaps an optimal distribution of the leaf level could be had - although it requires more bookkeeping bits of one needs to have the index of the found value, not just whether the value is in the tree.

The question could also be whether it could be interesting to store delta valeues, so that 63 8-bit valeues gives a larger numeric range - this means computing some sort of cumulative sum on on the 63 values, not sure whether there is a simd instruction for that.

One more thought: if there is different ways different layers of the tree are stored, but they are always the same for each layer, the code be unrolled, with every layer having different code to deal with it.

Last thought: if the index is not important to know, just whether the value is in the set or not, one could use a bitmap as a data structure. It would require 256MB for the 32bit space, but large spans of zeros imply empty pages.