| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by binarybanana 1781 days ago

There are ways to make it work. The simplest would be to just not do any splaying most of the time. No splaying means no locking is necessary, just an atomic read to check if anything changed. Apparently it's actually good for performance[1] to splay only occasionally instead of on every access. Another would be to use atomics to do lockless updates. Probably best to only do partial splaying, or only occasionally like above, but it's doable[2].

If the data is actually stored in an array and the tree structure is actually just a map that stores the "nodes" as a tuple (left, element, right) in another array, where left, element and right are indexes into the backing array, then you could just have one such map per thread and only had to use synchronization if anything is added or deleted. Or even have purely thread local addition/deletion if that is sufficient and no locking at all is desirable. A multi-splay tree essentially.

Storing nodes as elements in an array is probably a good idea for performance in general, even in the single threaded case. It means you don't have to allocate each element individually and that things are stored in a contiguous block of memory. The elements are not going to be in the correct order since they fixed in place in the backing array, but they're still going to be much more local than if they're allocated individually. Allocating from an arena/bump allocator would have similar advantages though.

You could also store everything in a single array, replacing the node's element index with the data itself and then use atomics again. This is how I've seen it done in the wild, just without the atomics. Really depends on what you're doing, though. If the individual items are small because they mostly store pointers into some other heap allocated data or just a bunch of numbers, so that the whole map+element structure could fit in the cache, then using multiple maps probably gives better performance.

Can't find any references for this because I just came up with the idea of using multiple maps (not the array storage itself) into a single backing array. Multi-splay trees[3] look close in principle, but they use a reference tree instead of a flat array. It's probably not substantial enough of a difference to really call it a different data structure though.

Edit: Another idea would be a delayed queue splay tree, where some kind of reference to the accessed node gets added to a queue and once the queue reaches a certain size, or some other condition is met, the tree gets locked and all the splaying is done in a single batch. Some operations could even be optimized away if computing that is less costly than doing the operations itself Is. In that case it might actually be useful even in the single threaded case. Somewhere relates to the randomized splaying and probably has been done before though.

[1]: http://www14.in.tum.de/personen/albers/papers/ipl02.pdf

[2]: https://digitalscholarship.unlv.edu/cgi/viewcontent.cgi?arti...

[3]: https://www.cs.cmu.edu/~jonderry/multi-splay.pdf

1 comments

scottlamb 1780 days ago

Those techniques sound possible but the path hint approach seems so much simpler to apply than batching splay updates.

Particularly with epoch GC: When there's heavy reading and rare writes, I like using epoch GC so readers never block. Writes then need to create a whole new map. I suppose one could use something like the delayed queue splay tree you're describing, publishing a new map occasionally, and ensuring this doesn't race with other updates. But it'd be a pain to get right when you could just use path hints instead. Is there an overwhelming benefit I'm missing to justify this complexity and the expense of full map rebuilds?

link