Hacker News new | ask | show | jobs
by ninepoints 914 days ago
It does? Feels like an O(n) scan every time you need to query the children of a node is a nonstarter for most applications (the readme is strangely optimistic about this point). Plenty of cache friendly tree implementations out there, and this one seems actually cache hostile with few benefits in my mind aside from ease of implementation.

Also, I write a lot of code that runs on a gpu, and the claim that this tree is somehow gpu friendly I find particularly dubious.

4 comments

> Feels like an O(n) scan every time you need to query the children of a node is a nonstarter for most applications (the readme is strangely optimistic about this point).

The trick is that querying the children of N different nodes is usually still a single O(N) scan, so if you operate on it array-style (which APL heavily encourages anyway), it's amortized constant time. Of course that's not always viable, but APL programmers tend to be surprisingly good at making array operations out of things you wouldn't expect to use arrays for.

> cache hostile

If you additionally enforce that all parent indexes point to lower numbers, a preorder traversal is a linear scan forward, and a postorder traversal with child order reversed (which you can usually correct for one way or another) is a linear scan backward.

(This assumes you only need dependency ordering, ie the parent node uses or supplies data to/from its children; if you need a true sequential traversal, the array has to be sorted according to that traversal (but is still a valid Apter Tree).)

> the claim that this tree is somehow gpu friendly I find particularly dubious

Yeah, array programming is generally kind of hit-or-miss at that and this does look like a miss.

Whenever somebody says cache friendly without additional context I assume they got code from somebody else without understanding what makes things "cache-friendly".
What?

I mean, there are specific sizes that will fit in the L-(N) cache.

And consequently, sizes that don't suffer from the false-sharing problem in concurrent scenarios due to this.

https://en.cppreference.com/w/cpp/thread/hardware_destructiv...

There's an entire field of study devoted to cache-friendly/cache-oblivious data structures:

- https://www.microsoft.com/en-us/research/wp-content/uploads/...

- https://rcoh.me/posts/cache-oblivious-datastructures/

The linear scan you are talking about I don't think gives you any sort of ordered traversal right? Unless I'm missing something.
For a arbitrary Apter Tree, a linear scan is unordered. You can impose additional constraints to get a ordered traversal (in the same way that, eg, you can sort a assoc-list/JSON-style key-value table by keys to get a key-order traversal), and the result is still a valid Apter Tree (respectively valid list of key-value pairs).
Yes but that is not what is presented (a B+ tree is not a B tree even with minor modifications) and it changes the complexity of your other update operations drastically. The thing that grates me (as someone that has written a dozen or so different tree structures) is that this one is presented as a particularly good one, and I think it excels at almost nothing, hence its obscurity.
For me, N is small. Its also N-ary, not binary, which crosses off a bunch of the first options. Anyway, I am not sure this will work, just worth trying. Empirical numbers beat theory every time :)
You are using N in a different sense than I am. Unless I'm reading the tree description incorrectly, N is the size of the tree itself, not the number of children.
Oh, I was being sloppy and mixed N into the ariness: I meant N elements, each with a variable number of children (as many as 8).
I would hazard a guess that a regular n-ary tree would outperform the OP tree in many usage scenarios with no extra effort, and with a number of B+ tree variants being strictly better at the cost of more effort.
There are use cases where that doesn’t matter, such as some compilers where it makes a full pass over the source code for every optimisation type.
Vector programming requires you to change your way of thinking; instead of computing something for 1 element, you compute it for N elements.
I do lots of simd and shader programming, but regardless of register width, O(n) is not O(1)
The point is that you shouldn't try to get all the children of a single node, but rather all the children of all the nodes, which is still O(n) and not O(n^2).
This doesn't make any sense to me.
I don't think looking at asymptotic behavor makes a lot of sense in situations where n is small and bounded. Big O says nothing about such cases.
Sorry, do you not have trees for which the size of the tree is large. Do all your trees fit inside a few cache lines of storage?
I deal with very large tree structures (~100 GB) in my search engine, but even there the dominant factor for performance isn't big O, but reducing block reads, access patterns and keeping relevant data in the disk cache.

Big O isn't irrelevant, but it is not the full story either. There's a solid reason why hash tables are a thing in memory but aren't really a thing on disk.

Do you understand the data structure being proposed in the original post, and are you claiming that scanning 100GB of data every time you want to perform a childof operation is acceptable? Please, use the proposed tree for your application since big o isn't the full story to you lol