| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by ninepoints 914 days ago
	It does? Feels like an O(n) scan every time you need to query the children of a node is a nonstarter for most applications (the readme is strangely optimistic about this point). Plenty of cache friendly tree implementations out there, and this one seems actually cache hostile with few benefits in my mind aside from ease of implementation. Also, I write a lot of code that runs on a gpu, and the claim that this tree is somehow gpu friendly I find particularly dubious.

4 comments

a1369209993 914 days ago

> Feels like an O(n) scan every time you need to query the children of a node is a nonstarter for most applications (the readme is strangely optimistic about this point).

The trick is that querying the children of N different nodes is usually still a single O(N) scan, so if you operate on it array-style (which APL heavily encourages anyway), it's amortized constant time. Of course that's not always viable, but APL programmers tend to be surprisingly good at making array operations out of things you wouldn't expect to use arrays for.

> cache hostile

If you additionally enforce that all parent indexes point to lower numbers, a preorder traversal is a linear scan forward, and a postorder traversal with child order reversed (which you can usually correct for one way or another) is a linear scan backward.

(This assumes you only need dependency ordering, ie the parent node uses or supplies data to/from its children; if you need a true sequential traversal, the array has to be sorted according to that traversal (but is still a valid Apter Tree).)

> the claim that this tree is somehow gpu friendly I find particularly dubious

Yeah, array programming is generally kind of hit-or-miss at that and this does look like a miss.

toasted-subs 914 days ago

Whenever somebody says cache friendly without additional context I assume they got code from somebody else without understanding what makes things "cache-friendly".

gavinray 913 days ago

What?

I mean, there are specific sizes that will fit in the L-(N) cache.

And consequently, sizes that don't suffer from the false-sharing problem in concurrent scenarios due to this.

https://en.cppreference.com/w/cpp/thread/hardware_destructiv...

There's an entire field of study devoted to cache-friendly/cache-oblivious data structures:

- https://www.microsoft.com/en-us/research/wp-content/uploads/...

- https://rcoh.me/posts/cache-oblivious-datastructures/

ninepoints 914 days ago

The linear scan you are talking about I don't think gives you any sort of ordered traversal right? Unless I'm missing something.

a1369209993 914 days ago

For a arbitrary Apter Tree, a linear scan is unordered. You can impose additional constraints to get a ordered traversal (in the same way that, eg, you can sort a assoc-list/JSON-style key-value table by keys to get a key-order traversal), and the result is still a valid Apter Tree (respectively valid list of key-value pairs).

ninepoints 914 days ago

Yes but that is not what is presented (a B+ tree is not a B tree even with minor modifications) and it changes the complexity of your other update operations drastically. The thing that grates me (as someone that has written a dozen or so different tree structures) is that this one is presented as a particularly good one, and I think it excels at almost nothing, hence its obscurity.

spenczar5 914 days ago

For me, N is small. Its also N-ary, not binary, which crosses off a bunch of the first options. Anyway, I am not sure this will work, just worth trying. Empirical numbers beat theory every time :)

ninepoints 914 days ago

You are using N in a different sense than I am. Unless I'm reading the tree description incorrectly, N is the size of the tree itself, not the number of children.

spenczar5 914 days ago

Oh, I was being sloppy and mixed N into the ariness: I meant N elements, each with a variable number of children (as many as 8).

ninepoints 914 days ago

I would hazard a guess that a regular n-ary tree would outperform the OP tree in many usage scenarios with no extra effort, and with a number of B+ tree variants being strictly better at the cost of more effort.

jiggawatts 914 days ago

There are use cases where that doesn’t matter, such as some compilers where it makes a full pass over the source code for every optimisation type.

mgaunard 914 days ago

Vector programming requires you to change your way of thinking; instead of computing something for 1 element, you compute it for N elements.

ninepoints 914 days ago

I do lots of simd and shader programming, but regardless of register width, O(n) is not O(1)

mgaunard 914 days ago

The point is that you shouldn't try to get all the children of a single node, but rather all the children of all the nodes, which is still O(n) and not O(n^2).

ninepoints 914 days ago

This doesn't make any sense to me.

marginalia_nu 914 days ago

I don't think looking at asymptotic behavor makes a lot of sense in situations where n is small and bounded. Big O says nothing about such cases.

ninepoints 914 days ago

Sorry, do you not have trees for which the size of the tree is large. Do all your trees fit inside a few cache lines of storage?

marginalia_nu 914 days ago

I deal with very large tree structures (~100 GB) in my search engine, but even there the dominant factor for performance isn't big O, but reducing block reads, access patterns and keeping relevant data in the disk cache.

Big O isn't irrelevant, but it is not the full story either. There's a solid reason why hash tables are a thing in memory but aren't really a thing on disk.

ninepoints 914 days ago

Do you understand the data structure being proposed in the original post, and are you claiming that scanning 100GB of data every time you want to perform a childof operation is acceptable? Please, use the proposed tree for your application since big o isn't the full story to you lol