Hacker News new | ask | show | jobs
by tlack 912 days ago
Always surprising to click through a link on HN and discover it is one's own work. For a time I was very interested in lightweight array-based implementations of common data structures and this one seemed particularly handy.
2 comments

It sounds a little like it didn’t work out as well as you hoped. How did it fare?

I am interested because I have some scientific software (astrodynamics - propagating orbits in space) that would really benefit from a cache-friendly tree, and this seemed promising.

It does? Feels like an O(n) scan every time you need to query the children of a node is a nonstarter for most applications (the readme is strangely optimistic about this point). Plenty of cache friendly tree implementations out there, and this one seems actually cache hostile with few benefits in my mind aside from ease of implementation.

Also, I write a lot of code that runs on a gpu, and the claim that this tree is somehow gpu friendly I find particularly dubious.

> Feels like an O(n) scan every time you need to query the children of a node is a nonstarter for most applications (the readme is strangely optimistic about this point).

The trick is that querying the children of N different nodes is usually still a single O(N) scan, so if you operate on it array-style (which APL heavily encourages anyway), it's amortized constant time. Of course that's not always viable, but APL programmers tend to be surprisingly good at making array operations out of things you wouldn't expect to use arrays for.

> cache hostile

If you additionally enforce that all parent indexes point to lower numbers, a preorder traversal is a linear scan forward, and a postorder traversal with child order reversed (which you can usually correct for one way or another) is a linear scan backward.

(This assumes you only need dependency ordering, ie the parent node uses or supplies data to/from its children; if you need a true sequential traversal, the array has to be sorted according to that traversal (but is still a valid Apter Tree).)

> the claim that this tree is somehow gpu friendly I find particularly dubious

Yeah, array programming is generally kind of hit-or-miss at that and this does look like a miss.

Whenever somebody says cache friendly without additional context I assume they got code from somebody else without understanding what makes things "cache-friendly".
What?

I mean, there are specific sizes that will fit in the L-(N) cache.

And consequently, sizes that don't suffer from the false-sharing problem in concurrent scenarios due to this.

https://en.cppreference.com/w/cpp/thread/hardware_destructiv...

There's an entire field of study devoted to cache-friendly/cache-oblivious data structures:

- https://www.microsoft.com/en-us/research/wp-content/uploads/...

- https://rcoh.me/posts/cache-oblivious-datastructures/

The linear scan you are talking about I don't think gives you any sort of ordered traversal right? Unless I'm missing something.
For a arbitrary Apter Tree, a linear scan is unordered. You can impose additional constraints to get a ordered traversal (in the same way that, eg, you can sort a assoc-list/JSON-style key-value table by keys to get a key-order traversal), and the result is still a valid Apter Tree (respectively valid list of key-value pairs).
Yes but that is not what is presented (a B+ tree is not a B tree even with minor modifications) and it changes the complexity of your other update operations drastically. The thing that grates me (as someone that has written a dozen or so different tree structures) is that this one is presented as a particularly good one, and I think it excels at almost nothing, hence its obscurity.
For me, N is small. Its also N-ary, not binary, which crosses off a bunch of the first options. Anyway, I am not sure this will work, just worth trying. Empirical numbers beat theory every time :)
You are using N in a different sense than I am. Unless I'm reading the tree description incorrectly, N is the size of the tree itself, not the number of children.
Oh, I was being sloppy and mixed N into the ariness: I meant N elements, each with a variable number of children (as many as 8).
I would hazard a guess that a regular n-ary tree would outperform the OP tree in many usage scenarios with no extra effort, and with a number of B+ tree variants being strictly better at the cost of more effort.
There are use cases where that doesn’t matter, such as some compilers where it makes a full pass over the source code for every optimisation type.
Vector programming requires you to change your way of thinking; instead of computing something for 1 element, you compute it for N elements.
I do lots of simd and shader programming, but regardless of register width, O(n) is not O(1)
The point is that you shouldn't try to get all the children of a single node, but rather all the children of all the nodes, which is still O(n) and not O(n^2).
This doesn't make any sense to me.
I don't think looking at asymptotic behavor makes a lot of sense in situations where n is small and bounded. Big O says nothing about such cases.
Sorry, do you not have trees for which the size of the tree is large. Do all your trees fit inside a few cache lines of storage?
I still like that setup for using trees in low level languages.

But personally I’ve been working at higher levels of the stack the last few years, where these kinds of decisions seem less important.

And on another level, it seems like coders in general aren’t that interested in vector oriented languages and techniques which makes their study somewhat isolating.

"Isolating" is where the performance (innovation) is.

I used a very similar setup, first time I needed to implement a tree. Now, I'm a fan of Eytzinger layout. (referenced in a previous comment in this thread)

Yeah, most coders in general don't seem to be as interested in this stuff, but it's still necessary. They'll want more performance.

Why do they need more performance? Hardware gets faster all the time, and the most popular implementations of the most popular programming languages have so much low-hanging fruit you can get a 10x improvement by rolling your cat on the keyboard.

I don't think programmers actually care about performance as much as they care for convenience. Every year the stack moves a bit higher, and everyone is okay with websites taking days to load on brand new phones with Gigabit wireless connections. There are companies that care about performance on the margin, like stock trading firms, but to get into one of them, you have to get pretty lucky, or come from a pretty special background. Even the banks are using Python more and more, these days.

Shrug.

People will always care about performance because they will always want more functionality.

I bet you care about the performance of your database or how fast your pages load. You want your streaming videos to be skip-free and your phone calls to sound lifelike. Performance will always matter. Because while people like me will be always trying to squeeze more, "the other side" will always be ready to use it for some new feature.

The post you're replying to sounds more like it's lamenting the complacency of most developers more than it's saying that performance isn't worth caring about, or I'm projecting my own feelings onto it.

I'm not under the impression that people care at all, to be honest, outside of certain communities. If you're not in those communities talking about performance as a corner stone feels mostly like screaming into the void.

It is always like that when you venture off the well trodden path. I am studying low latency emulation and it's also isolating.
> working at higher levels of the stack the last few years, where these kinds of decisions seem less important.

but then accumulated outcome of this is the slowness you see in web software!

FWIW I worked also on scientific software (phylogenetics, which is all about biological evolutionary trees) and the tree structure is like Atree (https://github.com/RuneBlaze/internode/blob/main/src/tree.rs...).

It does help (roughly ~5x vs. pointer-chasing trees, probably can be further optimized) for my workload, but at the same time quite some time was spent just making sure the tree is correct.

I'm interested in the same stuff/area. There's a lot of great results to read, check out cache-oblivious B-trees by Dr. Demaine (or honestly anything by him.) This link is a good distillation of some of the concepts. https://algorithmica.org/en/eytzinger

I'm _also_ interested in scientific software, but that's more a hobby than a job. =)

For propagating a large number of orbits in space, I'm really curious what the Correct (tm) numerical algorithm is, mind sharing more? I love that space right at the intersection of lots of math + need to understand how computers really work. Once upon a time I implemented fast multipole method for the n-body problem, but I certainly don't claim to deeply understand it, anymore. :)

Friendly heads up, "pseudo" is frequently misspelled "psuedo" in the readme.