| HN Mirror

Gap buffer is very good.

> You’re likely spending a huge amount of memory to store pointers between elements within the unshared chunks (8 or 16 extra bytes per element adds up in a hurry) so it may be worth investigating some “chunking” so that you only have to use pointers to link between chunks, which can be stored continuously.

Yeah, they're chunked or unrolled in practice. (Unrolled Linked List was one term I've seen. Chunking is also another term).

> Alternatively, consider whether the index representation used by suffix arrays will work for your use-case: https://en.wikipedia.org/wiki/Suffix_array

Haven't heard of these before. But I'll look into it.

EDIT: Its... kind of a complicated situation I'm in. I'm basically searching a game tree, and building a "linked list" for how the children relate to their parent. I need "breadcrumbs" to relate any position back to the root.

Not necessarily because I need the root, but because the root contains information that's relevant to all of its children. So a linked list of (Grandchild -> child -> root) is very natural for this application. I originally had an array and just copied the data to an array (for "more locality") to all the children. But this uses way more space.

And the depth of this tree is ~12+, exponentially increasing width as usual. You really don't want to copy the root's data to all of the children unnecessarily: even on a GPU, its far better to just pass a pointer and connect the children-to-parent relationships, and traverse the linked list.

Hooking "child.parent = parent" is very easy. And traversing the while(node != root) node = node.parent; loop is also easy as cake. There's only ~12 of these linked-list operations in practice (because the depth of the search tree only goes to ~12 deep or so), across literally billions of possibilities. The billions of possibilities lead to the ability to process the tree in parallel (billions of possibilities to search with "only" 16384 hardware threads suddenly makes the GPU seem small!)

Despite the linked-list in the algorithm, its clear to me that the GPU is the ideal architecture for processing all of these nodes in parallel.