Hacker News new | ask | show | jobs
by tomp 1653 days ago
I just spent the last few days implementing a better version of an "persistent list" data structure (heavily modelled on Clojure's vector) for a new programming language that I'm working on.

I did a quick survey of existing implementations in multiple languages and found all of them lacking. They are either overly complex, slow, or both. Even Clojure's vector, while being simple and very performant, is only usable as a stack, not as a queue, and therefore IMO inadequate as a "generic random-access array-like data structure" (akin to Python's list, i.e. "just use it don't worry about performance").

My version is about as fast as Clojure's vector, while implementing a "deque"-like interface. It's a bit more complex, but still significantly simpler than Scala's vectors (both Scala 2 and Scala 3). Cyclops (a Java-only persistent collection library) is so slow I didn't even bother finishing the benchmarks. I also compared my code to Rust's `im` (way more complex), C++'s `immer` (stack, not deque) and `immutable.js` (slower than ClojureScript).

I'll clean up the code and post it here.

2 comments

voila.. took a bit longer, the code was "too slow"... turns out I was using the wrong branching factor!

https://github.com/tomprimozic/vector

I didn't dive too far into your repo, but it looks like you're doing something similar to Finger Trees [0].

Finger Trees as described in [1] are configurable to be several different kinds of data structures, but using them as a simple list gives you:

  amortized O(1) insert, remove, update at head and tail
  O(log n) insert, remove, update, worst case
  O(log n) concatenation of two lists
And you can traverse all N elements forward or backwards in O(1) per element.

I think it's a very nice data structure.

[0] https://en.wikipedia.org/wiki/Finger_tree

[1] https://www.staff.city.ac.uk/~ross/papers/FingerTree.pdf

I just tested the Fingertree implementation from org.functionaljava

Conclusion: unusuably slow (which kind-of makes sense, since it looks like the common implementation is a 2-3 fingertree - compared to a branching factor of 32 for Vector).

There are lots of implementation details that can make some version slower or faster.

In my own, I use 2..5 branching because a sequence of appends on the front or back will tend to create a tree with 3 pointers per node (when you're about to hit 6, you split. If you get to 1, merge with a neighbor, etc...). Three is close to e, which is optimal in some sense for some operations.

I'm not a Java guy. For my use, having 32 way branching is pretty expensive because I use reference counting for each of the nodes. Since you have a true garbage collector, you don't have that cost, and it wouldn't be difficult to make a 2-32 way implementation.

Anyways, it sounds like you're not impressed and not interested. That's fine :-)

There are several libraries that implement variations of deque for Clojure, but Clojure also allows transparent use of Java, so when necessary you can just use the Java deque, which I think is highly optimized.
Java's Deque is mutable (so, naturally, it will be much more performant). Also, I tried adding it to the benchmark but turns out that it doesn't support random access (for no good reason).