Hacker News new | ask | show | jobs
by UncleMeat 1544 days ago
Persistent data structures exist and work very well but they still remain way slower than alternatives that don't support persistence. This is especially bad if you want the data structure to be shared across multiple threads. Just iterating over a tree now requires taking and releasing oodles of locks. Add in the GC overhead and you've got a meaningful problem.

Yes, you aren't constructing an entirely new 30k byte array on each mutation. But you are still adding a lot of overhead to your program.

2 comments

> Persistent data structures exist and work very well but they still remain way slower than alternatives that don't support persistence.

Depending on your definition of "way" this is true. A lot of persistent vector-like structures can have quite good performance, but if your runtime is dominated by iteration and random updating, you will still get a speed up by using raw arrays, i.e. guaranteed contiguous blocks of memory.

> This is especially bad if you want the data structure to be shared across multiple threads. Just iterating over a tree now requires taking and releasing oodles of locks.

This is not. The whole point of immutable data structures and concurrency is that iterating over them does not require any locks because the structure won't change! So concurrent reads require absolutely no new code and certainly no locks. For concurrent writes, the usual strategy is a single CAS-like operation at the top-level to swap out the old immutable structure for a new one (e.g. Clojure atoms, Java AtomicReferences, Haskell IORefs, etc.). But that's a single thing for the whole structure, not every level.

In general concurrent update in the functional programming world is handled by CAS not locks.

CAS still has a lock in actual memory.
Not necessarily. Depends on the underlying hardware implementation (with different answers for e.g. x86 vs ARM) since software CAS usually is implemented as a single CPU instruction.

This is also stretching the definition of "lock" quite a bit because usually CAS is used to implement lock-free and wait-free concurrency.

But regardless at most, even in hardware, it's a single top-level lock, and sometimes not even that. Not one along every level of a data structure.

> This is especially bad if you want the data structure to be shared across multiple threads. Just iterating over a tree now requires taking and releasing oodles of locks.

Um, what?

Persistent data structures don't require locks at all (because they're immutable), in fact using them in a multithreaded way is 100% safe and thus much easier than their mutable (and ordinarily more efficient) counterparts.

Refcounted gc means you need locking on refcounts or you can lose your tree during iteration.
Can you provide a list of reference-counted functional languages?