Hacker News new | ask | show | jobs
by majormajor 2345 days ago
I'm not familiar with Clojure, but you can hit conflicts in the Git world, though, which seem to be what the parent is concerned about. Two of us could be creating some new data based on the last data we had, at time T, and then the other person submits theirs at time T+5, and I submit mine at time T+10. In that case, my change hasn't taken theirs into account.
3 comments

If you can "submit" your change back to the original datastructure then the original datastructure is not immutable, right? Here's a nice explaination about how the persistant immutable datastructures work: https://hypirion.com/musings/understanding-persistent-vector...
I believe the question is that, if two threads take the same immutable vector, and both make a change to it independently, they'll end up with two new vectors (eg two branches in git); a vector that reflects thread1's change, and a second vector that reflects thread2's change. So now you have a conflict, which requires resolution; git has a human intervene.

eg

x = [1,2,3]

Thread1 -> x + [4] => [1,2,3,4]

Thread2 -> x + [5] => [1,2,3,5]

But you were expecting [1,2,3,4,5]

Reality was that you wanted an order to your events, normally enforced by locking, which the immutable vector doesn't seem to help you with; they were both able to update independently, but you actually wanted them to update dependently.

If you try to use immutable datastructures to avoid locking, then how is conflict resolution handled?

I think the answer would be that it doesn't help you avoid locking; either you lock & share a single reference to the latest version of your immutable vector, to enforce ordered events, or you define a resolution strategy separately. The immutability aspect just stops you from not having a resolution strategy -- which would always be incorrect

And if I understand correctly, the ideal scenario for immutable datastructures in concurrent scenarios is when you can define such a merge strategy (and safely give threads their own copy of the datastructure to muddle with, without actually having to copy the entire datastructure)

You could, as per your example, use locking as part of a resolution/merge strategy to combine the results of two separate computations running on two separate threads. Or you could use some strategy that does not involve locking. Either way, it does not support the original claim I disputed that "Immutable structures still require locking".
>Either way, it does not support the original claim I disputed that "Immutable structures still require locking".

It does, if you believe serialization by locking is the main strategy to handle serialization (in which case, mutable or immutable, you still need to lock), and so... you still need locking. Serialization being the main scenario GP gave.

Your original answer didn't resolve the problem either -- fine, you didn't need to lock when adding elements to your immutable structure, but you still haven't reached serialization; you've just pushed the problem back another step.

The answer that I believe GP would need to correct his understanding, (and much more importantly, the answer that I'm interested in :-) is what serialization strategies does immutable datastructures enable, if not locking?

The other correction GP seems to require is whether serialization is actually that important in general, and whether functional programmers tend to experience otherwise... But I don't care about that answer :-)

the answer that I'm interested in :-) is what serialization strategies does immutable datastructures enable, if not locking?

Depending on your performance goals, Compare-and-Set with retries a la clojure's atom reference construct.

https://clojure.org/reference/atoms

Oh goodness :) OP has conceded the point, but you're still down to argue on the basis of what a person may or may not believe is the main strategy to handle serialization. I give up. You win, I guess.

Regards your other question(s), I will just add that I answered many similar questions for myself (as well as disabusing myself of a lot of misconceptions) by undertaking to get a basic understanding of Haskell.

You're assuming that x+[5] mutates the list instead of returning a new list.

If you're planning to have multiple threads append items to a list, the immutable way to do it is to have each thread return a list of items to append to the main list, then fold those items into the list.

No, parent post is definitely not assuming that.
In brief, these sorts of conflicts simply do not arise in a fully persistent data structure. You may have a situation where you have a persistent data structure together with one or more mutable references, each to some version of the data structure. Yes, a modification to one of these mutable references would need to be synchronized, but they are separate from the persistent data structure itself.

Again using git as an example, there is the persistent data structure, aka the commit graph, as well as mutable references to commits, aka branches. A change to what commit a branch references needs to be synchronized.

I concede. If you have algorithm to do the `git merge` equivalent without human help, I guess no locking or STM is needed. Although that's very costly in implementation.

It's great to have git as a mental model in this discussion, really useful.

Whether the "merge" implementation is costly or complicated very much depends on exactly what it has to do. The git example is pretty much a worst case example in this regard.

An easier example could be that you have a tree structure that represents a mathematical expression. The evaluation of every node could proceed on its own lightweight thread. The merge strategy would be to simply perform the appropriate operation on the results produced by the threads evaluating the child nodes.

I don't get math expressions example at all. One thread modifies one value, so there is no (mutability) problem to solve.

You have customers' orders to buy items. One last item remains at your store. You accept one order and update HEAD. You accept another order in parallel and follow to merge. "Merge" here means that you need to return money to the customer and send out an apology e-mail.

More cumbersome than locking, isn't it? But possible, yes.

Yeah, you're right, I muddied the waters a bit with that example. I was just trying to think of examples of persistent data structures being used in a concurrent context, but you had in mind a situation where multiple agents could be updating the same data structure independently.

Your example is much better. I also was thinking of maintaining an account balance with a log (vs. synchronising updates to a stored amount), but it's not much different from your example.

And of course this "eventually consist" strategy is generally how things happen "at scale", persistent data structures or not.

That would be problematic using non-immutable datastructures as well as you might end up with incorrect order or even nonsensical data if you don't use locking.

Having used immutable data structures and concurrency in non-clojure languages, I mostly resort to something like concurrentML for concurrency. Message passing lets you solve situations like that in more elegant ways.