| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by dgreensp 1544 days ago

These problems have lots of other well-known solutions.

For large objects: Persistent data structures with path-copying. Batching small mutations (and using mutation internally in a way that doesn’t leak out). Using the type system somehow to distinguish cases where copying is not necessary (experimental in certain new languages). Structuring your program around a “data store” that is mutable.

For the “references” problem: Explicit references, like by ID. Or mutable “atoms” as in Clojure, which are maybe sort of like the “mediators” in the article.

4 comments

tomp 1544 days ago

Persistent data structures are very hard. It's basically impossible to cleanly combine them. E.g. if you have an immutable hashtable that contains immutable lists, and you want to modify a list - how do you know if the inner list changed or not (you don't want to be creating new hashtables if the elements don't change).

They work in the small, but don't truly scale.

nlitened 1543 days ago

> you want to modify a list - how do you know if the inner list changed or not

You know for sure that inner list did _not_ change, because it is immutable.

Someone else might have a reference to an updated hashtable with an updated inner list, but the one you hold will never change—isn’t that the whole idea of immutable data structures?

tomp 1543 days ago

No, I mean "update" in the persistent sense - create a new list, with some (or none) elements changed/added/removed.

e.g. say you have a method remove_all_in_values that takes an int x and a hashmap of list of int, and returns a new hashmap of list of int, with x removed from each element of the hashmap.

Obviously you'd use something like List.filter for the inner operation, but most functional programming languages give you no indication whether the result of filter is the same list (i.e. element is not present) or a new list (i.e. some elements were removed).

So how do you know if you create a new hashtable or return the old one? One solution is complex, the other is wasteful.

codebje 1543 days ago

No language will tell you at compile time whether an arbitrary list filter operation will have an effect or not. If it can, it's because the list filter isn't arbitrary, it's probably always a no-op.

The generalisation is that you have some type parent with a way to select a type child, plus a function that modifies a child, you want to lift your child -> child function to a parent -> parent function.

Optics help a lot with this class of problem.

Because you're in the world of persistent data structures, and you're doing a modify operation, there is no avoiding the 'modify hashtable' part of 'modify an element within a hashtable'. You always create a 'new' hashtable.

If it's common that your update operation has no effect it may be worth switching to a function like child -> Modified child for Modified a = NoOp | Mutated a, so the hashtable modify function can determine if it can skip its own update. A list filter is a trivial thing to write, and one that can return NoOp if no elements are filtered is no less efficient.

nlitened 1543 days ago

I see what you mean now.

I think most persistent data structures would (for most common operations) return exactly themselves in case when no modification happened. So you can very cheaply compare for equality, and that would either resolve to simple one-op comparison of references (to see if they are equal or not), or, in the worst case, comparison of some minor amount of diverging sub-elements (since persistent data structures do extensive structural sharing by definition).

In this sense, linked list is a “bad” persistent collection for your use case, but a persistent set instead would work wonderfully — if nothing changed, you’d get the same reference to a collection as input.

I am pretty sure it either would work this way, or is very easy to optimize for this use case, in Clojure.

zelphirkalt 1543 days ago

Isn't the idea of persistent data structures, that they may be more expensive than their imperative versions, but try to reduce the expense to a minimum? So by using a persistent data structure, one should already have taken that potential additional expense into consideration. The wasteful solution should not be that much more wasteful, if we assume, that there is an OK-performing persistent data structure for the use-case. Otherwise I guess one should not choose a persistent data structure. But then you run into problems, when you want to make use of multiple cores to throw at the problem.

fulafel 1542 days ago

In FP I think composability and interchangeability of these higher order functions help, in case the supplied filter did not work like you want: You could write your own filter function variant that guarantees to return the same list if the predicate function returned true for each element.

joshlemer 1543 days ago

You can pretty efficiently do this in Clojure

    (update-in
     {:foo [{:bar ["a"], :baz ["b" "c" "d"]}]}
     [:foo 0 :baz 1]
     clojure.string/capitalize)
    
    
    ;; => {:foo [{:bar ["a"], :baz ["b" "C" "d"]}]}

tomp 1543 days ago

This only works if you know what to edit up-front.

If you're processing a tree-like structure via recursion (e.g. type-checking an AST implemented with persistent immutable structures in a compiler) it's no longer easy.

mrkeen 1543 days ago

> how do you know if the inner list changed or not

This is a vital question that virtually every programming language overlooks.

If you are using a mainstream language, you do not know if they were modified.

If you use a language which has a type system to segregate mutable operations from immutable operations, then you know if they were modified.

> They work in the small, but don't truly scale.

Go even bigger then. Work with the kind of data that won't fit in memory. Put your data on disk - you can't insert into the middle of a 1GB+ file.

nyanpasu64 1543 days ago

I can't stand the standard update-based interface of persistent data structures either. https://lib.rs/crates/im wraps persistent structures in a standard vector interface, with instant clones (only copying a pointer to refcounted data) but slower assignments (which may copy data internally if other references exist), which is effectively equivalent to CoW. A persistent update is instead written as a clone followed by in-place assignment, and you can mutate in-place (which copies if there are other references to the data being modified). The nice part is that only IndexMut detaches the underlying data from other references.

I haven't used this crate yet, but if I have a need for persistent data structures I'd definitely try this since it's more accessible.

joshlemer 1543 days ago

I think a lot of the pain in Scala/Rust etc immutable data structures comes down to static typing. In Clojure for instance you can very ergonomically do deeply nested persistent updates.

    (update-in
     {:foo [{:bar ["a"] :baz ["b" "c" "d"]}]}
     [:foo 0 :baz 1]
     clojure.string/capitalize)

    => {:foo [{:bar ["a"], :baz ["b" "C" "d"]}]}

mbrodersen 1540 days ago

In Haskell you can you Lenses to easily manipulate sub-trees, individual values, or a selection of values in tree like data structures. It is very powerful.

fulafel 1543 days ago

In Clojure the default associative data structure is a persistent map and cloning it with 0 changed nodes is basically a no-op.

UncleMeat 1544 days ago

Persistent data structures exist and work very well but they still remain way slower than alternatives that don't support persistence. This is especially bad if you want the data structure to be shared across multiple threads. Just iterating over a tree now requires taking and releasing oodles of locks. Add in the GC overhead and you've got a meaningful problem.

Yes, you aren't constructing an entirely new 30k byte array on each mutation. But you are still adding a lot of overhead to your program.

dwohnitmok 1544 days ago

> Persistent data structures exist and work very well but they still remain way slower than alternatives that don't support persistence.

Depending on your definition of "way" this is true. A lot of persistent vector-like structures can have quite good performance, but if your runtime is dominated by iteration and random updating, you will still get a speed up by using raw arrays, i.e. guaranteed contiguous blocks of memory.

> This is especially bad if you want the data structure to be shared across multiple threads. Just iterating over a tree now requires taking and releasing oodles of locks.

This is not. The whole point of immutable data structures and concurrency is that iterating over them does not require any locks because the structure won't change! So concurrent reads require absolutely no new code and certainly no locks. For concurrent writes, the usual strategy is a single CAS-like operation at the top-level to swap out the old immutable structure for a new one (e.g. Clojure atoms, Java AtomicReferences, Haskell IORefs, etc.). But that's a single thing for the whole structure, not every level.

In general concurrent update in the functional programming world is handled by CAS not locks.

UncleMeat 1543 days ago

CAS still has a lock in actual memory.

dwohnitmok 1540 days ago

Not necessarily. Depends on the underlying hardware implementation (with different answers for e.g. x86 vs ARM) since software CAS usually is implemented as a single CPU instruction.

This is also stretching the definition of "lock" quite a bit because usually CAS is used to implement lock-free and wait-free concurrency.

But regardless at most, even in hardware, it's a single top-level lock, and sometimes not even that. Not one along every level of a data structure.

tomp 1544 days ago

> This is especially bad if you want the data structure to be shared across multiple threads. Just iterating over a tree now requires taking and releasing oodles of locks.

Um, what?

Persistent data structures don't require locks at all (because they're immutable), in fact using them in a multithreaded way is 100% safe and thus much easier than their mutable (and ordinarily more efficient) counterparts.

UncleMeat 1543 days ago

Refcounted gc means you need locking on refcounts or you can lose your tree during iteration.

tomp 1543 days ago

Can you provide a list of reference-counted functional languages?

memco 1544 days ago

I really enjoyed this presentation on how roc-lang is being designed to handle optimization using opportunistic in-place mutation: https://yewtu.be/watch?v=vzfy4EKwG_Y&t=1328.

karmakaze 1544 days ago

I was thinking along the same lines. Is an "atom" in Clojure like STM where the structure/reference is changed in a highly controlled manner?

Sinidir 1543 days ago

atoms are containers that require no coordination so its a simple Compare and swap. There are Refs that require transactions and change of multiple Refs is coordinated in these transactions. WIth that Clojure has an implementation of Software Transaction Memory.