| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by tomp 1545 days ago
	Persistent data structures are very hard. It's basically impossible to cleanly combine them. E.g. if you have an immutable hashtable that contains immutable lists, and you want to modify a list - how do you know if the inner list changed or not (you don't want to be creating new hashtables if the elements don't change). They work in the small, but don't truly scale.

4 comments

nlitened 1545 days ago

> you want to modify a list - how do you know if the inner list changed or not

You know for sure that inner list did _not_ change, because it is immutable.

Someone else might have a reference to an updated hashtable with an updated inner list, but the one you hold will never change—isn’t that the whole idea of immutable data structures?

link

tomp 1545 days ago

No, I mean "update" in the persistent sense - create a new list, with some (or none) elements changed/added/removed.

e.g. say you have a method remove_all_in_values that takes an int x and a hashmap of list of int, and returns a new hashmap of list of int, with x removed from each element of the hashmap.

Obviously you'd use something like List.filter for the inner operation, but most functional programming languages give you no indication whether the result of filter is the same list (i.e. element is not present) or a new list (i.e. some elements were removed).

So how do you know if you create a new hashtable or return the old one? One solution is complex, the other is wasteful.

link

codebje 1545 days ago

No language will tell you at compile time whether an arbitrary list filter operation will have an effect or not. If it can, it's because the list filter isn't arbitrary, it's probably always a no-op.

The generalisation is that you have some type parent with a way to select a type child, plus a function that modifies a child, you want to lift your child -> child function to a parent -> parent function.

Optics help a lot with this class of problem.

Because you're in the world of persistent data structures, and you're doing a modify operation, there is no avoiding the 'modify hashtable' part of 'modify an element within a hashtable'. You always create a 'new' hashtable.

If it's common that your update operation has no effect it may be worth switching to a function like child -> Modified child for Modified a = NoOp | Mutated a, so the hashtable modify function can determine if it can skip its own update. A list filter is a trivial thing to write, and one that can return NoOp if no elements are filtered is no less efficient.

link

nlitened 1544 days ago

I see what you mean now.

I think most persistent data structures would (for most common operations) return exactly themselves in case when no modification happened. So you can very cheaply compare for equality, and that would either resolve to simple one-op comparison of references (to see if they are equal or not), or, in the worst case, comparison of some minor amount of diverging sub-elements (since persistent data structures do extensive structural sharing by definition).

In this sense, linked list is a “bad” persistent collection for your use case, but a persistent set instead would work wonderfully — if nothing changed, you’d get the same reference to a collection as input.

I am pretty sure it either would work this way, or is very easy to optimize for this use case, in Clojure.

link

zelphirkalt 1545 days ago

Isn't the idea of persistent data structures, that they may be more expensive than their imperative versions, but try to reduce the expense to a minimum? So by using a persistent data structure, one should already have taken that potential additional expense into consideration. The wasteful solution should not be that much more wasteful, if we assume, that there is an OK-performing persistent data structure for the use-case. Otherwise I guess one should not choose a persistent data structure. But then you run into problems, when you want to make use of multiple cores to throw at the problem.

link

fulafel 1543 days ago

In FP I think composability and interchangeability of these higher order functions help, in case the supplied filter did not work like you want: You could write your own filter function variant that guarantees to return the same list if the predicate function returned true for each element.

link

joshlemer 1545 days ago

You can pretty efficiently do this in Clojure

    (update-in
     {:foo [{:bar ["a"], :baz ["b" "c" "d"]}]}
     [:foo 0 :baz 1]
     clojure.string/capitalize)
    
    
    ;; => {:foo [{:bar ["a"], :baz ["b" "C" "d"]}]}

link

tomp 1544 days ago

This only works if you know what to edit up-front.

If you're processing a tree-like structure via recursion (e.g. type-checking an AST implemented with persistent immutable structures in a compiler) it's no longer easy.

link

mrkeen 1545 days ago

> how do you know if the inner list changed or not

This is a vital question that virtually every programming language overlooks.

If you are using a mainstream language, you do not know if they were modified.

If you use a language which has a type system to segregate mutable operations from immutable operations, then you know if they were modified.

> They work in the small, but don't truly scale.

Go even bigger then. Work with the kind of data that won't fit in memory. Put your data on disk - you can't insert into the middle of a 1GB+ file.

link

nyanpasu64 1545 days ago

I can't stand the standard update-based interface of persistent data structures either. https://lib.rs/crates/im wraps persistent structures in a standard vector interface, with instant clones (only copying a pointer to refcounted data) but slower assignments (which may copy data internally if other references exist), which is effectively equivalent to CoW. A persistent update is instead written as a clone followed by in-place assignment, and you can mutate in-place (which copies if there are other references to the data being modified). The nice part is that only IndexMut detaches the underlying data from other references.

I haven't used this crate yet, but if I have a need for persistent data structures I'd definitely try this since it's more accessible.

link

joshlemer 1545 days ago

I think a lot of the pain in Scala/Rust etc immutable data structures comes down to static typing. In Clojure for instance you can very ergonomically do deeply nested persistent updates.

    (update-in
     {:foo [{:bar ["a"] :baz ["b" "c" "d"]}]}
     [:foo 0 :baz 1]
     clojure.string/capitalize)

    => {:foo [{:bar ["a"], :baz ["b" "C" "d"]}]}

link

mbrodersen 1542 days ago

In Haskell you can you Lenses to easily manipulate sub-trees, individual values, or a selection of values in tree like data structures. It is very powerful.

link

fulafel 1544 days ago

In Clojure the default associative data structure is a persistent map and cloning it with 0 changed nodes is basically a no-op.

link