Hacker News
by tkdc926 1990 days ago
> In Clojure, whenever you "append" to a vector (array) you get a "new" vector and the original does not change. Anyone with a reference to the original can always count on it being the same.

This has never made any sense to me. Can someone please explain why you would still want the original vector to continue to exist with data that no longer reflects the current system? What am I missing?

8 comments

To avoid race conditions and to preserve abstraction boundaries. This is the archetypal design pattern in functional languages. For a simple example, if you check the size of a vector before accessing index `i`, you can be sure the length hasn't changed between the size check and the access. It's exactly like writing code using only the immutable Strings of Java or the frozensets of Python.
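A minimal sketch of the size-check point, written in Python rather than Clojure, with a tuple standing in for an immutable vector. The helper names `append` and `safe_get` are made up for illustration and are not from any library.

```python
def append(vec, item):
    """Return a new tuple with item added; the original tuple is untouched."""
    return vec + (item,)


def safe_get(vec, i, default=None):
    """Check the size, then index.

    Because vec is an immutable tuple, nothing can shrink it between
    the length check and the access.
    """
    if 0 <= i < len(vec):
        return vec[i]
    return default


original = (10, 20, 30)
extended = append(original, 40)

print(original)               # the old value is unchanged
print(extended)               # the "appended" value is a separate tuple
print(safe_get(original, 3))  # index 3 only exists in the new tuple
```

Anyone still holding `original` keeps seeing three elements, so the check-then-access pair in `safe_get` cannot be invalidated by someone else's append.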

So in your example, it "continues to exist" in local variables to reflect the state of the system as it was when you read it, as long as you still hold a reference to the old vector. Typically, you'd ask your software to fetch a fresh copy of the vector any time you want new data. But that's explicit in code. You'll have fewer surprises if mutations (like vector appends) are never shared between variables.

A specific example:

I have a context scratch-pad hashmap object that I pass into a top-level function. It can then be decorated with extra scratchpad data all the way down the call stack and passed into lower functions, for them to make use of. So each function can pass stuff down, but it's not available further up the call stack. It effectively looks like a stack object in terms of its semantics: as you unwind the stack you unwind history, 'undoing' changes. And the stack can take many different paths over execution.

Functions can do pretty much anything they want to the object further down the stack, without affecting other functions' inputs (parents or siblings). If it were mutable, the functions would suddenly be coupled to each other, and could change each other's data inputs. Add concurrency to that and it gets worse.

There are other ways to do this with Clojure. But I like this method: it's obvious and easy to test. It also feels reminiscent of Prolog.

In my example I'm associating new values into a hashmap, not appending to a vector, but it amounts to the same thing.
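A rough Python approximation of the scratch-pad pattern described above, assuming a read-only mapping in place of a Clojure hashmap. The function names (`assoc`, `top_level`, `child`, `leaf`) and the keys are hypothetical. Unlike Clojure's persistent maps, this sketch copies the whole dict on every `assoc`, so it only illustrates the semantics, not the performance.

```python
from types import MappingProxyType


def assoc(ctx, key, value):
    """Return a new read-only mapping with key set; ctx itself is not modified."""
    return MappingProxyType({**ctx, key: value})


def leaf(ctx):
    # Sees everything its ancestors added on the way down.
    return sorted(ctx)


def child(ctx):
    # Decorate the context for functions further down the stack;
    # the caller's mapping is unaffected.
    return leaf(assoc(ctx, "child_note", "only visible below child"))


def top_level(ctx):
    ctx = assoc(ctx, "request_id", 42)  # rebinds the local name only
    seen_below = child(ctx)
    return seen_below, sorted(ctx)


root = MappingProxyType({"user": "alice"})
seen_below, seen_at_top = top_level(root)

print(seen_below)   # keys visible at the bottom of the call stack
print(seen_at_top)  # keys visible in top_level after child returns
print(dict(root))   # the caller's original context
```

Once `child` returns, its additions are gone from the perspective of `top_level`, which is the "unwinding history" behavior the comment describes.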

Suppose I provide you with a black-box function foo that takes an int.

You can write the program

   x = read_int_from_terminal();
   y = foo(x);
   println(x + ":  " + y);
And you can be confident that invoking foo has no effect on the value of x that will print out on subsequent lines. x is a local variable that refers to a stateless, immutable, mathematical object. If x refers to the number 3, it will continue to do so until you personally tell it to refer to something else.

In Clojure, as in other functional programming environments, a vector is also a stateless, immutable, mathematical entity. That's nice because nobody can change its state out from under you, and that makes programs easier to reason about.

There are also specific use cases where this feature may shine in a specific way, for example making it easy to maintain an "undo history" when implementing a text editor. If the state of a buffer in your editor is an immutable value then it's easy to maintain a stack or list (or whatever) of all the states of the buffer - the top of the stack being the current state - and operations on the buffer simply create a new version but do not destroy any information about prior versions.

Outside of such specific use cases, though, it's just about referential transparency and enhanced ability to reason about the interactions between different pieces of code.
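The undo-history idea can be sketched in a few lines of Python, relying on the fact that Python strings are immutable. The `Editor` class and its methods are invented for this illustration; a real editor would use a more space-efficient buffer representation.

```python
class Editor:
    """Toy editor whose buffer states are immutable strings kept on a stack."""

    def __init__(self, text=""):
        self._history = (text,)  # tuple of states; the last one is current

    @property
    def text(self):
        return self._history[-1]

    def insert(self, pos, s):
        # Build a new string; every earlier state stays valid.
        new_text = self.text[:pos] + s + self.text[pos:]
        self._history = self._history + (new_text,)

    def undo(self):
        # Drop the current state, but never the initial one.
        if len(self._history) > 1:
            self._history = self._history[:-1]


ed = Editor("hello")
ed.insert(5, " world")
ed.insert(0, ">> ")
before_undo = ed.text

ed.undo()
after_one_undo = ed.text

ed.undo()
ed.undo()  # one undo too many is a no-op
after_all_undos = ed.text
```

No edit ever destroys information: undo just moves back to a state that was already sitting in the history.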

The main thing you’re missing is the (real) functional programming style, which expects this kind of behavior when manipulating data. To refer to it as a reference is somewhat misleading; functional programs are just dealing with a raw block of data, which is transformed into a new block of data. But to call it the old data and the new data is kind of silly, because it’s not really about state at all.

Immutability is very useful for dealing with concurrency. For example, if a thread is iterating over a vector and another thread mutates it, you don't want the first thread to get "the rug pulled from under it", so to speak. If things can never be suddenly changed, you don't have to plan for that.

when you work in a functional programming style, every modification of an object's data member creates a new copy of the object; the new one is mostly a copy of the old object, except for the change that was introduced by the setter. In general that makes a lot of sense for smaller objects (in Java the String is immutable, so are tuples) - it is easier to reason about the object and you don't have race conditions.
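For what it's worth, here is what that "setter returns a copy" style can look like in Python, using a frozen dataclass from the standard library. The `Point` class is just a made-up example type.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Point:
    x: int
    y: int


p1 = Point(1, 2)
p2 = replace(p1, x=10)  # a copy of p1 with only x changed

try:
    p1.x = 99  # direct assignment is rejected on a frozen dataclass
    mutated = True
except FrozenInstanceError:
    mutated = False

print(p1, p2, mutated)
```

`replace` plays the role of the setter: it leaves `p1` alone and hands back a new object that differs in one field.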

in Scala you have mutable collections and immutable collections - like that one; the more accessible versions that you have in your default namespace are immutable (that's supposed to be the default choice).

now in theory smaller contiguous objects that span 'a few' cache lines would be easier to copy than to modify. My problem with that statement is that with the JDK you usually have lots of Object references (you can't do a lot with primitive types), so you need to try hard in order to get an object that spans a few cache lines. You would have more of these in Go, but they don't do a lot of functional style programming in Go, afaik.

maybe it would make some sense to port Clojure or Scala to the Go runtime.

In a word - concurrency.