Hacker News new | ask | show | jobs
by kccqzy 1212 days ago
There is no essential mutable state in computer science. All essential mutable state can be modeled away to use no mutable state, as you have shown in the generation number idea which is one valid way. (I'm strictly talking about computer science problems not computer engineering problems such as drivers.)

The generation number idea you have shown is an excellent idea. It immediately enables several new capabilities compared with just mutation:

* You are now able to determine which data is older and which is newer without resorting to an external clock.

* If your mutation takes a long time, there is no longer a need to use a long-held mutex to ensure thread safety which would prevent concurrent reads; you can create a new generation separately and then atomically CAS it. Writers don't block readers.

* If you suddenly need undo/redo functionality, it's suddenly trivial. Or diff'ing between versions. Or understanding the reason for changes (basics of auditing).

4 comments

> All essential mutable state can be modeled away to use no mutable state

However, that modeling introduces complexity of its own that not everyone agrees is entirely essential.

Engineering is about tradeoffs. There is no One Model To Rule Them All. Your post is a great thing to keep in mind, but it's not a prescription. Engineers need to be able to look at a problem from many points of views at once, and try to find the right path for their current problem. This is why it's so important that models play nicely with one another, something that functional programming is getting better at, but reactive programming still really struggles with.

All of your asterisked points are well-taken, but: do you need that capability? Sometimes; sometimes not.

> There is no essential mutable state in computer science.

Yes, theoretically. Now imagine your mutable state is 2GB in size, have fun creating a copy of it on every change.

You are not getting the point of my comment or the original article. It's clear to me that the kind of complexity being talked about in the article is cognitive complexity not computational complexity. It's not about choosing between an O(n) copying of data and O(1) in-place mutation; it's about choosing code that's easy to comprehend and maintain.
Their point is that you can't choose the cognitively simpler choice because of real world design constraints, such as not consuming all available RAM — i.e., because of essential complexity.
that's complexity introduced by your approach to the solution (using an inadequately large computer) rather than essential to the problem you're trying to solve
I think the crux is which side gets to be called "essential". Is it the abstract problem the code is intended for, or is it the real world we're living in. Mike Acton argues for the latter (https://www.youtube.com/watch?v=rX0ItVEVjHc)

And all this is heat rather than light because there is absolutely a significant fraction of concerns that both sides would agree are utterly incidental/accidental complexity.

i think that's a purely semantic or terminological debate, so it isn't important which part we declare 'essential' and which part we declare 'accidental', and arguments that one definition or the other is better are pointless

what is important is that we understand which definitions people were using in particular utterances, that our definitions have adequate ontological coherence (avoiding "eargrayish" reasoning errors), and that other people can understand which definitions we are using

in brooks's 01986 paper he clearly considers questions like copying 2 gigabytes of data 'accidental' http://worrydream.com/refs/Brooks-NoSilverBullet.pdf

> All software construction involves essential tasks, the fashioning of the complex conceptual structures that compose the abstract software entity, and accidental tasks, the representation of these abstract entities in programming languages and the mapping of these onto machine languages within space and speed constraints. Most of the big past gains in software productivity have come from removing artificial barriers that have made the accidental tasks inordinately hard, such as severe hardware constraints, awkward programming languages, lack of machine time. ... The essence of a software entity is a construct of interlocking concepts: data sets, relationships among data items, algorithms, and invocations of functions. This essence is abstract, in that the conceptual construct is the same under many different representations. It is nonetheless highly precise and richly detailed. ...

so it isn't important what mike acton thinks the terms should mean unless we're trying to understand how he uses them; brooks, moseley, and marks lump space and speed constraints into the category of 'accidental' rather than 'essential', and their arguments need to be interpreted using the definitions they intended to use, not conflicting definitions invented by a youtuber decades later

similarly, people commenting on moseley and marks's paper should be assumed to be using the same definitions as moseley and marks, not conflicting definitions, unless they explicitly state otherwise

Immutable data structures are often implemented as persistant data structures, which have the traits of immutability but does not "copy everything on every change" https://en.wikipedia.org/wiki/Persistent_data_structure
Btrfs does stuff like instantaneous snapshots for a multi terabyte size filesystem. I have no idea how they do it, but apparently it’s 100% possible.
Well since it's claiming to not mutate state, it must be simple! Go read the code and tell us what you learn.

In the event you find the source code TLDR, indeed it seems implementing an immutable api is complex.

The optimizer removes that.
To see immutability on huge state in action see git.

That is essentially what must be going on under the hood to model mutable state. Every mutation made to a variable is a commit. The only commits that are saved are the ones that have existing references.

Now you're thinking it could be a big memory leak. But you pair this with reference counting you can eliminate the leak.

Essentially when the reference of the earliest commit goes out of scope, the next earliest commit is checked out as the initial root commit and the current commit is freed.

The memory model is then like a moving git window/linked list. Mutations are committed at the tail end while memory is freed at the head.

Lambda calculus machines can't exist in reality though. Only the von Neumann machine can be actualized.

Thus from a practical perspective the foundations of actual computing must be built on mutable state.

You are wrong about the generations idea. It's a side effect. Functions themselves cannot actually do anything with a generation number. It's completely useless other then for letting you know how many times a value has changed.

A generation number also doesn't negate the need for mutexes. The system is still operating on shared state, a generation number doesn't change this.

You are not getting the point of either my comment or the original article. You are thinking from an engineering point of view, where if you look through all the layers of abstraction you see mutexes and shared mutable memory. That's not my point; my point is about building up those abstractions so they are sufficiently hidden. Functional programmers routinely operate on a higher level of abstraction, and they have shorter, more maintainable code. In such code, it would be a code smell if you continue to use mutexes, because that should've been an implementation detail. It should've been clear from the outset that the kind of complexity the article is talking about is cognitive complexity.
I am getting it. I am simply saying that you can't completely hide those details. They leak through.