Hacker News
by Lavinski 2074 days ago
The article goes through a series of examples to show motivations for the following:

1. Variables should not be allowed to change their type.

2. Objects containing the same values should be equal by default.

3. Comparing objects of different types is a compile-time error.

4. Objects must always be initialized to a valid state. Not doing so is a compile-time error.

5. Once created, objects and collections must be immutable.

6. No nulls allowed.

7. Missing data or errors must be made explicit in the function signature.

The idea being that each feature or constraint _enables_ you to reason and predict more about a program than you could otherwise.
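To make a few of these constraints concrete, here is a rough sketch in Rust rather than F#. It covers value equality (2), no nulls (6) and failure made explicit in the signature (7). `Email`, `EmailError`, `parse_email` and `domain` are hypothetical names chosen for illustration.

```rust
// Constraint 2: deriving PartialEq gives value equality, so two Emails holding
// the same string compare equal. Comparing an Email with a plain String would
// not compile (constraint 3), because no such PartialEq impl exists.
#[derive(Debug, Clone, PartialEq)]
struct Email(String);

#[derive(Debug, PartialEq)]
enum EmailError {
    MissingAt,
}

// Constraints 6 and 7: failure is part of the return type instead of a null.
// Because the field is private, code outside this module can only obtain an
// Email through this function (constraint 4).
fn parse_email(s: &str) -> Result<Email, EmailError> {
    if s.contains('@') {
        Ok(Email(s.to_string()))
    } else {
        Err(EmailError::MissingAt)
    }
}

// Missing data is an explicit Option, never a null reference.
fn domain(e: &Email) -> Option<&str> {
    e.0.split_once('@')
        .map(|(_, d)| d)
        .filter(|d| !d.is_empty())
}
```

A caller has to handle both the `Err` case of `parse_email` and the `None` case of `domain` before using the values, which is exactly the "you can predict more about the program" property.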

I encourage anyone interested in these ideas to play around with F# or a similar language and get a feeling for how they influence your code. If you've mastered one paradigm, such as OO, one of the best ways to find holes in your mental models is to try to find another point of view from which to look at the same problems. Even if you keep writing most of your code as you do today, in the language you use today, it can still be beneficial.

5. Once created, objects and collections must be immutable.

So this language would not be general purpose, as it would not be suitable for high-performance computing.

Large scale simulations almost always involve arrays that are modified in place. Being able to somehow declare a collection to be immutable would be highly useful, but not having the option of mutable collections limits the kinds of problems that can be approached with the language.

I'm not going to claim that mutability is never useful for performance, but many large scale simulations can be expressed quite elegantly using bulk operations on arrays or other structures, with no mutability in sight. Both particle simulations a la n-body and stencil operations are in this category. An efficient low-level implementation of such bulk operations involves mutable updates, just like any functional language is compiled to "impure" assembly code, but the programming model used for application programming can remain pure.
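A minimal sketch of the stencil case, assuming a 1-D three-point average with the two boundary cells held fixed (both choices are assumptions, not from the comment). The caller only sees a pure function from one array to a new array; the only mutation is `collect` filling the freshly allocated output buffer. A real array library would also fuse steps and reuse buffers, which this sketch does not attempt.

```rust
/// One step of a 1-D three-point averaging stencil, written as a pure function:
/// the input is borrowed immutably and a new Vec is returned.
/// Boundary cells are copied unchanged (an assumption of this sketch).
fn smooth(cells: &[f64]) -> Vec<f64> {
    let n = cells.len();
    (0..n)
        .map(|i| {
            if i == 0 || i + 1 == n {
                cells[i]
            } else {
                (cells[i - 1] + cells[i] + cells[i + 1]) / 3.0
            }
        })
        .collect()
}
```

For example, `smooth(&[0.0, 0.0, 3.0, 0.0, 0.0])` returns `[0.0, 1.0, 1.0, 1.0, 0.0]` and leaves the input untouched.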
Interesting. Can you explain, with a somewhat simple example, how this can be efficiently implemented, or at all? I mean preserving the appearance of immutability at the source language level, while mutating the original structure under the hood for performance.
Any vectorised operation in NumPy is an example of this. The pure subset of NumPy can be used to write useful programs, but the NumPy functions/methods are mostly implemented in impure C.

Another example is completely pure array programming such as in Accelerate[0] or Futhark[1].

[0]: http://www.acceleratehs.org/

[1]: https://futhark-lang.org

A somewhat related idea is called "benign effects." The idea is that you write code with an immutable interface that uses mutation in its implementation.

So there are "effects" (non-functional state changes) that are encapsulated ("benign").

I learned this term in reference to Standard ML at CMU.

This is different from what you're asking because it isn't a compiler optimization and it isn't actually checked by the language at all, but it works pretty well in practice.

It's like unsafe in Rust: you write most of your code assuming a useful property that you then break in the small percentage of code that needs to break it.
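A rough Rust analogue of a benign effect, using memoization as the example (a common case, though not one the comment names). The hypothetical `Fib::get` takes `&self`, so callers see an immutable interface, while a private `RefCell` cache is mutated behind it. `RefCell` does check borrow rules at runtime, but, as with the SML idiom, nothing verifies that the effect is actually unobservable; that remains the programmer's promise.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Hypothetical memoizing Fibonacci calculator. The public API takes `&self`,
/// but a private cache is updated internally (interior mutability).
struct Fib {
    cache: RefCell<HashMap<u64, u64>>,
}

impl Fib {
    fn new() -> Self {
        Fib { cache: RefCell::new(HashMap::new()) }
    }

    fn get(&self, n: u64) -> u64 {
        if n < 2 {
            return n;
        }
        // The shared borrow of the cache ends at the end of this statement,
        // so the recursive calls below can borrow it again.
        let cached = self.cache.borrow().get(&n).copied();
        if let Some(v) = cached {
            return v;
        }
        let v = self.get(n - 1) + self.get(n - 2);
        // Hidden mutation: callers cannot tell whether a value was cached.
        self.cache.borrow_mut().insert(n, v);
        v
    }
}
```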

I'm not very knowledgeable on this myself, unfortunately, but I believe that in graphics programming, shaders written in GLSL often take the form of a series of functional, mathematical transformations of vertices. Those transforms are run on the GPU as highly parallelized array operations, probably using a lot of mutable state. But those details are mostly hidden from the shader programmer.
C++ supports this via the mutable keyword (https://stackoverflow.com/questions/105014/does-the-mutable-...), though not particularly for performance purposes.
Thanks to all who replied.
I concur; I should have been more precise in my comment.
Isn’t it possible (at least in theory) to make mutability an implementation detail of the compiler/runtime? Rust’s borrow checker approaches this, but the abstraction is leaky or nonexistent. Additionally, many high-performance computing applications (e.g. TensorFlow) abstract away expensive mutable operations, so at least in theory, it should be possible to isolate mutability to small segments of code where mutability is opt-in.
Yes, Haskell as a pure functional language does this too. Naive copy-by-value handling of lists will usually perform within the same order of magnitude as mutate-in-place linked lists in C. The compiler can track those immutable values and just mutate them in place when it can guarantee that doing so is safe. The vast majority of the time, you can get away with just copying a pointer or renaming, rather than copying the whole value.
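To illustrate the "just copy a pointer" point, here is a sketch of a persistent singly linked list in Rust using `Rc`. This is not how GHC represents lists internally; it only shows structural sharing, where prepending to an immutable list reuses the existing tail instead of copying it. `List`, `prepend` and `to_vec` are hypothetical names.

```rust
use std::rc::Rc;

/// Persistent singly linked list: cells are never mutated after creation,
/// so several lists can safely share the same tail.
enum List {
    Nil,
    Cons(i32, Rc<List>),
}

/// "Prepend" builds a new list; the old one is reused by copying a pointer
/// (incrementing its reference count), not by copying its elements.
fn prepend(x: i32, tail: &Rc<List>) -> Rc<List> {
    Rc::new(List::Cons(x, Rc::clone(tail)))
}

/// Collect the elements into a Vec, walking the shared cells.
fn to_vec(list: &Rc<List>) -> Vec<i32> {
    let mut out = Vec::new();
    let mut cur: &List = list;
    while let List::Cons(x, next) = cur {
        out.push(*x);
        cur = &**next;
    }
    out
}
```

Two lists built as `prepend(1, &shared)` and `prepend(10, &shared)` each allocate a single new cell and point at the same `shared` tail.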

The caveat is that, in my experience, it's a fair bit harder to reason about performance, as the execution model is abstracted even further away from the hardware than the C model is (which is no longer a good fit either, in this era of speculative execution and multi-level caches).

> The caveat is that, in my experience, it's a fair bit harder to reason about performance, as the execution model is abstracted even further away from the hardware than the C model is (which is no longer a good fit either, in this era of speculative execution and multi-level caches).

One solution is to have a tool, developed and distributed along with the compiler so that it can never fall out of sync with it, annotate the code with notes about performance.

I think if performance is part of the requirements of your code, then performance must be a part of your type signature.

For example, a tail-recursive function should be typed as tail-recursive.

This is where linear types and in general quantitative type theory comes into play. Also eagerness / laziness annotations.

Tail recursion doesn't need annotating imo, but the compiler/linter could perhaps complain if it finds recursion it can't apply tail call optimisation to. These kinds of warnings are similar to languages with mutation warning about things that are probably bad but sometimes necessary.

Rust's im[1] and rpds[2] crates provide refcounted pointers to immutable data structures, but support mutable operations on &mut instances. Cloning an instance merely creates another pointer. When an instance is modified, it uses Arc::make_mut() to clone each tree node only if it has other users. This approach has runtime overhead, but makes nested updates (foo[0][0].attr = 1) as simple as with mutable structures.
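The copy-on-write behaviour described here can be sketched with the standard library's `Arc::make_mut` alone. This does not use the actual `im` or `rpds` APIs, and the `Node` type and `set_first_child_attr` function are hypothetical. Cloning the root `Arc` is a pointer copy; a later write clones only the nodes that are still shared.

```rust
use std::sync::Arc;

/// Hypothetical tree node. `Clone` is required by `Arc::make_mut`.
#[derive(Clone, Debug, PartialEq)]
struct Node {
    attr: i32,
    children: Vec<Arc<Node>>,
}

/// Roughly `root.children[0].attr = value`, cloning only shared nodes.
/// Panics if `root` has no children.
fn set_first_child_attr(root: &mut Arc<Node>, value: i32) {
    // Clones the root node only if another Arc still points to it.
    let root_node = Arc::make_mut(root);
    let child = &mut root_node.children[0];
    // Same rule for the child: clone if shared, otherwise mutate in place.
    Arc::make_mut(child).attr = value;
}
```

Once a node is uniquely owned, `make_mut` returns a mutable reference to it without cloning, so repeated edits to an unshared version are plain in-place updates, while any older version held elsewhere keeps its original values.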

This somewhat resembles immer.js (uses a proxy around an immutable structure which records updates). Contrast this approach to Clojure transients (whose children don't magically become transient), and whatever Haskell does (https://news.ycombinator.com/item?id=24740384).

[1]: https://docs.rs/im/

[2]: https://docs.rs/rpds/

Linear types fix this problem by letting you prove to the compiler that logically immutable operations can be implemented as in-place updates.
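Rust's ownership rules are affine rather than strictly linear (a value may be dropped without being used), but they are close enough to illustrate the idea. The hypothetical `with_doubled` takes its vector by value, so the caller can no longer observe the old contents, and the "new" vector can safely reuse the same buffer.

```rust
/// Logically pure update: consumes the input and returns the updated value.
/// Because ownership moves in, no other code can still see the old contents,
/// so the buffer is modified in place and handed back without copying.
fn with_doubled(mut v: Vec<i32>) -> Vec<i32> {
    for x in v.iter_mut() {
        *x *= 2;
    }
    v
}
```

After `let w = with_doubled(v);`, any further use of `v` is a compile-time "use of moved value" error, which is what makes the in-place update unobservable.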
Immutability is an abstraction; it doesn't forbid in-place modification of data. What it forbids is letting code that still holds a reference to the array from before the modification observe the change, which would create a logical error.
F# and OCaml have mutable arrays.
Rust's borrow checker would prevent Example 5 from compiling, since once you add `cust` to the collection, you can't touch it anymore (unless you insert a clone or similar). So in this case at least, the inability to reason about code can be resolved by banning mutable aliasing, without eliminating mutability.
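The article's Example 5 isn't reproduced in this thread, so the following assumes it adds a customer object to a set and then mutates the object. With a hypothetical `Customer` type and `add_then_rename` function, Rust only lets you keep editing after insertion if you insert a clone. `HashSet` also offers no mutable access to its elements, so a stored value's hash can't silently change.

```rust
use std::collections::HashSet;

/// Hypothetical customer record, hashed and compared by value.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct Customer {
    id: u32,
    name: String,
}

/// Inserts a copy of `cust`, then renames the caller's copy.
fn add_then_rename(set: &mut HashSet<Customer>, mut cust: Customer) -> Customer {
    // `set.insert(cust)` would move `cust` into the set, and the rename
    // below would then fail to compile ("borrow of moved value").
    set.insert(cust.clone());
    // Only the local copy changes; the element stored in the set keeps its
    // original value and therefore its original hash.
    cust.name.push_str(" (renamed)");
    cust
}
```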
Rust's ownership system would prevent example 5 from working (because you have to move the instance into the set).

The borrow checker is about validating that references don't outlive their target, and R^W (any number of readers XOR a single writer).

Oops.
1. Variables should not be allowed to change their type.

This sounds nice, but is there a way to accomplish it without losing some expressiveness or concision? Rather than looking at JS, consider low-level operations on a small chunk of memory as a niche example. Interpreting the same region as a buffer of 64-bit ints vs 16-bit uints gives entirely different behavior to the standard operators like addition, multiplication, and shifts, and there are plenty of cases where it makes sense to mix and match those operators. It's possible to construct a single type that encompasses all that behavior, but the price for doing so is a formidable wall of similar-looking method names rather than just being able to use a plus sign or other easier-to-understand constructs.

> Interpreting the same region as a buffer of 64-bit ints vs 16-bit uints

The variable doesn't change its type, though; instead, you change your interpretation of it. That's a very different and explicit operation and manipulation.
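A small sketch of that explicit reinterpretation in safe Rust, using the standard `to_le_bytes`/`from_le_bytes` methods. The function name `as_u16_words` is hypothetical and the little-endian layout is an assumption. The `u64` argument keeps its type; a new `[u16; 4]` value is produced, and ordinary operators then behave according to each value's own type.

```rust
/// Reinterpret the bytes of a u64 as four little-endian u16 words.
/// No variable changes type: a new value of a different type is produced.
fn as_u16_words(x: u64) -> [u16; 4] {
    let bytes = x.to_le_bytes();
    let mut words = [0u16; 4];
    for (i, w) in words.iter_mut().enumerate() {
        *w = u16::from_le_bytes([bytes[2 * i], bytes[2 * i + 1]]);
    }
    words
}
```

For example, `as_u16_words(0xFFFF + 1)` is `[0, 1, 0, 0]` because the carry propagates across the 64-bit value, whereas adding 1 to each word of `as_u16_words(0xFFFF)` with `wrapping_add` gives `[0, 1, 1, 1]`.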

That said, example 1 doesn't strictly have anything to do with types: you could get the same behaviour with `x = 0` in any language with closures (like JavaScript), or `foo = 0` in a language which passes parameters by (mutable) reference, as long as that information is not visible in the caller.

In descendants of ML (Haskell, OCaml, Rust, etc.) you can use algebraic data types to condense your wall of methods into one function.
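A sketch of what this suggests, with hypothetical `Buffer` and `add_wrapping` names: one enum with a variant per interpretation and a single function that matches on it. As the reply points out, the variant tag is checked at runtime and nothing here reinterprets the underlying bits.

```rust
/// Hypothetical buffer type: one variant per element interpretation.
#[derive(Debug, PartialEq)]
enum Buffer {
    U64(Vec<u64>),
    U16(Vec<u16>),
}

/// One element-wise wrapping add instead of separate add_u64/add_u16 methods.
/// Returns None when the two buffers use different interpretations.
/// Like `zip`, it stops at the shorter buffer.
fn add_wrapping(a: &Buffer, b: &Buffer) -> Option<Buffer> {
    match (a, b) {
        (Buffer::U64(x), Buffer::U64(y)) => Some(Buffer::U64(
            x.iter().zip(y).map(|(p, q)| p.wrapping_add(*q)).collect(),
        )),
        (Buffer::U16(x), Buffer::U16(y)) => Some(Buffer::U16(
            x.iter().zip(y).map(|(p, q)| p.wrapping_add(*q)).collect(),
        )),
        _ => None,
    }
}
```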
Algebraic data types have runtime case information and won't let you reinterpret the underlying bits of a binary buffer between types. I think the grandparent meant they wanted pointer casts, unions, reinterpret_cast, or transmute.