by carljv 3160 days ago
I love this talk, but it does throw out a lot of complicated ideas somewhat loosely, so I get why reactions and interpretations are all over the place.

I think a good companion to understanding this better is Rich's talk on "The Language of the System" (https://www.youtube.com/watch?v=ROor6_NGIWU&t=2810s).

I interpret his overarching thesis as this:

Most "situated" programs are in the business of processing information according to complex and changing rules, in cooperation with other programs (i.e., a system). Many languages, though, are overly concerned with how they represent data internally: classes, algebraic types, etc. This "parochialism", as he calls it, and "concretion" about how data aggregates should work make them hard to adapt when the rules, data, or other parts of the system change, and make it hard for their programs to work in systems. At some point your Java class or Haskell ADT has to communicate its data to other programs that have no notion of these things. So you end up w/ a ton of surface area in your code with mappers and marshallers totally unrelated to the content of the data and purpose of the program.

The idea behind Clojure is to provide easy access to simple, safe, and relatively universal structures for holding data, and a library of functions to manipulate those structures. Its "big picture" design bits are about providing semantics for multiple "programs" (from threads to services) in a system to operate on data robustly and reasonably (concurrency semantics, time and identity models, pervasive unadorned data, etc.). At some point you're going to be sending this program's data over a wire to another program, and something like "a map of strings and numbers" is pretty straightforward to transport, while a sum type implementing functor with a record data constructor that contains a Maybe SSN is not. It overly couples the underlying data to the language's representation.

The plus side of doing this is that the language can check internal consistency for you. The downside is that you're carrying a lot of baggage that you can't take with you over the wire anyway. Communication in systems is also why Rich thinks making "names" for data first class is important. Existing strongly typed languages can sort of accommodate this, but don't really privilege names.
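
To make the "can't take it with you over the wire" point concrete, here is a minimal TypeScript sketch. The Employee class and its fields are made up for illustration and are not from the talk; the only claim is that JSON carries the data and nothing else.

```typescript
// Hypothetical example: a class with behaviour vs. a plain record.
class Employee {
  constructor(public name: string, public salary: number) {}
  raise(amount: number): number {
    return this.salary + amount;
  }
}

const typed = new Employee("Ada", 1000);
const plain = { name: "Ada", salary: 1000 };

// Both serialize to the same text: only the data crosses the wire.
const wireTyped = JSON.stringify(typed);
const wirePlain = JSON.stringify(plain);

// The receiver gets a plain object back; the class and its methods are gone.
const received: unknown = JSON.parse(wireTyped);
const isEmployee = received instanceof Employee;
```

Whatever the class guaranteed inside the program, the consumer on the other side only ever sees the map of strings and numbers.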

So I think a lot of strong typing advocates are upset because they think Rich is saying types don't have value within programs. I don't think that's right. I think he's saying they have very limited value in open systems, which makes their costs often overwhelm their benefits in the individual programs within those systems.

In general, I feel like the debate has been about examining Rich's claims in the context of programs (is Maybe String good or bad, etc.), whereas he's really interested in what works in systems. I think that's indicated by his focus on the term "parochialism" which I have not seen a lot of folks address.


> Most "situated" programs are in the business of processing information according to complex and changing rules, in cooperation with other programs (i.e., a system). Many languages, though, are overly-concerned with how they represent data internally: classes, algebraic types, etc. This "parochialism", he calls it and "concretion" about how data aggregates should work, make them hard to adapt when processing rules change, and make it hard for their programs to work in systems.

It doesn't if the systems are well-designed systems, comprised of loosely-coupled components, something like you'd get if you used 1970s structured analysis and then actually modeled the implementation closely on the DFD with communication over a message bus with a fairly neutral messaging format.

When you start tightly coupling components (e.g., by using a messaging format tightly bound to an internal representation), using ad-hoc component-to-component integration rather than a common message bus that is abstracted from the individual components, and generally do the system engineering badly, then you have a whole pile of problems, some of which are exacerbated (but not caused) by static typing, sure.

But static typing is not the problem here.

I think type systems that try to "close" aggregates (i.e. saying "an Employee is these fields that have these types and no more") kind of do contribute to the problem. I sort of agree that static types are not "causing" such problems. They don't "cause" bad design, but they tend to make it too easy to set bad designs in stone (and most designs are bad in some way). It's not so much about types causing problems or being bad, but having costs, and thinking hard about those costs vs. the benefits. Different folks will, and should, make conclusions for their problems.

I read him as exaggerating his critique a bit because types are often oversold (static type people can be really dismissive of dynamic languages). But I think he's mostly making a "no silver bullet" kind of argument.

> I think type systems that try to "close" aggregates (i.e. saying "an Employee is these fields that have these types and no more") kind of do contribute to the problem.

They are part of the problem if such types are shared among components; perhaps because of a design in which messages or data transfer object types are tightly coupled with the working representations in components.

But that's an unnecessary form of coupling.

Agreed that's bad. But then if the transit/messaging/persistence components of your systems are independent of the type system (good), to really use your type system you have to do work pushing things in and out of it, in return for type safety (and sometimes not much) that only lasts until the border of your program. It's really easy to over-engineer your types because you really want to pin down the representation of your problem in the idioms the language gives you. ORM (ab)use is a good example of this, I think.
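
A rough TypeScript sketch of what "pushing things in and out" tends to look like. The Person shape, the field names, and both helper functions are hypothetical, chosen only to illustrate the boundary work:

```typescript
// Hypothetical internal type; the field names are illustrative only.
interface Person {
  name: string;
  age: number;
}

// Decoder: the "pushing in" half. Wire data arrives as `unknown`.
function decodePerson(raw: unknown): Person {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("expected an object");
  }
  const rec = raw as Record<string, unknown>;
  const name = rec["name"];
  const age = rec["age"];
  if (typeof name !== "string") throw new Error("name must be a string");
  if (typeof age !== "number") throw new Error("age must be a number");
  // Only the fields this type knows about are kept.
  return { name, age };
}

// Encoder: the "pushing out" half. The type guarantee ends here.
function encodePerson(p: Person): string {
  return JSON.stringify({ name: p.name, age: p.age });
}

const person = decodePerson(JSON.parse('{"name":"carol","age":41,"team":"ops"}'));
const wire = encodePerson(person);
```

Note that the "team" field the sender included does not survive this particular mapper. That is a choice made in this sketch, not something static typing forces, but it is the kind of closed-aggregate behaviour being discussed.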

I've often made the mistake myself of architecting a too-clever type or class system for my problem, and then been faced with writing tons of crap to wrestle it in and out of protobufs, etc. that needed to be more general than my problem. When my program was running, it was like, woo, I made some illegal states unrepresentable, which felt great! But I could almost never do that in a way that didn't quickly reveal itself as too brittle.

I like types (mostly). I wish gradual/partial typing was a better solved problem. Clojure's goal is to make it so that you don't over-engineer and tangle up your systems by passing around simple immutable data. If you keep your system nicely decoupled, the types, which are good at finding when I've forgotten a coupling in my code, seem less valuable to me.

> It doesn't if the systems are well-designed systems, comprised of loosely-coupled components

I've been going down a similar line of thought. But I went the other direction. That perhaps in "poorly designed" systems where there is lots of coupling, static typing at least gives you the "maintenance" benefit that is one of the bigger justifications that the static type apologist tends to give. You actually hear this a lot: "in large systems, static typing is a must..."

So it's interesting to me that you're going the other way and saying that actually in big, messy systems that static types may hurt you. That's not a common position.

When I walk through some of the big problems in "poorly designed" systems it almost always comes down to coupling: I can't touch one part of the system without having an effect on other parts of the system.

Interestingly, Rich Hickey criticizes the common static typer's idioms (like pattern matching and ADTs) as coupling. And he's right. What always surprises me though is that the static typer doesn't disagree -- they look at this coupling as a feature! They usually say something along the lines of "I choose static typing because if I change my Person class, then the compiler reminds me of all the places in my code that I need to go fix." What's remarkable about this is that it's not a reminder...it's an obligation that your choices plus the compiler are burdening you with: you must go update all those places in the code. This is the very definition of coupling.

There is a way to architect code such that you don't have to revisit 100 places in your architecture when some new data model decision is made/discovered. There is a way to build systems wherein you only have to touch one place in your code when some new feature or data information is needed.

> So it's interesting to me that you're going the other way and saying that actually in big, messy systems that static types may hurt you.

In an overly-coupled late system, static typing increases the potential effect of excessive coupling in forcing changes to remote parts of the system when making what seems to be a point change. But that effect, while magnified by static typing, is a product of coupling.

And static typing in that situation, OTOH, mitigates (as static typing proponents are quick to point out) the chance of missing a change that will produce incorrect behavior.

On the gripping hand, reducing the excessive coupling gets to the root of the problem, while static v. dynamic is just choosing how to allocate pain that could be avoided with better architecture.

But languages are sexier than architecture.

> They usually say something along the lines of "I choose static typing because if I change my Person class, then the compiler reminds me of all the places in my code that I need to go fix." What's remarkable about this is that it's not a reminder...it's an obligation that your choices plus the compiler are burdening you with: you must go update all those places in the code. This is the very definition of coupling.

> There is a way to architect code such that you don't have to revisit 100 places in your architecture when some new data model decision is made/discovered. There is a way to build systems wherein you only have to touch one place in your code when some new feature or data information is needed.

The way to avoid that has nothing to do with static or dynamic typing, though. If you change a protocol then you have to change anything that relied on the old protocol if you want your program to keep working, regardless of your language's type system; in a statically typed language it will tell you where those places are, and in a dynamically typed language it's up to you to find them. If your change doesn't break a protocol that old code relied on, then you won't have to change old code. The only changes dynamic typing "saves" you from making after you break a protocol are bugfixes.

If your code is tightly coupled so that changes ripple through the entire codebase, using a language that doesn't tell you where those changes have to ripple for things to keep working won't solve that.

I don't see the difference between sending a Person instance and sending a map of keywords about a person. The coupling is the same.

If my function only needs to know the "age", then why am I having to fill out my Person class with all the other stuff? Why, if I have facts about a Cat in hand, must I coerce it to a Person? These are hoops you're typically jumping through when you're dealing in ADTs.

If "age" is an important property in your system shared among different kinds of entities then you need to have an Interface or a Protocol to retrieve the age of an entity.

The same way you would create a keyword in Clojure to represent the age of an entity (e.g. ':entity/age') that can be put in a map describing a person or a cat.

In both cases you minimized the interface between your modules and you have less coupling.
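
For comparison, the keyword-in-a-map version can be sketched in a typed language too. This is a TypeScript approximation, assuming a string key "entity/age" standing in for Clojure's ':entity/age'; the entity shapes are invented for the example:

```typescript
// Hypothetical namespaced key, mirroring Clojure's :entity/age.
const ENTITY_AGE = "entity/age" as const;

// Anything carrying that one key qualifies; other keys are ignored.
type HasAge = Record<typeof ENTITY_AGE, number>;

function nextAge(entity: HasAge): number {
  return entity[ENTITY_AGE] + 1;
}

// Two unrelated kinds of entity that happen to share the key.
const person = { "entity/age": 12, "person/name": "carol" };
const cat = { "entity/age": 3, "cat/indoor": true };

const ages = [nextAge(person), nextAge(cat)];
```

The interface between the function and its callers is the single key, which is the same minimal surface the Interface or Protocol approach gives you.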

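Not in OCaml. You can have a function like this:

    let print_age obj = print_int obj#age

And it'll just check, at compile time, that anything passed to print_age has a method "age". (This relies on OCaml's structurally typed objects; record fields are not checked this way.)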

Well, in Haskell, this seems like a case where you'd want a typeclass for getting the age out of your type.

More generally, though, it seems like row-types might be a form of static typing that would fit Rich's preferred style of programming.
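
TypeScript does not have row types as such, but a bounded generic gets reasonably close to the idea, so here is a small sketch under that assumption. The function name and the sample object are made up:

```typescript
// Adds a year to anything with a numeric age, keeping every other field
// (and its static type) intact in the result.
function birthday<T extends { age: number }>(thing: T): T {
  return { ...thing, age: thing.age + 1 };
}

const cat = { age: 3, name: "Tom", indoor: true };
const olderCat = birthday(cat);

// The extra fields are still visible to the type checker after the call.
const summary = `${olderCat.name} is ${olderCat.age}`;
```

The function only names the one field it cares about, and whatever else the caller passes in flows through untouched, which seems close to the style Rich is asking for.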

That doesn't seem to describe any hoops I've ever had to jump through when using Haskell. Can you give concrete examples?

I just did. Having a Person vs. Cat taxonomy. The claim is about ADTs, not Haskell. When a "name" property will do, why do we need to introduce an ADT? Why do we need to taxonomize?

> If my function only needs to know the "age", then why am I having to fill out my Person class with all the other stuff? Why, if I have facts about a Cat in hand, must I coerce it to a Person?

If your function only needs to know the age, then why would it take a Person or a Cat at all, instead of just accepting an age parameter? But assuming you have a reason, who says you do need to coerce anything or add any dummy data? You don't even have to go very niche to get that functionality, e.g., in TypeScript:

    class Person {
      age: number
      constructor (age: number) { this.age = age }
    }

    class Cat {
      age: number
      constructor (age: number) { this.age = age }
    }

    const printNextAge = (thing: { age: number }) => {
      console.log(thing.age + 1)
    }

    // These all work
    printNextAge(new Person(12))
    printNextAge(new Cat(23))
    const someRandomObject = { age: 10, colour: 'green', weight: 'heavy' }
    printNextAge(someRandomObject)

    // These don't:

    const lady = { name: 'carol' }
    printNextAge(lady)
    // error TS2345: Argument of type '{ name: string; }' is not assignable to parameter of type '{ age: number; }'.
    //  Property 'age' is missing in type '{ name: string; }'.

    const caveman = { age: 'stone' }
    printNextAge(caveman)
    // error TS2345: Argument of type '{ age: string; }' is not assignable to parameter of type '{ age: number; }'.
    //  Types of property 'age' are incompatible.
    //    Type 'string' is not assignable to type 'number'.

Now, if the function takes a Person, then the reason you need to fill out the rest of the stuff is because it probably wants an entire Person, not just their age. The fact that the function can tell the compiler it needs an entire Person (and not a Cat) and have it ensure that it only gets valid Persons doesn't stop you from doing anything a non-buggy program should do, it just makes the language more expressive. Even in a wordier language with a less powerful type system like Java, which obviously isn't the gold standard for static typing (and where for some reason your function was still taking an object instead of just an age int and leaving it up to the caller to extract it), it's as simple as saying:

    interface Aged {
        int getAge();
    }

and adding 'implements Aged' to your Person and Cat classes.

> So it's interesting to me that you're going the other way and saying that actually in big, messy systems that static types may hurt you. That's not a common position.

I didn't interpret it that way. I interpreted it as "If you have a big, messy system you can tame it into a nice, loosely coupled system by adding some types".

> At some point your ... Haskell ADT has to communicate its data to other programs that have no notion of these things. So you end up w/ a ton of surface area in your code with mappers and marshallers totally unrelated to the content of the data and purpose of the program.

No it doesn't.

> things like "a map of strings and numbers" is pretty straightforward to transport, while a sum type implementing functor with a record data constructor that contains a Maybe SSN is not.

Yes it is.

Has this guy ever heard of Generics?

This guy was a professional C++ programmer for a couple of decades so he probably came across generics.

I think it's possible you didn't catch the parts where he talks about what he wants from his data structures. There were 2 key pieces:

+ that he can transport them between environments, possibly remotely, possibly written in different programming languages

+ that parts of the system only need to know about parts of the data structure. More, that as the data structure is passed around the system, only the producer and consumer of changes to the structure are affected by the change.

I'm not aware of any static type system, with generics or otherwise, that would meet these goals. At least not post-facto and highly artificially.

Whether or not you agree with the priority he gives to these goals is, of course, a different matter.

> that he can transport them between environments, possibly remotely, possibly written in different programming languages

This is pretty easily solved in static languages with "serializable" interfaces, which can usually be automatically derived. E.g., in Rust, you can use serde's #[derive(Serialize, Deserialize)]; in OCaml, you can use [@@deriving sexp]. This also allows you to know at compile time which types can safely be serialized. In Clojure, if you have a type that contains an InputStream of some sort, it's not reasonable to serialize it. But you won't find out until runtime, when you happen to have an instance of that map with the InputStream.
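
The "know at compile time what can be serialized" idea can be sketched without any deriving machinery. This is a TypeScript approximation, not the Rust or OCaml mechanism; the Json type and the toWire helper are hypothetical names:

```typescript
// A value is Json only if it is built from these cases, recursively.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Hypothetical helper: only accepts values the compiler can prove are Json.
function toWire(value: Json): string {
  return JSON.stringify(value);
}

const ok = toWire({ name: "carol", tags: ["a", "b"], age: 41 });

// Expected to be rejected at compile time, since a function is not one of
// the Json cases (left commented out so the sketch still compiles):
// toWire({ name: "carol", read: () => "bytes" });
```

A map holding something stream-like would be caught the same way, before the program runs, rather than when that particular value first shows up.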

I'm fairly certain the comment you are replying to isn't talking about "generics" but "Generics" with a capital _G_.

This bit,

> while a sum type implementing functor with a record data constructor that contains a Maybe SSN is not

indicates Hickey was talking about Haskell's type system, where you in fact can derive a Functor for your data type by using Generics.

> + that he can transport them between environments, possibly remotely, possibly written in different programming languages

There exist tools to generate types between languages.

> only the producer and consumer of changes to the structure are affected by the change.

If your function doesn't alter the data type, it needs no info on the structure of it. Perhaps you can expand on what you meant here?

EDIT: Ehrm, why the downvotes? If you disagree with the above, explain what.

Generics can make serialization easier in Haskell, but that's not exactly the point. The point is, once your Haskell program is done with that data, it's getting tossed into a message queue or database, or whatever, that doesn't really care or have any concept of what typeclasses it implements, whether one of its constructors is an Either, etc. In open systems you don't really get to decide who consumes your data or how--your program can't communicate anything other than data to them--and so you often don't have a way of enforcing your types on eventual consumers. Haskell has strong opinions about how it thinks data should be represented and aggregated. But in large open systems, as the saying goes, "opinions are like aholes; everybody has one."

When I think about the popular tools for moving data around large open systems (message queues, key-val stores, pub-subs, etc.), it seems to me that the idea of moving and communicating about types and objects over wires has largely been a dud. Think RMI, OODBs, etc. It's just hard to get other people (tools, services) to care about how you've decided to organize the entities in your program. It's a lot of work, and the benefits over throwing mostly "plain" data may not be compelling enough.

Again, I keep coming back to his term "parochialism" and why he's focused on it. I think it's an under-appreciated point amongst all the language wars.

I feel like this thread[0] in the discussion sorta delves into that. It's certainly an area with differing opinions and I can see why some might prefer having it simply be strings the whole way down, but at some point you are going to need to interact with the values you have, and at that point you need to know what type you are dealing with, so I really feel the serialisation argument is a bit weird. In databases you also have types on everything, albeit often less powerful. If not caring about types is really what you want, nothing stops you from treating everything as a String in Haskell. Heck, you can even do dynamic programming in Haskell with Data.Dynamic and Data.Typeable if you wanted to, but that sorta defeats the whole point of a nice and powerful type system.

I think it's kinda ironic for you to bring up "parochialism" or narrow-mindedness when that is exactly what I was thinking throughout Hickey's talk.

[0] https://www.reddit.com/r/haskell/comments/792nl4/clojure_vs_...

> In open systems you don't really get to decide who consumes your data or how--your program can't communicate anything other than data to them--and so you often don't have a way of enforcing your types on eventual consumers. Haskell has strong opinions about how it thinks data should be represented and aggregated. But in large open systems, as the saying goes, "opinions are like aholes; everybody has one."

True, and edn is just as parochial as any Haskell serialization format. I don't see how Hickey can claim primacy here.

> you often don't have a way of enforcing your types on eventual consumers

A type isn't something you enforce on consumers. It's something you enforce on yourself to help shape your code.

Regardless of how you put something onto the wire you're giving it a specific format that your consumers need to know about. This is the same whether it was serialised from Haskell or Clojure or Coq or assembly.