by goostavos 637 days ago
>Once I get two or more levels of nesting, I find it far too easy to get confused about which level I'm on

Author here, I agree with you. I have the working memory of a small pigeon.

The flavor of data orientation we cover in the book leverages strongly typed representations of data (as opposed to using hash maps everywhere). So you'll always know what shape it's in (and the compiler enforces it!). We spend a lot of time exploring the role that the type system can play in our programming and how we represent data.


Given the strongly typed flavour of data oriented programming, I wonder if you have any thoughts on the "proliferation of types" problem. How do you avoid, especially in a nominally typed language like Java, an explosion of aggregate types for every context where there may be a slight change in what fields are present, what their types are, and which ones are optional? Basically, the problem from Rich Hickey's Maybe Not talk.

    record Make(makeId, name)
    record Model(modelId, name)
    
    record Car(make, model, year)
    record Car(makeId, modelId, year)
    record Car(make, model)
    record Car(makeId, modelId)
    record Car(make, year)
    record Car(makeId, year)
    record Car(make, model, year, colour)
    record Car(makeId, modelId, year, colour)
    record Car(year, colour)
    
    ....
Hickey is great at trash-talking other languages. In the case of Car you might build a set of builders where you write

   Car.builder().make("Buick").model("LeSabre").build()
Or in a sane world code generate a bunch of constructors.

In the field of ontology (say OWL and RDF) there is a very different viewpoint about ‘Classes’, in that objects gain classes as they gain attributes. :Taylor_Swift is a :Person because she has a :birthDate, :birthPlace and such but was not initially a :Musician until she :playsInstrument, :recordedTrack, :performedConcert and such. Most languages have object systems like Java or C++ where a Person can’t start out as not a Musician but become one later the way they can in real life.

Notably, in a system like that, the terrible asymmetry of where an attribute really belongs is resolved: as in real life, you don’t have to say whether it is primary that Taylor Swift recorded the album Fearless or that Fearless was recorded by Taylor Swift.

It’s a really fascinating question in my mind how you create a ‘meta object facility’ that puts a more powerful object system at your fingertips in a language like Java or Python. For instance, you can have something like

   taylorSwift.as(Musician.class)
which returns something that implements the Musician interface if

   taylorSwift.isA(Musician.class)
where

   taylorSwift instanceof MetaObject
Well, that's what C++ templates were made for.

Write your code to work on Musicians, pass Taylor Swift in.

If she's not a musician, your code won't compile.

What I am talking about is more dynamic, although meta-objects could be made more static too.

Particularly, I am not a Musician now but if I learned to play an instrument or performed at a concert I could become a Musician. This could be implemented as

   paulHoule.isA(Musician.class);                                 // false
   paulHoule.as(Musician.class).playsInstruments();               // an empty Set<Instrument>
   paulHoule.as(Musician.class).playsInstruments().add(trumpet);
   paulHoule.isA(Musician.class);                                 // now true
I really did build a very meta object facility that represented objects from this system

https://en.wikipedia.org/wiki/Meta-Object_Facility

in an RDF graph and provided an API in Python that made those objects look mostly Pythonic. Inheritance in MOF is like Java so I didn't need to use any tricks to make dynamic classes (possible in RDF) available.
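
For readers who want to see the dynamic `as`/`isA` idea in plain Java, here is a minimal sketch using `java.lang.reflect.Proxy`. It is not the Python/RDF facility described above. `MetaObject`, the attribute map, and the rule "you are a Musician once any Musician attribute has a value" are assumptions made for illustration, and `Instrument` is simplified to `String`.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical role interface: each method names one multi-valued attribute.
interface Musician {
    Set<String> playsInstruments();
}

// Hypothetical meta-object: a bag of multi-valued attributes keyed by name.
final class MetaObject {
    private final Map<String, Set<Object>> attributes = new HashMap<>();

    // Assumed rule: the object "is a" role once any attribute of that role has a value.
    boolean isA(Class<?> role) {
        for (Method method : role.getMethods()) {
            Set<Object> values = attributes.get(method.getName());
            if (values != null && !values.isEmpty()) {
                return true;
            }
        }
        return false;
    }

    // Returns a live view of this object through the given role interface.
    <T> T as(Class<T> role) {
        Object view = Proxy.newProxyInstance(
            role.getClassLoader(),
            new Class<?>[] {role},
            (proxy, method, args) -> {
                if (method.getDeclaringClass() == Object.class) {
                    // toString/equals/hashCode are answered by the meta-object itself.
                    return method.invoke(this, args);
                }
                // Attribute accessors hand back the mutable backing set.
                return attributes.computeIfAbsent(method.getName(), k -> new HashSet<>());
            });
        return role.cast(view);
    }
}
```

With this sketch, `paulHoule.isA(Musician.class)` is false for a fresh `MetaObject` and becomes true after `paulHoule.as(Musician.class).playsInstruments().add("trumpet")`. Type safety of the attribute values relies on erasure here, which a real facility would want to tighten.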

This is interesting. It seems like a logic language (like Prolog) would work more naturally.
> builder() .... build()

Rich Hickey got something right. This is about as far from the idea behind DOP as it gets.
That's on Java, though. Many other languages such as Kotlin, Swift, etc. have better ways of dealing with this, e.g. in Kotlin

  Car(make = "Buick", model = "LeSabre")
I haven't yet had the luxury to experiment with the latest version of Java, but this is one of the reasons why I wish Java introduced named parameters the same way Kotlin and Scala do.

Eg:

  data class Make(val makeId: String, val name: String)
  data class Model(val modelId: String, val name: String)

  data class Car(val make: Make, val model: Model, val year: String, ...)
Now you can go ahead and order the params whichever way you wish so long as you're explicitly naming them:

  val v1 = Car(make = myMake1, model = myModel1, year = "2023", ...)
  val v2 = Car(model = myModel1, make = myMake1, year = "2023", ...)
Once withers land, I think you could approximate this by letting your record class have a zero argument constructor which sets every field to some blank value, and then fill the fields using `with`.

  var x = new Car() with { make = "Volvo"; year = "2023"; };
If you want the Car constructor to enforce constraints, you could use this pattern in a separate Builder record:

  record Car(String make, String year) {
    Car {
      Objects.requireNonNull(make);
      Objects.requireNonNull(year);
    }

    record Builder(String make, String year) {
      Builder() {
        this(null, null);
      }
      Car build() {
        return new Car(make, year);
      }
    }
  }

  var x = new Car.Builder() with { make = "Volvo"; year = "2023"; }.build();
Obviously syntax TBD.
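
Until something like `with` exists, roughly the same pattern can be written in today's Java with hand-written "wither" methods on the Builder record. This is only a sketch: `withMake` and `withYear` are names made up for illustration, not anything generated by the language.

```java
import java.util.Objects;

record Car(String make, String year) {
    // Compact constructor: a Car can never exist with a missing field.
    Car {
        Objects.requireNonNull(make, "make");
        Objects.requireNonNull(year, "year");
    }

    // The Builder has the same fields but enforces nothing until build().
    record Builder(String make, String year) {
        Builder() {
            this(null, null);
        }

        // Hand-written withers: each returns a copy with one field replaced.
        Builder withMake(String newMake) {
            return new Builder(newMake, year);
        }

        Builder withYear(String newYear) {
            return new Builder(make, newYear);
        }

        Car build() {
            return new Car(make, year);
        }
    }
}
```

Usage would look like `new Car.Builder().withMake("Volvo").withYear("2023").build()`. Leaving a field out fails at `build()` with a `NullPointerException` from the Car constructor.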
So much syntax to enable something that other languages have had for 10+ years. That's why I can't take the "Java is as good as Kotlin now" arguments seriously.
I think named parameters would be a great addition

For now, I use Lombok's @Builder annotation. It makes it much easier to create and copy a record, where non-assigned attributes are set to default.

Example:

   var bmw = Car.builder().make("BMW").build();
It also has a practical toBuilder() syntax that creates a copy of the original record, with some attributes changed

   var other = bmw.toBuilder().year(2024).build();
I have a long convoluted answer to this.

I love that talk (and most of Rich's stuff). I consider myself a Clojure fanboy that got converted to the dark side of strong static typing.

I think, to some degree, he actually answers that question as part of his talk (in between beating up nominal types). Optionality often pops up in place of understanding (or representing) that data has a context. If you model your program so that it has "15 maybe sheep," then... you'll have 15 "maybe sheep" you've got to deal with.

The possible combinations of all data types that could be made is very different from the subset that actually express themselves in our programs. Meaning, the actual "explosion" is fairly constrained in practice because (most) businesses can't function under combinatorial pressures. There's some stuff that matters, and some stuff that doesn't. We only have to apply typing rigor to the stuff that matters.

Where I do find type explosions tedious and annoying is not in expressing every possible combination, but in trying to express the slow accretion of information. (I think he talks about this in one of his talks, too). Invoice, then InvoiceWithCustomer, then InvoiceWithCustomerAndId, etc... the world that microservices have doomed us to representing.

I don't know a good way to model that without intersection types or something like Rows in purescript. In Java, it's a pain point for sure.
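
To make the pain point concrete, here is roughly what the accretion chain looks like with plain Java records. The field names are invented for illustration. Every stage repeats all the fields of the previous one, and every transition needs a hand-written copy method.

```java
// Stage 1: only the amount is known.
record Invoice(long amountCents) {
    InvoiceWithCustomer withCustomer(String customerName) {
        return new InvoiceWithCustomer(amountCents, customerName);
    }
}

// Stage 2: the same data again, plus the customer.
record InvoiceWithCustomer(long amountCents, String customerName) {
    InvoiceWithCustomerAndId withId(String invoiceId) {
        return new InvoiceWithCustomerAndId(amountCents, customerName, invoiceId);
    }
}

// Stage 3: the same data again, plus the id.
record InvoiceWithCustomerAndId(long amountCents, String customerName, String invoiceId) {}
```

Adding one field to `Invoice` means touching all three records and both copy methods, which is the tedium that intersection or row types would remove.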

My sense is that what's needed is a generalization of the kinds of features offered by TypeScript for mapping types to new types (e.g. Partial<T>) "arithmetically".

For example, what I often want to express directly is "T but minus/plus this field", with the transformations that attach or detach fields automated.

In an ideal world I would like to define what a "base" domain object is shaped like, and then express the differences from it I care about (optionalizing, adding, removing, etc).

For example, I might have a Widget that must always have an ID but when I am creating a new Widget I could just write "Widget - {.id}" rather than have to define an entire WidgetCreateDTO or some such.

> For example, I might have a Widget that must always have an ID but when I am creating a new Widget I could just write "Widget - {.id}" rather than have to define an entire WidgetCreateDTO or some such.

In this case you're preferring terseness vs a true representation of the meaning of the type. Assuming that a Widget needs an ID, having another type to express a Widget creation data makes sense, it's more verbose but it does represent the actual functioning better: you pass the data that will be used to create a valid Widget in its own type (your WidgetCreateDTO) and get a Widget back as the result of the action.
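
A minimal Java sketch of that separation, assuming a Widget with just an id and a name, and assuming the id is assigned at creation time (a random UUID here, purely for illustration):

```java
import java.util.Objects;
import java.util.UUID;

// What the caller supplies: everything except the id.
record WidgetCreateDTO(String name) {
    WidgetCreateDTO {
        Objects.requireNonNull(name, "name");
    }
}

// A valid Widget always has an id.
record Widget(String id, String name) {
    Widget {
        Objects.requireNonNull(id, "id");
        Objects.requireNonNull(name, "name");
    }

    // Hypothetical creation step: the id is assigned here, not by the caller.
    static Widget create(WidgetCreateDTO request) {
        return new Widget(UUID.randomUUID().toString(), request.name());
    }
}
```

The cost the other commenter objects to is visible even at this size: `name` is declared twice, and any new Widget field has to be added in both records.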

> Assuming that a Widget needs an ID, having another type to express a Widget creation data makes sense, it's more verbose but it does represent the actual functioning better

I agree with this logically. The problem is that the proliferation of such types for various use cases is extremely detrimental to the development process (many more places need to be updated) and it's all too easy for a change to be improperly propagated.

What you're saying is correct and appropriate I think for mature codebases with "settled" domains and projects with mature testing and QA processes that are well into maintenance over exploration/iteration. But on the way there, the overhead induced by a single domain object whose exact definition is unstable potentially proliferating a dozen types is developmentally/procedurally toxic.

To put a finer point on it: be fully explicit when rate of change is expected to be slow, but when rate of change is expected to be high favor making changes easy.

> What you're saying is correct and appropriate I think for mature codebases with "settled" domains and projects with mature testing and QA processes that are well into maintenance over exploration/iteration. But on the way there, the overhead induced by a single domain object whose exact definition is unstable potentially proliferating a dozen types is developmentally/procedurally toxic.

> To put a finer point on it: be fully explicit when rate of change is expected to be slow, but when rate of change is expected to be high favor making changes easy.

I agree with the gist of it. At the same time, I've worked on many projects that did not bother to distinguish between those types of data at the beginning, and since they naturally changed fast they accrued a large amount of technical debt quickly. This was even worse when those projects were in dynamically typed languages like Python or Ruby: relying only on test cases for rather big refactorings to extricate those logical parts is quite cumbersome, which leads people to avoid refactoring into proper data structures afterwards.

Through experience I believe you need to strike a balance. If the project is in fluid motion you do need to care more about ease of change until it settles, but separating the actions (representation of a full-fledged entity vs representation of a request/action to create the entity, etc.) is not a huge overhead given the benefits down the line (1-3 years) when the project matures. Balancing this is tricky though, and it is the main reason why any greenfield project requires experienced people to decide when flexibility should trump better representations.

Do you mean in TypeScript or in another language?

In TS the `Omit<T, K>` type can be used to remove stuff, and intersection can be used to add stuff

Hopefully your domain is sane enough that you can read nearly all the data you are going to use up front, then pass it on to your pure functions. Speaking from a Java perspective.
> Given the strongly typed flavour of data oriented programming, I wonder if you have any thoughts on the "proliferation of types" problem.

Not a problem.

You're just making your life needlessly hard and blaming Java for the problems you're creating for yourself.

This represents, coincidentally, the bulk of the problems pinned on Java.

Everywhere else, the problem you described is a variant of an anti-pattern and code smell widely known as the telescoping constructor pattern.

The problems caused by telescoping constructors have a bunch of known cures:

- builder pattern (Lombok supports this, by the way),

- the parameter object pattern (builder pattern's poor cousin)

- semantically-appropriate factory methods.
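
As a sketch of the last item in that list, here is a single Car record with a few named factory methods instead of one type or constructor per combination of fields. The particular combinations and method names are assumptions for illustration.

```java
import java.util.Optional;

record Car(String make, Optional<String> model, Optional<Integer> year) {
    // Each factory names the situation it is meant for.
    static Car ofMake(String make) {
        return new Car(make, Optional.empty(), Optional.empty());
    }

    static Car ofMakeAndModel(String make, String model) {
        return new Car(make, Optional.of(model), Optional.empty());
    }

    static Car ofMakeAndYear(String make, int year) {
        return new Car(make, Optional.empty(), Optional.of(year));
    }
}
```

A call site then reads `Car.ofMakeAndYear("Buick", 1995)`, and the set of supported combinations is spelled out in one place.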

The whole reason behind domain models taking the center stage when developing a software project is that you build your whole project around a small set of types with the necessary and sufficient expressiveness to represent your problem domain.

Also, "explosion of aggregate types" can only become a problem if for some reason you don't introduce support for type conversion when introducing specialized types.

I have thoroughly enjoyed that Hickey talk, but I think he has a very system-oriented view/take - which is very important and shows his experience - but it is also common to have control over the whole universe for our program.

In the interconnected system view, data schemas can change without notice, and the program should be backwards and forwards compatible to a reasonable degree to avoid being brittle.

This is not a problem when we control the whole universe.

I find that Haskell-esque type systems (strongly typed with frequent use of algebraic data types to represent every possible state in _that_ universe) work better for the latter, but are not the best fit for the former, and they often have to add some escape hatches at the boundaries.

Java itself is a weird cross of these two: it has a reasonably strong type system nowadays, but it’s also a very dynamic runtime where one can easily create their own class at runtime and load it, reflect on it, etc.

So all in all — are you making that Car as part of your universe where you control everything, and it won’t change in unexpected ways? Make a record, potentially with nullable/Optional/Maybe types for the fields, if that makes sense.

If it represents some outside data that you don’t control, then you might only care about a subset of the fields: create a record type for that subset and use a converter from e.g. json to that record type, and the converter will save you from new fields. If everything is important then your best bet is basically what Clojure/JSONObject/etc do, just have a String-keyed map.
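
A rough sketch of the "record for the subset you care about" approach. To stay within the standard library it assumes the outside data has already been parsed into a String-keyed map. `CarSummary` and `fromUntyped` are hypothetical names, and a real project would more likely let a JSON library do this conversion.

```java
import java.util.Map;

// Only the fields this program actually cares about.
record CarSummary(String make, int year) {
    // Converter at the boundary: picks out known fields, ignores everything else.
    static CarSummary fromUntyped(Map<String, Object> raw) {
        Object make = raw.get("make");
        Object year = raw.get("year");
        if (!(make instanceof String) || !(year instanceof Number)) {
            throw new IllegalArgumentException("missing or mistyped make/year: " + raw);
        }
        return new CarSummary((String) make, ((Number) year).intValue());
    }
}
```

New fields appearing in the outside data (say `colour` or `trim`) pass through the converter unnoticed, while a missing `make` or `year` fails loudly at the boundary instead of deep inside the program.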

(Note: structural types can help here, and I believe OCaml has row polymorphism?)

There's always clojure.spec.
This discussion sounds like there is confusion about the Car abstraction.

Make and model vs. makeId and modelId: Pick one. Are Make and Model referenced by Cars or not? There seems a slight risk of the Banana/Monkey/Jungle problem here, so maybe stick with ids, and then rely on functions that look up makes and models given ids. I think it's workable either way.

As for all the optional stuff (color, year, ...): What exactly is the problem? If Cars don't always have all of these properties then it would be foolish of Car users to just do myCar.colour, for example. Test for presence of an optional property, or use something like Optional<T> (which amounts to a language supported testing for presence). Doesn't any solution work out pretty much the same? When I have had this problem, I have not done a proliferation of types (even in an inheritance hierarchy) -- that seems overly complicated and brittle.

I'm not familiar with Java. Does it have no notion of structural types at all? If it does, maybe you could wrap those fields in `Car` with `Maybe`/`Option` (I’m not sure what the equivalent is in Java) so you get something like `Car(Maybe Make, Maybe Model, Maybe Year, Maybe Colour)`?
Records are structural types. Null restricted types are in draft: https://openjdk.org/jeps/8303099
Records in Java are nominal. In fact, a record is syntactic sugar for a class.
yes and it is called Optional (rather than Maybe)
That one is pretty simple. You have a Car object with four fields. The types of the fields are, respectively Optional<Make>, Optional<Model>, Optional<Year>, and Optional<Colour>.

Hickey makes it sound worse than it is.

so now when you have a function that takes in a Car object, you have no idea what fields those objects might have, because it's all optional! Which means the checks for the validity of each field end up spreading out to every function.
> so now when you have a function that takes in a Car object, you have no idea what fields those objects might have, because it's all optional!

Your types are already optional if you're adding constructors for each permutation of all input parameters.

Which is no worse than the situation in a dynamically typed language where every field in every object could be optional.

Dynamic typing advocates sometimes miss that statically typed languages don't force you to encode every invariant in the type system, just those that seem important enough.

Or, if you really want to go overboard, you could use a dependently typed language and write functions that only accept cars with a specific combination of fields not being empty. But that's typically not worth the complexity.

Frankly, your contract was that you have no idea what fields those objects might have. I'm just fulfilling it. You won't have checks for validity of each field, as Optional is valid, but you will have to have code that handles Optional<> types (so things like foo.getModel().orElse()...), which is the requirement you described. That doesn't mean you'll be constantly checking the validity of each field.
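
For concreteness, a sketch of the all-Optional Car and of what "code that handles Optional<> types" looks like at a use site. The `describe` method and its fallback strings are made up for illustration, and `Year` and `Colour` are simplified to `Integer` and `String`.

```java
import java.util.Optional;

record Make(String name) {}
record Model(String name) {}

record Car(Optional<Make> make, Optional<Model> model,
           Optional<Integer> year, Optional<String> colour) {

    // Every consumer decides what an absent field means for it.
    String describe() {
        String makeName = make.map(Make::name).orElse("unknown make");
        String modelName = model.map(Model::name).orElse("unknown model");
        String yearPrefix = year.map(y -> y + " ").orElse("");
        return yearPrefix + makeName + " " + modelName;
    }
}
```

Both sides of the argument show up here: there is no validity check in `describe`, but each function that reads a field still has to choose its own fallback.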