Hacker News new | ask | show | jobs
by dangets 1204 days ago
I would think stacking 10 generic conditions wouldn't scale if you are trying to mix and match arbitrarily. If you are trying to mix NonEmpty and AllEven and AllGreaterThan100 for the List example, then you would get the combinatorial explosion of types.

In practice, I find it is usually something like 'UnvalidatedCustomerInfo' being parsed into a 'CustomerInfo' object, where you can validate all of the fields at the same time (phone number, age, non-null name etc.). Once you have parsed it into the internal 'CustomerInfo' type - you don't need to keep re-validating the expected invariants everywhere this type is used. There was a good video on this that I wish I could find again, where the presenter gave the example of using String for a telephoneNumber field instead of a dedicated type. Any code that used the telephoneNumber field would have to re-parse and validate it "to be safe" it was in the expected format.

The topic of having untrusted external types and internal trusted types is also explained in the book `Domain Modeling Made Functional` which I highly recommend.

4 comments

> If you are trying to mix NonEmpty and AllEven and AllGreaterThan100 for the List example, then you would get the combinatorial explosion of types.

This is overthinking it. Usually, when people are not used to doing constructive data modeling, they get caught up on this idea that they need to have a datatype that represents their data in some canonical representation. If you need a type that represents an even number, then clearly you must define a type that is an ordinary integer, but rules out all odd numbers, right?

Except you don’t have to do that! If you need a number to always be even (for some reason), that suggests you are storing the wrong thing. Instead, store half that number (e.g. store a radius instead of a diameter). Now all integers are legal values, and you don’t need a separate type. Similarly, if you want to store an even number greater than 100, then use a natural number type (i.e. a type that only allows non-negative integers; Haskell calls this type `Natural`) and store half that number minus 102. This means that, for example 0 represents 102, 1 represents 104, 2 represents 106, 3 represents 108, etc.

If you think this way, then there is no need to introduce a million new types for every little concept. You’re just distilling out the information you actually need. Of course, if this turns out to be a really important concept in your domain, then you can always add a wrapper type to make the distinction more formal:

    newtype EvenGreaterThan100 = EvenGreaterThan100 Natural

    evenGreaterThan100ToInteger :: EvenGreaterThan100 -> Integer
    evenGreaterThan100ToInteger (EvenGreaterThan100 n) = (toInteger n * 2) + 102

    integerToEvenGreaterThan100 :: Integer -> Maybe EvenGreaterThan100
    integerToEvenGreaterThan100 n
      | n < 100 = Nothing
      | otherwise = case n `quotRem` 2 of
          (q, 0) -> Just (EvenGreaterThan100 q)
          (_, _) -> Nothing
Of course, this type seems completely ridiculous like this, and it is. But that’s because no real program needs “an even number greater than one hundred”. That’s just a random bag of arbitrary constraints! A real type would correspond to a domain concept, which would have a more useful name and a more useful API, anyway.

I wrote a followup blog post here that goes into more detail about this style of data modeling, with a few more examples: https://lexi-lambda.github.io/blog/2020/08/13/types-as-axiom...

What if you want it to only be less that 100 though? Not everything is so easily expressable the way you're saying.
You can go as far (or as short) as the application warrants. The more static evidence you want, the more cumbersome it's going to become. That seems sort of a natural progression, but the the point is that you get to choose the trade-off. It's often easier to prove a special case than a more general case.

(Certainly, Haskell is probably not the most concise language for this kind of thing. LiquidHaskell adds interesting proof capabilities wrt. arithmetic.)

Regardless, even just parsing at the boundary and using an opaque type

    MySuperSpecialInt = Int @Even Min(2) Max(100)
(or whatever syntax you want) is still better than just using Int. At least you'll know that you'll always be handed a value that's in range (post-parsing).
Yes, certainly. Constructive data modeling is useful for many things, but it’s not a panacea. Other techniques are useful in cases where it’s not practical; I discuss some of the tradeoffs in this followup post: https://lexi-lambda.github.io/blog/2020/11/01/names-are-not-...
You would also run into a form of "type erasure" in most cases and would need a combinatorial explosion of methods unless the type system was able to 'decorate' the generic parameter with additional information.

e.g. imagine a `parseNonEmpty` and a `parseAllEven` method. Both take a list and return either a `NonEmpty` or `AllEven` type. If I call `parseNonEmpty` and get a `NonEmpty`, then pass to `parseAllEven`, I now have erased the `NonEmpty` as I'm left with an `AllEven`. I would need a `parseAllEvenFromNonEmpty` to take a `NonEmpty` and return a `AllEven & NonEmpty`.

In Zig at least, the types-as-values framework means it'd be pretty easy to support arbitrary mixing and matching. If nobody beats me to it, I'll make a library to remove the boilerplate for that idea this weekend.
> In practice, I find it is usually something like 'UnvalidatedCustomerInfo' being parsed into a 'CustomerInfo' object, where you can validate all of the fields at the same time

In my experience you usually can't validate them all at the same time. For example, address. You usually don't validate that until after the customer has selected items, and then you find out that some items won't deliver to their area, so whereas you previously had a Valid basket, now it's an Invalid state.

Valid basket does not imply valid order.

Sometimes just splitting stuff into more types that are appropriate to specific portion of pipeline is a win.

Whether an item can be delivered to a particular address seems like a seperate concern to whether the address is correct/real.