Hacker News new | ask | show | jobs
by tailrecursion 4257 days ago
I want to question the importance of soundness in type systems. Suppose you could catch 99% of type-based errors instead of 100%, and in addition, use a compiler switch to see all the case statements where a class of a type is missing. The intended benefit of this system is that it accepts all correct programs. Correct meaning, the program runs and returns the correct answer.

Would such a language be viable? Or is it absolutely necessary to catch all possible type errors. In the past, tools like lint have been considered useful, although lint is not at all the kind of type system that I envision, namely a system that is in practice catching all the errors that ML catches -- it just doesn't (and cannot) guarantee it catches them.

The restrictive nature of static type systems today is legendary, but I wonder sometimes whether people realize how restrictive they are. If you can't make your idea work with functors or typeclasses, chances are good there's no way to get it to compile and you have to write functions in longhand.

4 comments

This happens absolutely all the time.

For instance, Haskell does it in almost exactly these terms. If you have a type like

    data Drums = Ba | Dum | Tish
and you have a function that consumes drums

    play :: Drums -> String
    play Ba  = "Ba!"
    play Dum = "Dum!"
you've left out the Tish. Possibly because you don't have a cymbal. You know this, but the compiler doesn't. This is considered to be an incomplete pattern match and would raise some eyebrows.

But often it just means there's an invariant which you know and the type system does not. In Haskell -Wall will stop you dead here, but you can relax it to say "ehhhh, I know what I'm doing".

More generally, you can also write

    play :: Drums -> String
    play Ba   = "Ba!"
    play Dum  = "Dum!"
    play Tish = error "Impossible! I have no cymbal"
This 'error' is a misleadingly named function. It's more like 'assert' and indicates that this is a completely impossible program state (much like running into an unbalanced tree deep inside a library function which knows that tree balance is maintained). It's outside of the reach of the programmer to fix. It's not an exception, it's an assurance to the compiler that even though there's something missing in the type safety things will be OK.
There is always idea of Gradual Typing [1] which has been implemented with varying degrees of success. There are also a large set of static analysis theories/tools which can help here.

It is also worth noting that if you dont want (global) type inference, you can get far in a language with permissive casting, type annotations and local inference. The results aren't a panacea though.

> Suppose you could catch 99% of type-based errors instead of 100%, and in addition, use a compiler switch to see all the case statements where a class of a type is missing.

I am not quite sure what you mean by this. Care to elaborate? (In general, with inference systems, missing type information is hard(ish) to localize. So pointing out where exactly a type error occurred is non-trivial)

[1] https://en.wikipedia.org/wiki/Gradual_typing

I'm thinking of a system with valueset inference where valuesets are not necessarily disjoint. So the system may infer that a return value is the disjunction {INTEGER | REAL}.

In order to get precision in checking, a lot of computation needs to be done so the compiler relaxes the precision when facing large disjunctions (networks) of constraints.

Dynamic checks are inserted as necessary but a system that relaxes when things get hairy can't guarantee that it will find all errors at compile time.

The idea is similar to Soft Typing of Cartwright and others, but they were thinking of an interactive system, some kind of programmer's aid. If I recall they ran into problems giving reasonable error messages.

The problem with this idea, at least compared with current implementations of Haskell and OCaml, is that it requires runtime type information. Right now, Haskell and OCaml both perform erasure, and the only kind of runtime type information used is for the GC, but if you have two ADTs with identical structure (number of branches, payloads) but different names (i.e. they are different types), they will be encoded in the same way. So, in order to tell them apart, you would need some additional type information, which would require changes to compilers and would probably performance characteristics.
> Would such a language be viable? Or is it absolutely necessary to catch all possible type errors. In the past, tools like lint have been considered useful, although lint is not at all the kind of type system that I envision, namely a system that is in practice catching all the errors that ML catches -- it just doesn't (and cannot) guarantee it catches them.

It's entirely possible - I believe Dylan implements something on these lines. IMO it's harder to reason about than a consistent type system with "escape hatches" like unsafePerformIO, and about equivalent in usability.

> The restrictive nature of static type systems today is legendary, but I wonder sometimes whether people realize how restrictive they are. If you can't make your idea work with functors or typeclasses, chances are good there's no way to get it to compile and you have to write functions in longhand.

I have yet to find a piece of "good code" that I couldn't implement in a typesafe way. I've occasionally found code that worked but couldn't be made typesafe (e.g. the "big global Map<String, Object>" antipattern), but it's always been code that I would have wanted to rewrite for readability and sanity anyway.

>I want to question the importance of soundness in type systems

Why? The most common and popular opinion is already that it is not important.

>Suppose you could catch 99% of type-based errors instead of 100%, and in addition, use a compiler switch to see all the case statements where a class of a type is missing. The intended benefit of this system is that it accepts all correct programs.

That doesn't make sense. That's like saying suppose we had a system of math that gave the right answer 99% of the time. How would doing the wrong thing 1% of the time give you the benefit of accepting all correct programs?

>The restrictive nature of static type systems today is legendary

Mythological. Legends are supposed to have some historical root.

>If you can't make your idea work with functors or typeclasses, chances are good there's no way to get it to compile and you have to write functions in longhand.

What does that even mean? That's like saying "if you can't make your idea work with classes or inheritance, chances are good that there's no way to get it to compile and you have to write functions in longhand".

Suppose we insisted in math that all proofs be computer-checkable. There would be no erroneous proofs, but there would also be many correct proofs that could not be accepted or even formulated. Mathematics would suffer overall.

Compiler enforced static typing rejects many correct programs, which is a disadvantage.

> Compiler enforced static typing rejects many correct programs, which is a disadvantage.

Only if you care about these programs. It is entirely possible many of these 'correct programs' are not useful at all, or arguably useful, or are outside the domain criteria, and rejecting them is fine. This is often a leading motivator behind the design of things like domain specific languages, or designs like MISRA C - many things by design cannot be done, but often those aren't the things we care about, use, or wish to discuss anyway.

On the flip side, there are things type systems enable that I do not think can be actually accomplished in an untyped language - but really, this isn't very interesting aside from being a minor technical factoid, IMO. It's very far removed from the more interesting question of whether or not it's useful to consider these things at all. And it's completely legitimate to say "No, these things aren't worth considering".

> Suppose we insisted in math that all proofs be computer-checkable. There would be no erroneous proofs, but there would also be many correct proofs that could not be accepted or even formulated.

Is that true? Is there a difference between the things which can be proven true in the sense used in mathematics and the set of things which are computable? The limits of proof (e.g., Goedel's incompleteness theorems) and computability (e.g., the halting problem) are at least deeply related, and I'm not at all convinced that your premise here is true.

> Compiler enforced static typing rejects many correct programs, which is a disadvantage.

In practice, however, it is only a theoretical disadvantage. The kind of "correct" programs that get rejected are not very useful.

> Compiler enforced static typing rejects many correct programs,

Do you have any examples of programs that would be actually useful? I don't necessarily doubt that they exist, but it's not an interesting point if the argument is basically "they exist, but I don't know of anyone in particular".

I think millstone is exaggerating a bit as most programs people write normally can be rewritten without too much pain in a static language with a rich type system. That said, there are some examples of times where your program asks for clever type system features (that are not always enabled by default or add to the learning curve) or where different type system features don't play nice together (so its hard to have both features in a single static language, but in a dynamic language you can use one style of programming at a time and never notice the problem).

In the first category, one example is programs requiring higher-rank polymorphism. They are always allowed in untyped languages but in Haskell this feature is hidden behind a language pragma to keep type inference simpler by default.

On the second category, an example I like is parametric polymorphism vs subtyping. The type system gets really tricky if you try to use both at the same time so most static languages either limit the polymorphism or don't allow subtyping. ON the other hand, in a dynamic language you can use both styles if you want, although you have to do so at your own risk.

The problem is that your supposition is just made up. There's nothing to support it. Saying "suppose some thing that isn't true" does not support an argument about reality.