Hacker News
by g15jv2dp 688 days ago
> I think the even bigger elephant in the room is that TypeScript's type system is unsound.

Can you name a single language that is used for high-performance software and whose type system is sound? To speed up the process, note that none of the obvious candidates have sound type systems.

Java, C#, Scala, Haskell, and Dart are all sound as far as I know.

Soundness in all of those languages involves a mixture of compile-time and runtime checks. Most of the safety comes from the static checking, but there are a few places where the compiler defers checking to runtime and inserts checks to ensure that it's not possible for an expression of type T to successfully evaluate to a value that isn't a T.

TypeScript doesn't insert any runtime checks in the places where there are holes in the static checker, so it isn't sound. If it weren't running on top of a JavaScript VM, which is dynamically typed and inserts checks everywhere, it would be entirely possible to segfault, violate memory safety, etc.
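A minimal sketch of the kind of compiler-inserted check described above, using Java generics as the example (an illustration chosen here, not taken from the comment; the class and method names are hypothetical). Generic type arguments are erased at runtime, so a raw-type assignment can sneak an `Integer` into a `List<String>` with only an unchecked warning. When the element is read back as a `String`, though, javac inserts a cast, so the hole surfaces as a `ClassCastException` instead of a `String`-typed expression evaluating to an `Integer`.

```java
import java.util.ArrayList;
import java.util.List;

class ErasureCheckDemo {
    // Hypothetical helper: puts an Integer into a list whose static type says it holds Strings.
    // This compiles because raw types bypass generic checking (warnings suppressed here).
    @SuppressWarnings({"unchecked", "rawtypes"})
    static List<String> polluted() {
        List raw = new ArrayList<String>();
        raw.add(42); // no check here: the element type is erased at runtime
        return (List<String>) raw;
    }

    // get(0) returns Object after erasure; javac inserts a checkcast to String here,
    // so a non-String element throws ClassCastException instead of being returned.
    static String firstAsString(List<String> strs) {
        return strs.get(0);
    }

    // Returns true if reading the polluted element as a String was stopped by the inserted check.
    static boolean readThrowsClassCast() {
        try {
            firstAsString(polluted());
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }
}
```

Calling `ErasureCheckDemo.readThrowsClassCast()` should return `true`: the mistyped value never escapes as a `String`.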

I can't speak for the others, but Java allows assigning arrays of subtypes to variables declared as an array of a supertype, which isn't sound:

    class A {}
    class B1 extends A {}
    class B2 extends A {}
    
    A[] arr = new B1[1];  // compiles: Java arrays are covariant
    arr[0] = new B2();    // also compiles; the JVM rejects this store at runtime

In the above example, the only way that assigning an array of `B1` to a variable typed as an array of `A` is safe is if only valid `B1` objects are ever put into it, at which point there's no reason not to just type the variable as a `B1` array. It still compiles fine, though!

Java's array covariance is sound because you'll get a runtime error if you try to write an element of the wrong type into the array (an ArrayStoreException).

How is "you'll get a runtime error if you try" any different from the unsoundness described above in TypeScript?

Because the context here is the idea of using the type system to justify removing those sorts of dynamic checks to generate better code.

The dynamic checks in the Java case are a well-defined and narrowly targeted part of the language semantics: you get an exception on mismatched array writes, out-of-bounds access, etc., but when an expression produces a value, it always matches its type.
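A small runnable sketch of the array store check, reusing the `A`/`B1`/`B2` classes from the snippet above (`ArrayStoreDemo` and its methods are hypothetical names for this example). Storing a `B2` into a `B1[]` referenced as an `A[]` compiles but is rejected at runtime, while a correctly typed store and read go through, and the value read back always matches the static type `A`.

```java
class A {}
class B1 extends A {}
class B2 extends A {}

class ArrayStoreDemo {
    // The array object remembers its real element type (B1), and the JVM checks every store.
    static String storeWrongSubtype() {
        A[] arr = new B1[1]; // allowed: arrays are covariant
        try {
            arr[0] = new B2(); // compiles, but the runtime store check rejects it
            return "stored";
        } catch (ArrayStoreException e) {
            return "ArrayStoreException";
        }
    }

    // A store of the right element type succeeds, and the value read back is a B1,
    // which is also an A, so the static type of arr[0] is never violated.
    static boolean readMatchesStaticType() {
        A[] arr = new B1[1];
        arr[0] = new B1();
        A a = arr[0];
        return a instanceof B1;
    }
}
```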

TypeScript defers these kinds of type system violations to the underlying JavaScript engine, which makes things work out (sometimes with an exception, but sometimes by just proceeding with a value that doesn't match the expression's type) using precisely the dynamism we wanted to get rid of. And this can leak out and cause arbitrarily distant parts of the program to stop matching their types, too.

> Because the context here is the idea of using the type system to justify removing those sorts of dynamic checks to generate better code

It's more specific than that; the discussion is about writing an ahead-of-time compiler, which necessarily wouldn't be running on a JavaScript engine. The compiler could just as easily emit code that throws a runtime exception at that point instead of emitting an equivalent of whatever the JavaScript would do.
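To make that concrete, here is a hypothetical sketch of the kind of guard an ahead-of-time compiler could emit at a point where the static checker cannot prove that a value has its declared type. `EmittedGuard` and `guard` are invented names, and a real compiler would more likely emit an inline check than call a helper; the check itself relies on Java's `Class.cast`, which throws `ClassCastException` on a mismatch.

```java
// Hypothetical: a runtime guard a compiler could emit where static checking has a hole,
// instead of carrying on with a value that doesn't match the expression's type.
final class EmittedGuard {
    private EmittedGuard() {}

    // Returns value as a T if it is an instance of expected (or null);
    // Class.cast throws ClassCastException otherwise.
    static <T> T guard(Object value, Class<T> expected) {
        return expected.cast(value);
    }
}
```

For example, `String s = EmittedGuard.guard(value, String.class);` either binds a genuine `String` or throws, so code after it can rely on `s` being a `String`.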

All of these have, at the very least, escape hatches that make the type system unsound overall, and probably other issues too; see https://counterexamples.org/. I can find a few in there for at least Scala and Haskell. Perhaps this is not a satisfying answer to you, but an "unsound type system" is a technical, precise notion, and this is what people who parrot "TypeScript is unsound" are referring to. You cannot just reply "well there are a few runtime checks so it's all good."

> an "unsound type system" is a technical, precise notion

Yup. Milner's "can't go wrong", progress and preservation, etc.
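For reference, the usual textbook statements of progress and preservation for a small-step semantics over closed terms (standard notation, not taken from the thread). Together they rule out a well-typed program getting stuck; languages with exceptions typically relax progress so that e may also be a raised, permitted error.

```latex
\begin{aligned}
\textbf{Progress:}     \quad & \vdash e : \tau \;\Longrightarrow\; e \text{ is a value} \;\lor\; \exists e'.\; e \to e' \\
\textbf{Preservation:} \quad & \vdash e : \tau \;\land\; e \to e' \;\Longrightarrow\; \vdash e' : \tau
\end{aligned}
```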

> You cannot just reply "well there are a few runtime checks so it's all good."

Sure I can. I really like how Shriram Krishnamurthi describes soundness in Programs and Programming Languages [1]. I can't think of a better definition for soundness than:

"The central result we wish to have for a given type-system is called soundness. It says this. Suppose we are given an expression (or program) e. We type-check it and conclude that its type is t. When we run e, let us say we obtain the value v. Then v will also have type t."

The "we obtain the value v" part is critical. If an expression e of type t doesn't produce a value at all (it never terminates, or it throws an exception), then soundness is still satisfied.
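The PAPL definition can be written as a single implication, here in big-step notation where e ⇓ v means "e evaluates to v" (the notation is chosen for this sketch, not quoted from PAPL). If e diverges or raises one of the permitted exceptions, there is no v, so the implication holds vacuously.

```latex
\vdash e : t \;\land\; e \Downarrow v \;\Longrightarrow\; \vdash v : t
```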

Indeed, note that he also says:

"Any rich enough language has properties that cannot be decided statically (and others that perhaps could be, but the language designer chose to put off until run-time to reduce the burden on the programmer to make programs pass the type-checker). When one of these properties fails—e.g., the array index being within bounds—there is no meaningful type for the program. Thus, implicit in every type soundness theorem is some set of published, permitted exceptions or error conditions that may occur. The developer who uses a type system implicitly signs on to accepting this set."

A term like "soundness" for a programming language should be useful. We could, for example, define "evenality" as a property of programming languages where we say that a language whose built-in atomic types have names that are all an even number of letters has evenality and other languages don't. That's a well-defined concept and we could neatly partition extant languages into whether they have evenality or not. But who cares?

When it comes to soundness, the above definition from PAPL is useful for (at least) two concrete reasons:

1. When a user is reading code and sees that an expression has some type T, they can safely reason that any value the expression evaluates to will have type T, and they can rely on that fact when reasoning about the code surrounding that expression.

2. Likewise, when a compiler is compiling code, it can safely assume that if an expression has type T, then all subsequent code that depends on the value of that expression can assume it has type T. The compiler can optimize safely and correctly based on that assumption.

Neither of these properties require that all type checks are performed at compile time. If the runtime throws an exception on out of bounds array indices, that still correctly preserves the soundness invariant that the type of an array element access is the type of the array element. The reader might have to think about the fact that the expression could throw. But they don't have to think about it evaluating to the wrong type.
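A minimal Java sketch of that invariant (`BoundsCheckDemo` is a hypothetical name for this example): the expression `arr[i]` has static type `String`, and evaluating it either produces a `String` or throws `ArrayIndexOutOfBoundsException`. There is no third outcome in which it yields a non-`String`.

```java
class BoundsCheckDemo {
    // arr[i] has static type String; evaluating it either yields a String
    // or throws ArrayIndexOutOfBoundsException, never a value of another type.
    static String describe(String[] arr, int i) {
        try {
            String s = arr[i];
            return "value " + s;
        } catch (ArrayIndexOutOfBoundsException e) {
            return "threw ArrayIndexOutOfBoundsException";
        }
    }
}
```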

If that's not your definition of soundness and you require a sound language to have zero runtime checks, then I'm not aware of any widely-used language that meets that requirement, nor do I see how it's a particularly useful term.

Note that it's not the case that every language is sound according to the above definition. C, C++, TypeScript, and Dart 1.0 (but not 2.0 and later) are all unsound. In the first two, it's possible to completely reinterpret memory as another type, which leads to the majority of software security issues in the world. In the latter two, the only reason that doesn't happen is that the underlying execution environment doesn't rely on the static types of expressions at all.

[1] https://papl.cs.brown.edu/2020/safety-soundness.html#%28part...

(Not GP) That was an interesting and educational explanation, thank you!

JVM bytecode is a "language" and is proven to be sound. The languages that compile to that language, on the other hand, are a different kettle of fish.

This is specifically about type systems. It's easy to have a sound type system when you have no type system.

Also, I'm not too familiar with JVM bytecode, but if I load i64 values into two registers and then perform floating-point addition on those registers, does the type system prevent me from compiling/executing the program?

Can you say more about "proven to be sound"? Are you talking about a sound type system?

It does have a type system.

https://docs.oracle.com/javase/specs/jvms/se22/html/jvms-2.h...

The JVM is a stack machine, not a register machine, and yes, the type system will prevent that from running: it will fail verification.

The type checker is specified in Prolog and rejects the above scenario:

    instructionIsTypeSafe(fadd, Environment, _Offset, StackFrame,
                          NextStackFrame, ExceptionStackFrame) :-
        validTypeTransition(Environment, [float, float], float,
                            StackFrame, NextStackFrame),
        exceptionStackFrame(StackFrame, ExceptionStackFrame).

Fun fact: that type system has a 'top' type that is both the top type of the type system and the top half of a long or double, since those two types each take up two slots while everything else, including references, takes up only one. That made some sense when everything was 32-bit, less so today.
Maybe OCaml, but I haven't studied it much.

I doubt it's been proven sound. It shows up a lot on https://counterexamples.org/, although from a quick skim the issues seem to have been fixed since then.

I've run into messages along the lines of "you can't use these features together" a few times, and I assume at least some of those were lessons they had to learn the hard way.

Scala 3 aimed to be sound, but I'm not sure how far they got.

https://www.scala-lang.org/api/3.x/docs/blog/2016/02/17/scal...

I'm a little behind the times on Haskell (haven't used it for some years); there were always extensions that made it unsound, but the core language was pretty solid.

Well, it does have unsafePerformIO. It's not particularly horrible most of the time, but in this case it obviously is.