Hacker News new | ask | show | jobs
by wk_end 688 days ago
At least without additional extensions, TypeScript would help less than you think. It just wasn’t designed for the job.

As a simple example - TypeScript doesn’t distinguish between integers and floats; they’re all just numbers. So all array accesses need casting. A TypeScript designed to aid static compilation likely would have that distinction.

But the big elephant in the room is TypeScript’s structural subtyping. The nature of this makes it effectively impossible for the compiler to statically determine the physical structure of any non-primitive argument passed into a function. This gives you worse-than-JIT performance on all field access, since JITs can perform dynamic shape analysis.

4 comments

> A TypeScript designed to aid static compilation likely would have that distinction.

AssemblyScript (https://www.assemblyscript.org/) is a TypeScript dialect with that distinction

It’s advertised as that, and it’s a cool project, but while it’s definitely a statically typed language that reuses TypeScript syntax, it’s not clear to me just what subset of the actual TypeScript type system is supported. That’s necessarily bad—TypeScript itself is very unclear about what its type system actually is. I just think the tagline is misleading.
Probably a better way to think about AssemblyScript is first as a DSL for WASM, and second as providing a subset of TS syntax and authoring semantics to achieve that. The type system is closer to TS than the syntax and semantics. At least that was my experience when I explored it some time back.
The tagline on the site seems to be " TypeScript-like language for WebAssembly", which seems pretty clear to me that it's not pretending to be a strict subset or anything.
*not necessarily bad, d’oh
Such Typescript already exists, Static Typescript,

https://makecode.com/language

Microsoft's AOT compiler for MakeCode, via C++.

The 2019 paper[1] says: “STS primitive types are treated according to JavaScript semantics. In particulars, all numbers are logically IEEE 64-bit floating point, but 31-bit signed tagged integers are used where possible for performance. Implementation of operators, like addition or comparison, branch on the dynamic types of values to follow JavaScript semantics[.]”

[1]: https://www.microsoft.com/en-us/research/publication/static-...

I think the even bigger elephant in the room is that TypeScript's type system is unsound. You can have a function whose parameter type is annotated to be String and there's absolutely no guarantee that every call to that function will pass it a string.

This isn't because of `any` either. The type system itself deliberately has holes in it. So any language that uses TypeScript type annotations to generate faster/smaller code is opening itself to miscompiling code and segfaults, etc.

It might be useful for an interpreter though. I believe that in V8 you have this probabilistic mechanism in which if the interpreter "learns" that an array contains e.g. numbers consistently, it will optimize for numbers and start accessing the array in a more performance way. Typescript could be used to inform the interpreter even before execution. (My supposition, I'm not an interpreter expert)
So - I know this in theory, but avoided mentioning it because I couldn’t immediately think of any persuasive examples (whereas subtype polymorphism is a core, widely used, wholly unrestricted property of the language) that didn’t involve casts or any/unknown or other things that people might make excuses for.

Do you have any examples off the top of your head?

Here's an example I constructed after reading the TS docs [1] about flow-based type inference and thinking "that can't be right...".

It yields no warnings or errors at compile stage but gives runtime error based on a wrong flow-based type inference. The crux of it is that something can be a Bird (with "fly" function) but can also have any other members, like "swim" because of structural typing (flying is the minimum expected of a Bird). The presence of a spurious "swim" member in the bird causes tsc to infer in a conditional that checks for a "swim" member that the animal must be a Fish or Human, when it is not (it's just a Bird with an unrelated, non-function "swim" member).

    type Fish = { swim: () => void };
    type Bird = { fly: () => void };
    type Human = { swim?: () => void; fly?: () => void };

    function move(animal: Fish | Bird | Human) {
      if ("swim" in animal) {
        // TSC infers wrongly here the presence of "swim" implies animal must be a Fish or Human
        onlyForFishAndHumans(animal); 
      } else {
        animal;
      }
    }

    function onlyForFishAndHumans(animal: Fish | Human) {
      if (animal.swim) {
        animal.swim(); // Error: attempt to call "not-callable".
      }
      // (receives bird which is not a Fish or Human)
    }

    const someObj = { fly: () => {}, swim: "not-callable" };
    const bird: Bird = someObj;

    move(bird);

    // runtime error: [ERR]: animal.swim is not a function
[1] https://www.typescriptlang.org/docs/handbook/2/narrowing.htm...
Here's a simpler repro:

    type Bird = { id: number };
    type Human = { swim?: () => void; id: number };

    function move(animal: Bird | Human) {
        onlyForHumans(animal); 
    }

    function onlyForHumans(swimmer: Human) {
      if (swimmer.swim) {
        swimmer.swim();
      }
    }

    const someObj = { id: 1, swim: "not-callable" };
    move(someObj);
`someObj` gets casted as `Bird`, then `animal` gets casted as `Human`. Unfortunately unions here are indeed unsound.

As as workaround you could add a property `type` (either `"bird"` or `"human"`) to both `Bird` and `Human`, then TypeScript would be able to catch it.

This narrowing is probably not the best. I'm not sure why the TS docs suggest this approach. You should really check the type of the key to be safer, though it's still not perfect.

   if (typeof animal.swim === 'function') {....}
Compilers don't really have the option of just avoiding non-idiomatic code, though. If the goal is to compile TypeScript ahead of time, the only options are to allow it or to break compatibility, and breaking compatibility makes using ahead-of-time TypeScript instead of some native language that already exists much less compelling.
https://counterexamples.org/ is a good collection of unsoundness examples in various languages.

For TypeScript, they list an example with `instanceof`:

https://counterexamples.org/polymorphic-union-refinement.htm...

In the playground:

https://www.typescriptlang.org/play/?#code/GYVwdgxgLglg9mABA...

  const foo: string[] = ['foo']
  
  // inferred as string
  // though it is undefined
  const bar = foo[1]
> I think the even bigger elephant in the room is that TypeScript's type system is unsound.

Can you name a single language that is used for high-performance software and whose type system is sound? To speed up the process, note that none of the obvious candidates have sound type systems.

Java, C#, Scala, Haskell, and Dart are all sound as far as I know.

Soundness in all of those languages involves a mixture of compile-time and runtime checks. Most of the safety comes from the static checking, but there are a few places where the compiler defers checking to runtime and inserts checks to ensure that it's not possible to have an expression of type T successfully evaluate to a value that isn't an T.

TypeScript doesn't insert any runtime checks in the places where there are holes in the static checker, so it isn't sound. If it wasn't running on top of a JavaScript VM which is dynamically typed and inserts checks everywhere, it would be entirely possible to segfault, violate memory safety, etc.

I can't speak for the others, but Java allows assigning arrays of subtypes to variables declared as an array of a supertype, which isn't sound:

    class A {}
    class B1 extends A {}
    class B2 extends A {}
    
    A[] arr = new B1[1];
    arr[0] = new B2();
In the above example only way that assigning an array of `B1` to a variable typed as an array of `A` is if only valid `B1` objects are ever put into it, at which point there's no reason not to just have the variable typed as a `B1` array. It still will compile fine though!
Array covariance is sound because you'll get a runtime error if you try to write to an array of the wrong type.
How is "you'll get a runtime error if you try" any different from the unsoundness described above in TypeScript?
All of these have, at the very least, escape hatches that makes the type system unsound overall. And probably other issues https://counterexamples.org/ I can find a few in there for at least scala and haskell. Perhaps this is not a satisfying answer to you, an "unsound type system" is a technical, precise notion, and this is what people who parrot "typescript is unsound" are referring to. You cannot just reply "well there are a few runtime checks so it's all good."
> an "unsound type system" is a technical, precise notion

Yup. Milner's "can't go wrong", progress and preservation, etc.

> You cannot just reply "well there are a few runtime checks so it's all good."

Sure I can. I really like how Shriram Krishnamurthi describes soundness in Programs and Programming Languages [1]. I can't think of a better definition for soundness than:

"The central result we wish to have for a given type-system is called soundness. It says this. Suppose we are given an expression (or program) e. We type-check it and conclude that its type is t. When we run e, let us say we obtain the value v. Then v will also have type t."

The "we obtain the value v" part is critical. If an expression of type e doesn't produce a value at all (it terminates or throws an exception), then we have also satisfied soundness.

Indeed, note that he also says:

"Any rich enough language has properties that cannot be decided statically (and others that perhaps could be, but the language designer chose to put off until run-time to reduce the burden on the programmer to make programs pass the type-checker). When one of these properties fails—e.g., the array index being within bounds—there is no meaningful type for the program. Thus, implicit in every type soundness theorem is some set of published, permitted exceptions or error conditions that may occur. The developer who uses a type system implicitly signs on to accepting this set."

A term like "soundness" for a programming language should be useful. We could, for example, define "evenality" as a property of programming languages where we say that a language whose built-in atomic types have names that are all an even number of letters has evenality and other languages don't. That's a well-defined concept and we could neatly partition extant languages into whether they have evenality or not. But who cares?

When it comes to soundness, the above definition from PAPL is useful for (at least) two concrete reasons:

1. When a user is reading code, if they see an expression has some type T, they can safely reason that any value the expression evaluates to will have type T and when they are reasoning about code surrounding that expression, they can rely on that fact.

2. Likewise, when a compiler is compiling code, it can safely assume that if an expression has type T, then all subsequent code that depends on the value of that expression can assume it has type T. The compiler can optimize safely and correctly based on that assumption.

Neither of these properties require that all type checks are performed at compile time. If the runtime throws an exception on out of bounds array indices, that still correctly preserves the soundness invariant that the type of an array element access is the type of the array element. The reader might have to think about the fact that the expression could throw. But they don't have to think about it evaluating to the wrong type.

If that's not your definition of soundness and you require a sound language to have zero runtime checks, then I'm not aware of any widely-used language that meets that requirement, nor do I see how it's a particularly useful term.

Note that it's not the case that every language is sound according to the above definition. C, C++, TypeScript, and Dart 1.0 (but not 2.0 and later) are all unsound. In the first two, it's possible to completely reinterpret memory as another type which leads to the majority of software security issues in the world. In the latter two, the only reason that doesn't happen is because the underlying execution environment doesn't rely on the static types of expressions at all.

[1] https://papl.cs.brown.edu/2020/safety-soundness.html#%28part...

(Not GP) That was an interesting and educational explanation, thank you!
JVM bytecode is a "language" and is proven to be sound. The languages that compile to that language, on the other hand, are a different kettle of fish.
This is specifically about type systems. It's easy to have a sound type system when you have no type system.

Also, I'm not too familiar with JVM bytecode, but if I load i64 in two registers and then perform floating point addition on these registers, does the type system prevent me from compiling/executing the program?

Can you say more about "proven to be sound"? Are you talking about a sound type system?

It does have a type system.

https://docs.oracle.com/javase/specs/jvms/se22/html/jvms-2.h...

JVM is a stack not register machine and yes the type system will prevent that from running. It will fail verification.

The type checker is specified in Prolog and rejects the above scenario:

    instructionIsTypeSafe(fadd, Environment, _Offset, NextStackFrame, ExceptionStackFrame) :- 
        validTypeTransition(Environment, [float, float], float, StackFrame, NextStackFrame),
        exceptionStackFrame(StackFrame, ExceptionStackFrame).
Fun fact: Said type system has a 'top' type that is both the top type of the type system as well as the top half of a long or double, as those two actually take two values while everything else, including references, is only one value. Made some sense when everything was 32 bit, less so today.
Maybe OCaml, but I haven't studied it much.
I doubt it's been proved to be sound. It shows up a lot on https://counterexamples.org/, although if I skim the issues seem to have been fixed since then.
I've run a few times into messages of the sort "you can't use these features together" before and I assume at least sometimes these were lessons that they had to learn the hard way.
Scala 3 has aimed to get sound but I’m not sure how far they got?

https://www.scala-lang.org/api/3.x/docs/blog/2016/02/17/scal...

I'm a little behind times on Haskell (haven't used it for some years) – there always were extensions that made it unsound, but the core language was pretty solid.
Well, it does use unsafePerformIO. It's not particularly horrible most of the time, but in this case it obviously is.
Outside of really funky code, especially code originally written in TS, you can assume the interface is the actual underlying object. You could easily flag non-recognized-member accesses to interfaces and then degrade them back to object accesses.
You’re misunderstanding me, I think.

Suppose you have some interface with fields a and c. If your function takes in an object with that interface and operates on the c field, what you want is to be able to do is compile that function to access c at “the address pointed to by the pointer to the object, plus 8” (assuming 64-bit fields). Your CPU supports such addressing directly.

Because of structural subtyping, you can’t do that. It’s not unrecognized member. But your caller might pass in an object with fields a, b, and c. This is entirely idiomatic. Now c is at offset 16, not 8. Because the physical layout of the object is different, you no longer have a statically known offset to the known field.

I would bet that, especially outside of library code, 95+% of the typed objects are only interacted with using a single interface. These could be turned into structs with direct calls.

Outside of this, you can unify the types. You would take every interface used to access the object and create a new type that has all of the members of both. You can then either create vtables or monomorphize where it is used in calls.

At any point that analysis cannot determine the actual underlying shape, you drop to the default any.

Which is exactly the kind of optimizations JIT compilers are able to perform, and AOT compiler can't do them safely without having PGO data, and even then, they can't re-optimize if the PGO happens to miss a critical path that breaks all the assumptions.
> Because of structural subtyping, you can’t do that

In practice v8 does exactly what you're saying can't be done, virtually all the time for any hot function. What you mean to say is that typescript type declarations alone don't give you enough information to safely do it during a static compile step. But modern JS engines, that track object maps and dynamically recompile, do what you described.

I mentioned this in my original comment:

> This gives you worse-than-JIT performance on all field access, since JITs can perform dynamic shape analysis.

We're talking about using types to guide static compilation. Dynamic recompilation is moot.

Oh, I thought JIT in your comment meant a single compilation. Either way, having TS type guarantees would obviously make optimizing compilers like v8's stronger, right? You seem to be arguing there's no value to it, and I don't follow that.
My claim is that the guarantees that TS provides aren't strong enough to help a compiler produce stronger optimizations. Types don't just magically make code faster - there's specific reasons why they can make code faster, and TypeScript's type system wasn't designed around those reasons.

A compiler might be able to wring some things out of it (I'm skeptical about obviouslynotme's suggestions in a cousin comment, but they seem insistent) or suppress some checks if you're happy with a segfault when someone did a cast...but it's just not a type system like, say, C's, which is more rigid and thus gives the compiler more to work with.