Hacker News new | ask | show | jobs
by heavenlyhash 2477 days ago
> (I might suggest the underlying problem in this code is relying on inference too much, but the threshold for "too much" is difficult to communicate to users.)

This is a very outstandingly interesting line out the whole writeup.

I like the writeup in its entirety for being very balanced and thoughtful, but this line in particular really stands out to me as worth more thought for anyone interested in language and type system design.

Inference is great.

Except... when it's not. When it's "too much". When it starts making breaking changes appear "too distantly".

It's an interesting topic to reflect upon, because the inference isn't making a breakage; just shifting around where the breakage appears. (And this makes it hard to ever direct criticism at an inference mechanism!) But this kind of shifting-around of the breakage appearance can be a drastic impact on the ergonomics of handling it: how early it's detected, how close to the important change site tooling will be able to point the coder, etc. That's important. And almost everything about it involves the word "too" -- which means the area is incredibly subjective, and requires some kind of norm-building... while not necessarily providing any clear place for that norm-building to center around.

I don't have a point here other than to say this is interesting to reflect on. I suspect the last chapter on type inference systems has not yet been written. Can an inference system be designed such that it naturally restrains "too" much use of it?

12 comments

Thanks for highlighting this bit, I also find it very interesting to think about!

Here's a related property I've also discovered of 'advanced' type systems.

In my understanding of how unification happens in systems like H-M, you first give every expression its own type variable, then perform a search to produce assignments to those that remain valid and leave the remainders generic.

But as your type system gets fancier, and particularly in the presence of union types and subtyping, this search process can feel like "make whatever type judgements are necessary to make this program still compile". E.g. in code like

    let x = new Set();
    x.add('hello');
    x.add(3);  // oops, meant to add the string '3'
    f(x);
The inferencer can just reason "oh apparently x is Set<number|string>", and "oh apparently f() operates over all kinds of sets, Set<{}>". And especially when there's never a more specific type for things to "bottom out" on (like say f just forwards the set elements to JSON.stringify(), which accepts both string and number already), nothing will ever reveal to you that you actually wrote a bug.

But then meanwhile even in Haskell you run into cases where the inferencer wasn't generic enough, like https://wiki.haskell.org/Monomorphism_restriction , so it's not even clear cut that you want the inferencer to be smarter or dumber. As my coworker says: as a programmer you have to be able to basically run the inferencer in your head, and that becomes very hard when the inferencer gets very smart. (See the RxJS bug in the above blog post.)

That problem has a simple solution: you, the programmer, should write type signatures in appropriate places as a combination of what documentation and assert statements do in other languages. No compiler today could be reasonably expected to guess which inferences the human brain will find surprising, which is why it's up to the programmer.
This is the point under discussion though -- humans think they're reasoning the same way as the inferencer but can't get it right in the limit, so they're not really equipped to know where they ought to have added type annotations. See e.g. the comment here about the inferencer working 'backwards': https://github.com/ReactiveX/rxjs/issues/4959#issuecomment-5... .
You add the types on every function definition, and almost never have a problem.

Many people forget that the point of type inference is to eliminate uselessly redundant clutter, not to completely eliminate types from source code. A function is usually called many times, so keeping types on public API definitions and eliminating types from call sites and throwaway REPL code eliminates a huge amount of wasted typing (hah!) with near no loss of clarity.

In the case of an RxJS pipeline, as one example, you often have a huge variety of lambda functions that may indeed be called many times, but often exist themselves as "one off" pieces in a pipeline. Leaving the types to be inferred and/or generic, is sometimes even necessary for the most code-reuse of functions between pipelines. It's also rare for such functions to feel like they are a part of a public API definition, because they feel so huge a part of internal and boring plumbing.

As someone that has done a lot of RxJS (and kindred libraries) programming over the last few years, I have a lot of sympathy for anyone hurt by type inference changes in the middle of those pipelines. Some of my ugliest rats nests of explicit any types and/or worse have been buried in RxJS pipelines next to // TODO and // HACK marker warnings to future spelunkers (myself included) about how much I was fighting the inferencing engine and losing in that particular combo of Typescript version and available typings. It's usually great when I can fix those later, and so far Typescript continues to move in that direction where those rats nests only get cleaner when I get a chance to revisit them. (But again, I sympathize with that existential fear that Typescript inferencing magic can sometimes take an unexpected step back until you understand why the change was made and how it benefits you elsewhere and/or once you figure out the root cause.)

> appropriate places This
This seems like a good task for a linter and some commit hooks.

During development, I like to keep my types unspecified until I'm confident that things have congealed properly. But before merging, I like to make sure and add type annotations to anything that's meant for public consumption.

Partially for readability and documentation. Partially because adding those annotations ensures that the type checker and I are on the same page. But mostly because those manual type annotations represent a promise that I, the programmer, have made about the current and future behavior of a certain chunk of code. Once those promises have been codified, then the type checker has my back and can help warn me if I'm about to break one of them with a careless change at some point in the future.

IIRC there are a couple of Rust write ups about this.

In Rust, inference is limited within functions. The language doesn't allow inferring the argument or return types of functions to avoid this kind of action at a distance.

The function API is just one of the many arbitrary places where one can limit inference and require users to provide types. In other languages the module boundary might also be an appropriate place to do that.

A big part of TS is the ease of transition from JS where you immediately get benefits through gradual typing and return type inference is probably a big part of that.
This is true, but a lot of projects are now starting with TS, and maybe could benefit from intentional limitations on inference, at least locally. I’m pretty sure most folks start new projects with strict enabled as is.
In my experience a lot of people don’t necessarily start projects with strict mode — and they _really really_ should ... most people I’ve talked to who couldn’t figure out how to use typescript never figured things out because they ended up somehow with a compiler configured with noImplicitAny: false — typescript really needs to find a way to change the default here ...
Yes, my advice is almost always strict should be true for every new project, and even sometimes for migration projects where the end goal is to absolutely decrease tech debt and bugs as quickly as possible and the team isn't afraid of a giant compiler error waterfall as motivation. (It's a nice auto-generated TODO list with some semblance of progress reporting!) But at the very least, everyone should do themselves the favor and "noImplicitAny: true" no matter if it's new, migration, something inbetween, and no matter what they think of any of the other strict flags. Explicit anys are searchable TODO markers and fine, but implicit anys always seem to be hidden bugs waiting to be found.
Sure. But if you write some code (or update a dependency) and it breaks with spooky action at a distance, the first thing you should do is to pay off your tech debt and add type signatures to find the breakage, before thinking about changing TS.

The github comment says: " (I might suggest the underlying problem in this code is relying on inference too much, but the threshold for "too much" is difficult to communicate to users.)"

The first part is exactly right; the second part is well established in the Rust language spec and Haskell style guides like http://www.cis.upenn.edu/~cis552/current/styleguide.html and https://kowainik.github.io/posts/2019-02-06-style-guide#func...

The compiler should be capable of inferring the type as far as reasonably possible. But it should be an error to not specific the types of top level functions and class methods.

This way, when you write

    const f = (a, b) => a + b
the compiler can tell you to write:

    const f => <T extends number|string>(a: T, b: T): T => a + b
or

    const f : <T extends number|string>(a: T, b: T) => T
            = (a, b) => a + b
This would be effectively like GHC's typeholes, where the compiler will tell you what to write when you say

    f :: _
    f a b = a <> b
except that rather than being an optional step by a developer, it's a mandatory standard.

(Well, I don't think typescript infers that type for that function, but whatever it actually infers, that's what should be there in the error message.)

I don't think typeholes is necessary for that feature. It suffices to just use type check up to the function signature. If I write:

    fn foo() { 42 } 
in Rust, I get an error that says that `i32` (the type of `42`) is not of type `()` (unit), which is the return type of `foo`. So I can just change it to:

    fn foo() -> i32 { 42 } 
instead. The same applies for generics, if I write:

    fn bar<T>(x: T, y: T) -> T { x + y }
I get the precise error that T does not implement the `Add<T>` trait. In this case the error is precise because there is only a trait in scope that supports `+`, but once there are many traits that would fit, which is rare, the compiler suggests some (often all of them). With that error I can just change that to:

    fn bar<T: Add>(x: T, y: T) -> T { x + y }
This is all done through local type inference within the function.
Same choice Java made.
This sounds like a good tradeoff. Perhaps this could be added to future versions of TS with a flag, if it's not already in there.
I think the underlying problem is that in cases of long inference chains, it's hard to detect where the problem lies if types on both ends don't match. In my experience, the compiler will just show an error on the "latter" end, then the user needs to trace inference back and figure out where things went wrong.

Personally, I experience this quite often, e.g. in Java whenever arrow functions are heavily used, such as with streams.

It would be nice to have compilers report the whole inference chain somehow, just to be clear about all possible spots where things might be wrong. But I suppose that's difficult to visualize in a well human-readable way.

Some IDEs show type hints on hover, or if type mismatch is detected, they display inferred types inline in code. E.g Intellij Idea does that for Scala and it is really useful. I wish it did that in Java as well.
> But I suppose that's difficult to visualize in a well human-readable way.

It's not. The type-checker must have constructed a call stack of inferred types in order to find the violation, so it simply has to print all the calls (source line and line number) and the types it deduced, and let the programmer (or IDE) compare that to the code in context and look for surprises.

Typically my strategy is:

1) Make types explicit (or generic) at any interface boundary

2) Periodically query the inferred types of things from my editor just to make sure they come out to what I expect

3) If an error message feels like it's in the wrong place or an inferred type isn't what it should be, progressively make things more explicit until the problem is resolved

But you're right, that's all very subjective and dependent on learned norms.

This is the same strategy I use for writing tests. This is natural, because types are compile-time tests.
> because the inference isn't making a breakage; just shifting around where the breakage appears

That's a very valuable insight, and a concern in a lot of languages with advanced inference!

The problem is exacerbated in Typescript though because it is fundamentally a practical, evolved layer over Javascript, with an unsound type system that likes to bail out to any.

>this code is relying on inference too much

Is this some TypeScript joke I'm too Hindley-Milner to understand?

Not sure. Once upon a time I wrote a small program in Haskell for an exercise in cryptography. It defaulted to Word8 for all arithmetic throughout the program because some of the operands came from a ByteString. Scary. Maybe you are less too Hindley-Milner and more following the advice of explicit types on at least all top-level declarations?
Could be, yes :)
The real problem with shifting where it appears is when the error message is not helpful. With good error messages, it's not enough of a problem to justify leaving type inference in my opinion.
Just to clarify: you're suggesting I believe that the programmer should be advised by the compiler to add one or two explicit type declarations to certain programs, not that the compiler should refuse to compile any programs that currently compile.
It should be a compiler strictness flag.
I also found it interesting because just as the later sections complain about cases where they relied on inferencing "too much", the earliest section also shows cases where they relied on inferencing "not enough". (Had the developer not explicitly typed by copying the inferred type from a tooltip for that D3.js return object, for instance, and left it to be inferred, it wouldn't have been broken in the generics change.)

It's an interesting goldilocks paradox likely only exacerbated by the size of Google's codebase and variety of developer code styles/inferencing preferences.

In my experience with type inference (quite a bit of mypy, now learning rust), inferring local variable types based on what's assigned to it is not very confusing, as long as me or the IDE can figure out what type is being assigned. But when a function return value or variable's generic argument is determined by where the value is used, I tend to have more trouble (since one type can propagate from one function, to its argument, to the function which generated the argument).
They pain of "unexpected any" they have doesn't come just from the type inference, but from gradual typing that makes it "too much". Without unknown/any being allowed to spread everywhere, the inference would have failed sooner, and the error/confusion would be contained (e.g. Rust stops on an ambiguous method call, plus limits implicit inference to function boundaries).
I feel this is a lesson every Haskell programmer goes through. Most Haskell programmers I know write types for all of the top-level declarations. Although it usually isn't necessary, it makes drastic improvements to the quality of error messages.
> Although it usually isn't necessary, it makes drastic improvements to the quality of error messages.

It also provides valuable documentation. You can usually work out what a function does from just its type (and quite often, that's all you have to work with).

> just its type (and quite often, that's all you have to work with).

which is a weakness in the culture of Haskell library authors and/or the ecosystem for contributing documentation patches.

Couldn't agree more. My recent experiences with Rust and especially Elixir have set high standards.
You should have let other people highlight this bit.