Hacker News new | ask | show | jobs
by willtim 3266 days ago
As a far as I can tell, an "optional type system" is just an unsound type system with a marketing name. Any decent static type system would allow one to progressively increase the level of typing, the weakest form being using a Variant type to hold all other types. The advantage here is that any invariants/proofs that are captured in the system are not compromised by unsoundness.
5 comments

    > As a far as I can tell, an "optional type system" is
    > just an unsound type system with a marketing name.
It is a term with a specific meaning. The key bit separating it from both gradual typing and a static type system with a dynamic type (like Scala or C#) is that in an optionally typed language, the type system has zero runtime effect. If you were to take a source file and mechanically strip out all of the type annotations, the resulting program behaves identically to the original one.

Dart 1.0 was an optionally typed language. Dart 2.0 is not -- we've gone to a mandatory static type system (with lots of inference and a dynamic type). The limitations of optional typing are simply too onerous, and it's just not what users expected or want.

    > Any decent static type system would allow one to 
    > progressively increase the level of typing, the weakest
    > form being using a Variant type to hold all other types.
That's the dream of optional typing and gradual typing. In practice, I have yet to see a language that does this harmoniously. It sounds simple, but things quickly get very complex when you start working with generic types and first-class functions because the boundary between the untyped and typed code gets very strange. To keep the guarantees you expect inside your typed code, you have to figure out how to handle untyped data that flows into it. That gets nasty very quickly when that untyped data may be an instance of a generic class or a function.
> To keep the guarantees you expect inside your typed code, you have to figure out how to handle untyped data that flows into it.

This would be solved in a conventional static type system by forcing the user to eliminate the variant, and forcing the user to deal with the case that it has an unexpected value.

So I cannot see why even C# dynamic is necessary, except to perhaps save some thinking, but this isn't a good thing IMHO.

The dynamic type was added to C# to facilitate interoperability (ex. DLR), not so much to encourage lazy devs to get their bugs at runtime rather than compile time.
Additionally to simplify dynamic dispatch calls on COM.
I still do not understand why this cannot be done via a Variant type. Also it doesn't explain why Dart 2 has a similar feature.
> I still do not understand why this cannot be done via a Variant type.

Makes little difference, either way you need a magical type since it has to allow any method call at compile time, and then requires enough RTTI that it can check for and dispatch method calls at runtime. Whether you call it Variant or dynamic has no relevance.

Think of dynamic as a different word for Variant.

The semantic differences for OLE2 Variants vs C# dynamic mostly only exist in implementation protocols.

The problem munificent was talking about is that for more complex types you cannot write that type checking predicate you want, which would check if a value belongs to the given type and return a variant-free version of the object. For example, if you have a reference to an array of integers in the typed part of the code you need to protect against untyped parts of the code inserting non-integers inside the array. The only solution is to insert typechecks in the reads and/or writes to the array (there is more than one way to do it, with different tradeoffs in expressivity and performance). A similar problem happens with functions: if you try to give a static type to a function with a dynamic implementation you can't just make the check at that point. You also need to insert type checks after every call, to see if the return value has the expected type (and again there is more than one way to do it...)
I assume that my downvoters do not believe this can be done in a conventional static type system. So I will try to explain further.

> The problem munificent was talking about is that for more complex types you cannot write that type checking predicate you want, which would check if a value belongs to the given type and return a variant-free version of the object.

The complexity of the type should not matter to a sound type system. Assume a "fromVariant" function that unpacks the variant, it has a type: forall a. Variant -> Maybe a. Maybe is either Nothing or Just a, depending on whether the variant does indeed contain what you expect it to contain.

> if you have a reference to an array of integers in the typed part of the code you need to protect against untyped parts of the code inserting non-integers inside the array.

This "protection" is exactly what static type systems do. "Untyped code" in the context of using a static type system, is code that works with variants. So we need to pack the array of integers into a variant, where they will be suitably tagged by the runtime. The "untyped code" is free to write whatever it wants into a variant, so we must unpack it with runtime checks. The static type system forces us to unpack and assert its contents, before we can assign a specific type to it.

> The only solution is to insert typechecks in the reads and/or writes to the array

I am not sure what you mean here. When the array is packed into a variant then it can indeed be modified into anything by "untyped" code. But it should always be tagged correctly, which is the responsibility of the runtime. It will need to be re-asserted when unpacked into a specific type.

> A similar problem happens with functions

Functions should be values like any other, so my points apply to them too.

The point I have been trying to make, is that a conventional static type system will already allow for "gradual typing". I cannot see a need to use an unsound type system in order to achieve this goal.

Dart's choice of an unsound type system doesn't actually matter for those gradual typing problems we were talking about. I think you are getting too caught up in that point.

Anyway, going back to the topic, I'll try to explain why your idea doesn't work.

> assume a fromVariant function

What I was trying to say on my other post is that you can only create such a function for primitive types (numbers, booleans, strings, etc). And perhaps also for objects, if your language has something analogoud to an instanceof operator.

But for the higher order stuff like functions and mutable references you cannot write such a fromVariant function so easily! Suppose I write down a dynamically typed implementation of the identity function and pass it to some typed code that expects an int->int function. When we pass the function to the typed side we will call fromVariant on it but the best we can do at first is check if the value we have is a function (and not an integer, etc). There is no way that fromVariant can also check the function signature, to guarantee that the dynamically typed implementation will always return an integer when given an integer.

One way to solve this problem is for fromVariant to return a wrapped version of the dynamic function that checks if the returned value is of the expected type. But these wrappers can slow down the program a lot and they also mess up with pointer equality...

> the arrays

The basic problem with the arrays is similar to the one with function. The statically typed code can't blindly trust that the array will always contain the type it expect because at any moment the untyped portion of the code could try to insert junk values into the array. You will either need to do a runtime check when the dynamic code writes to the array (which slows down the dynamic code) or you need to add runtime checks when the static code reads from the array (which makes it slower than it would be in a fully static language).

Things also get extra tricky if the array contains non primitive values (functions, other arrays, etc)

---

If you are interested I can send you links to a couple of gradual typing papers. As munificent said, there is currently lots of ongoing research about this and we all wish it were as simple as you were suggesting it would be :)

> Dart's choice of an unsound type system doesn't actually matter for those gradual typing problems we were talking about. I think you are getting too caught up in that point.

I was trying to show that "gradual typing" should be possible in any reasonable static type system, without resorting to unsoundness. I do concede that there is probably a definition of the term "gradual typing" out there that I am not strictly adhering to.

> Suppose I write down a dynamically typed implementation of the identity function and pass it to some typed code that expects an int->int function. When we pass the function to the typed side we will call fromVariant on it but the best we can do at first is check if the value we have is a function (and not an integer, etc).

If the best we can do is tag it as a function, then fromVariant would unpack it to a function with type Variant -> Variant, which represents all dynamically-typed functions. When we call this function, we indeed would need need to assert the outputs.

> One way to solve this problem is for fromVariant to return a wrapped version of the dynamic function that checks if the returned value is of the expected type

Agreed. Although the danger of this is that you have a bomb waiting to go off inside some unsuspecting piece of typed code. The alternative to to acknowledge that it is a partial function in the type, e.g. the Variant -> Variant function becomes Int -> Maybe Int, or similar. The point is that we are solving these problems using conventional static typing. You haven't yet convinced me that this doesn't work.

> You will either need to do a runtime check when the dynamic code writes to the array (which slows down the dynamic code) or you need to add runtime checks when the static code reads from the array (which makes it slower than it would be in a fully static language).

It sounds like you are concerned with performance and efficiency, that may indeed be an issue, but it's orthogonal to the type system. My own position is that if one wants performance, then don't write untyped code!

> If you are interested I can send you links to a couple of gradual typing papers. As munificent said, there is currently lots of ongoing research about this and we all wish it were as simple as you were suggesting it would be

Thanks for the offer, I am definitely interested. I suspect that the outstanding issues are around usability and efficiency. I stand by my original point that "gradual typing" should be possible in any sufficiently expressive static type system.

This is exactly right. It's a subtle, complex problem that isn't apparent until you start really digging into these kinds of type systems. It's really hard to define the right membrane around the untyped code and values as they flow through your program into typed regions while giving you the safety and performance you expect inside the typed code.
I respectfully disagree. The "right membrane" around any untyped value is a Variant type. A static type system would then stop you making any assumptions without unpacking and asserting its type. If we don't want variants and just want to defer type errors until runtime, then this is possible too (GHC Haskell can do this).
GHC's Dynamic type can only contain monomorphic values, it can't handle polymorphic stuff.

GHC also considers Dynamic->Dynamic and Int->Int to be incompatible types, which is not quite what one would want in a gradualy typed setting.

Haskell also sidesteps the difficulties that come with mutability in a gradualyly typed setting.

No, there's more to it. There's been some research in recent years on the actual benefits of types and the results are less straightforward than some people think.

An important aspect of type annotations, for example, seems to be that they help document the API of a module, regardless of whether they're statically checked. This is especially relevant with respect to maintenance tasks.

Conversely, the benefits of static type checking to get correct code initially seem to be a lot murkier. There seems to be a cost associated with static types and that is that it takes longer to write code; time that can be spent in a dynamically typed language on other validation work (remember that types do not even remotely capture all software defects).

This means that it is definitely worth exploring the benefits of optional/gradual/soft typing vs. static typing and/or runtime type checking vs. compile-time type checking.

Conversely, the assumption that static types are an unalloyed good is not currently borne out by the evidence (in part, of course, that is because the evidence is inconclusive).

There seems to be a cost associated with static types and that is that it takes longer to write code;

But this is not the only cost that matters, indeed might not even be a cost.

I've gone from being neutral about static vs. dynamic types to being pro-static types -- and the change happened when maintenance became a bigger part of my job.

Writing new code is now far less important to me than looking at a small part of a large system and trying to understand what is and is a not possible at that point. Static typing does not make this easy, but dynamic typing makes it far more difficult.

> But this is not the only cost that matters, indeed might not even be a cost.

I'm not saying otherwise. My point is that there's no objective way to proclaim one better than the other. This depends on application domain, economic constraints, engineering constraints, what you're doing, and so forth.

Writing Ada software that controls a pacemaker has totally different requirements than exploratory programming in Jupyter that mostly deals with integers, floats, and arrays and matrices thereof, for example.

> I'm not saying otherwise. My point is that there's no objective way to proclaim one better than the other.

Very true. But any analysis that emphasizes writing code over maintaining it will systematically bias itself in favor of dynamic typing.

Interestingly I have had the converse debate with some of my colleagues, who have learned to hate Python because they keep having to debug existing systems. I try to tell them that it is an excellent language for the kind of one-off data-analysis that I did when I was a scientist.

They don't believe me, because here among software engineers, seemingly innocent 400 line scripts keep growing into giant, decade old, 100kloc typeless monstrosities.

> Very true. But any analysis that emphasizes writing code over maintaining it will systematically bias itself in favor of dynamic typing.

This is not what the studies do. There is no emphasis on anything. They look at how people do on a number of different tasks with different typing options and report the results.

Also, it's not just dynamic vs. static typing. Gradual and soft typing is also of interest, because it allows you to turn dynamically typed code into statically typed code without a complete rewrite.

Type annotations are brilliant for documentation, and it does matter if they're checked or not, because otherwise they're just comments, and we all know comments become wrong over time. If the compiler doesn't enforce your type annotations you can't trust them.

The sweet spot for me at the moment is a language which is fundamentally static but has good type inference so that you don't actually have to mention types all the time.

Haskell's pretty good at this, although because it's possible to write many Haskell programs without actually mentioning a single type anywhere it can be a pain to read the results. Rust deliberately constrained type inference so that you have to have type annotations on functions, which makes sure your interfaces don't get too murky when reading the code.

I'll go with the idea in another reply that static typing really starts to show its benefit in long-lived code bases under maintenance.

Certainly a static type checker makes refactoring a lot easier, in my experience.

> Type annotations are brilliant for documentation, and it does matter if they're checked or not, because otherwise they're just comments, and we all know comments become wrong over time. If the compiler doesn't enforce your type annotations you can't trust them.

Note that I wrote statically checked. You can also check them at runtime (basically as a form of Design by Contract), which is what Dart does at the moment.

You could, but why should you spend the cycles on that at runtime when you could've done it just once in the compile phase and errored out at a useful time instead of six weeks later when you hit the one branch you neglected to write unit tests for?
Because designing a type system that is sound and expressive and simple is bloody hard. Haskell frequently has a number of `{-# LANGUAGE ... #-}` directives at the top of source files, OCaml has acquired no less than six different ways of doing runtime polymorphism: as an extreme case, the module system has morphed into a fully functional OO system in addition to the one OCaml already has.

Runtime type checks allow you to greatly simplify the difficult parts of the type system or make it more expressive.

People often forget how many expressiveness shortcuts we take so that our type systems keep working out. We have integers (which are in actually practice often just elements of a finite ring), but introduce natural numbers and suddenly a lot of static type checks stop working out. How about interactions between the natural numbers with and without zero? Integers modulo your wordsize vs. infinite precision integers? Can you statically check overflow or underflow? Dart has both `num`, `int`, and `double` types, and `int` and `double` are subtypes of `num`, but `int` is not a subtype of `double`; readAsBytesSync() returns an `Uint8List` that conforms to `List<int>`. From a mathematician's perspective, computer science type systems are usually painfully simplistic. They usually capture something about the representation of the data, not its actual mathematical properties. Their role is often is to prevent undefined behavior, not to capture the actual semantics. Computer science typing often tends to be opportunistic rather than semantically oriented, based on what's easily possible rather than what one wants to express (see the entire family of covariance issues).

As an extreme example, consider GAP's [1] type system. We're dealing with computer algebra here, and some of the types simply cannot be captured at compile-time. Any type annotations would have to be checked at runtime, other than the trivial ones (and even integers are basically a subtype of the rationals or any polynomial ring with integer coefficients).

If I want to do effective implementations of matrices and vectors over finite fields, I pack several finite field elements into a machine word and extract them; no static type system in existence has even remotely the power to statically typecheck these operations for me, unless layered over a ton of unsafe code, and then only for some languages.

From the computer science side of things (I have a background in formal methods), type systems are underpowered specification languages that just pick a small subset of a formal spec that happens to be convenient to prove. They can express only a subset of program behavior (compare what types in even (say) Haskell do to a full-blown formal specification in Z) and often not even the interesting ones. As a result, I don't particularly stress out over whether checks only occur at compile time. Runtime type checking is simply a different tradeoff in that you move from the set of types that you can build a sound type system around to predicates that can be expressed at runtime. It's simply a different set of tradeoffs.

[1] http://www.gap-system.org/

> There seems to be a cost associated with static types and that is that it takes longer to write code

Perhaps. As someone who has used Python and Haskell for roughly equal amounts of time in my professional career I definitely find Haskell faster to write in.

mypy is getting good (from haskell's perspective that would be 'barely passable', still a great improvement from what it used to be), been using it actively for the last year and advocating it for 6 months and seeing nice ROI even considering it's warts.
I don't think I've seen a Variant type in any language that doesn't store some form of type information at runtime (unless you count C void pointers, which I don't), and "any decent static type system" incurs no runtime overhead in the common case.
Of course variants would need to tag their payload, but this is what all dynamic languages need to do anyway. My point was that such dynamic behaviour can be done without resorting to an unsound "optional type system".
Isn't Dart compiled to JavaScript? As far as I can tell by playing around with DartPad, a normal int in Dart carries no runtime information indicating that it is an int (other than the fact that it is a JavaScript-level int).

If you wanted to add variants to Dart, you'd either need to add this runtime information, or limit the Dart typesystem to be no more and no less than what the JavaScript typesystem already stores dynamically.

Surely JavaScript itself tags ints?!
It tags numbers, yes, but actually it doesn't tag ints in particular - there is no integer type in javascript, just a 64-bit floating point one. (This is why asm.js code sticks |0 after every arithmetic operation involving integers: that forces rounding the number as if it were an integer, and therefore allows optimizing the expression to use integer instead of floating-point arithmetic.)

If Dart wants to distinguish ints and doubles, it needs to keep track of that on its own; if it has a JavaScript variable x and needs to know calculates x/2, it won't know whether to round it as if it were integer arithmetic, or return a floating-point value. Usually it is enough to do this at compile time (the function calculating x/2 knows whether it's halving a Dart-level double or a Dart-level int), and so there's no need to make x an object that remembers its own Dart type.

Also, even if JavaScript distinguished ints and doubles, a higher-level language is likely to want to have multiple kinds of things with an int representation: bools, enums, bitfields, Unicode scalar values, etc. Again, ideally that information is tracked at compile time and the knowledge of what to do with an int representation is inserted into the compiled code, and so no metadata needs to be stored at runtime.

Writing well-typed code means writing a proof of some set of properties over the code. There are three assumptions here:

(1) creating abstractions that are typed correctly is easy

(2) the existing abstractions you're building from are correctly typed

(3) the properties that the type system proves are valuable enough to reward the effort

Whether (1) is true depends on your type system. If it's very simple (e.g. dynamically typed), it's trivially true. If it's rich and expressive, it can sometimes be impossible to express your intent without obfuscating your types (e.g. using pointers and typecasts or polymorphism or some other loophole); and with some type systems, it's plain impossible to express some valid programs.

Whether (2) is true is a social problem. You need to have enough sensible people in your community such that you get to use List<T> rather than ObjectList with typecasts, or what have you. What's available in the runtime libraries often translates into what's commonly used for APIs in third party modules, and if the library of types isn't rich enough, APIs will use loopholes or typecasts or other mechanisms that obfuscate types. Third party code turns into a kind of firewall that stops your types (and the information, proofs, they convey) from flowing from one module to another in your program.

Whether (3) is true also depends on your type system, but also on your application domain. It's in direct opposition to (1); the more you can prove with your type system, the harder it normally is to write code that proves exactly the right things, and neither more nor less (the risk is usually too little type information flowing through). If the properties being proved are important to your application domain, then that gives a big boost to type systems that can prove those properties. If it's just because it gives you warm fuzzies, and testing is what proves interesting things because interesting properties (in your domain) are harder to prove with a type system, then rich type systems are less useful.

Personally, I was once a strong proponent of rich expressive type systems. But I eventually learned that they can become a kind of intellectual trap; programmers can spend too much time writing cleverly typed confections and less time actually solving problems. In extremis, layers of abstraction are created to massage the type system, often in such an indirect way that the value of the properties being proven are lost owing to the extra obfuscation. Now, I see much more value in a strong type system for languages like Rust, but much less value at higher levels of abstraction. That's because many of the properties being proven by Rust (like memory safety) are already proven in high-level dynamic languages through GC and lack of pointers and unsafe typecasts. Whereas application-level properties being proven through type systems are seldom valuable.

Documentation and tooling are much stronger arguments for types in high-level languages than proofs; ironically, this means that the type systems don't need to be sound, as long as they're usually OK in practice!

As a whole, you are correct. Pragmatically it allows defining the boundaries of soundness.