Optional typing like the one we see here is not uncommon in other dynamic languages: For instance, Clojure has core.typed and prismatic schema, which approach the problem in ways related to what the article shows.
However, while optional typing gives you some benefits over purely dynamic typing, the fact that it's all bolted-on causes a variety of problems. First, there's the fact that you'll have code with types interact with code without them. This eventually causes more trouble than it solves.
IMO, the biggest issue though is that what we really see from most of these systems is to add optional type systems that are comparable to very simple type systems, like Java's. But those strongly typed systems are not really that powerful! The real power of static typing doesn't come from being able to make sure we don't mix strings and integers, but in doing type checking for much more complex abstractions. Type systems like Scala's, or Haskell's. Creating an optional typing linter that looks at that high a level, and doesn't cause much of pain, is not something I've ever seen. Type inference with generics, existential types, higher kinded types, algebraic types. That's where the real value is, and where modern typed languages are.
Aiming optional typing at where typed languages were 20 years ago is not going to help bridge the gap. If anything, I think it makes it wider, because then fans of dynamic languages think they understand the position of proponents of type systems when, in fact, they are looking at a strawman from the past.
It's more helpful to think about classes of errors and what the type system prevents.
A simple type system prevents type mismatch errors. A more complex type system like Haskell's might be able to encode interfaces and other rules but might also has it's limits.
The trade-off is usually that the more classes of errors you remove, the more complex the type-system becomes. Over the lifetime of your program, how much time did you spend hunting bugs vs time spent in long compilations or encoding all the rules.
I guess my point is that a lite-weight type system might also be enough.
Type checking Haskell, for example, is actually incredibly fast and adds basically nothing to compilation time. This isn't one of the drawbacks of a type system.
For the record, I've never really had a problem with GHC having long compilation times. If anything it's remarkably fast for what it does (IMO, it's the closest thing to a "sufficiently smart compiler" currently available).
I haven't had an opportunity to actually use it, so take this with a grain of salt, but the optional/gradual type system of TypeScript looks decently expressive (http://www.typescriptlang.org/Handbook).
The main advantage of static typing to me is that it enables automatic refactorings.
Without types, it's impossible for tools to refactor your code safely without the supervision of a human. This leads to developers being afraid of refactoring and, ultimately, code bases rot and become huge piles of spaghetti that nobody wants to touch.
With a statically typed language, I'm never afraid to refactor whenever I see an opportunity to do so.
I hear this argument a lot, and I'm sure static typing is helpful when refactoring, but I have found that nothing is as important as tests when refactoring. I'd rather refactor dynamically typed code with good test coverage than statically typed code without good tests.
The point of typing is that types are tests: they ensure logical consistency of your program, and they are checked by the compiler. This has a number of consequences.
- Some error conditions are inexpressible in your program. You cannot write classical tests to verify these, as trying to recreate the erroneous conditions will result in type errors.
- You cannot forget to "write a type" like you can forget to write a test. A type always covers all the guarantees it makes. This is especially useful when there are no tests, as you have at least some kind of helper available to you.
- Types are a form of documentation. Again, even a poorly documented, poorly named project still has types.
- Types do not make tests obsolete, but they greatly reduce the number of them you have to write and maintain.
As a result of this, most Haskell projects have abysmal test coverage compared to what you'd expect in Python or Java. Nobody would (or could!) write tests that all "instanceof" checks are covered. When you forget to implement a branch in your program, you'll get a compilation warning. When you feed a value to a well-written function, you're guaranteed to get a non-nonsense result out of it (instead of an exception). But still, you can make core changes to programs you do not understand at all - while understanding the type system in general - and have your changes work as expected.
In a sufficiently strong type system, you can express constraints stronger than "returns a string" or "takes an integer". For instance, if you want to make sure that a particular function always returns a valid file descriptor, make a structure for file descriptors that just contains a single int, and give it a private constructor that can only be called by the low-level functions that call the actual syscalls, or maybe a public constructor that confirms that the provided integer is actually a file descriptor. Then, as long as your function typechecks, it can only possibly return a valid file descriptor, for all possible inputs.
If you take this to an extreme, you get the https://en.wikipedia.org/wiki/Curry-Howard_correspondence, that there's a direct mapping between mathematical propositions and types, and between their proofs and functions of the corresponding type. This is what all modern proof assistants build on top of, though their type systems are usually extremely complicated.
I completely agree. For example, dependent typing lets you create a CSV type which ensures that each row added has the same number of columns. All of this is checked AT COMPILE TIME! The best part is that it's actually easy to do. Within a day or two of looking at Idris I could figure out how to build the previously mentioned CSV type.
Really good type systems are incredibly powerful tools for testing and refactoring code.
In addition to what the others said, I'd much rather prefer good static typing (i.e. Haskell).
Especially in the prototyping phase, I do a lot of iterating and refactoring - and I mean A LOT, because I've found that this is the best way for a good design to fall out. Sometimes like 20-30% of the code get refactored, replacing core data structures and control flow.
If I ever had to rely on unit tests for this, I'd shoot myself, mainly because it would make me 1/2 times slower (because I'd have to reimplement all unit tests every time) and this would really break my flow.
Also, relying on unit tests for refactoring means that having anything short of 100% code coverage (i.e. 70-80% isn't enough) defeats the purpose of the whole thing anyway - and I've yet to see 100% test coverage in a real-world project. I realize this isn't an argument in the context of your reply (it was either test coverage or static types), but still, something to consider.
No, not imperative types. Tests run specific data through one or more functions for a variety of reasons. Each run will have some successful or flawed effect. Different values of the same type of data can have different results. So, generating test values from specs and types automatically is different from merely typing the data or functions.
And, again, I'm talking traditional types like in Java or C++ rather than dependent types and other esoteric stuff.
As a C# developer I rarely write unit tests, because of static types and compilation identifying most problems. (plus Resharper) I only use them to test business logic or complex scenarios, which is what I would argue they are meant for. Not replacing a compiler.
I thought this until I started using a good type system. Java-style type systems don't replace many tests, but have you worked in Haskell / Scala / similar?
Thanks, that's an insightful comment as on the statically typed side I have experience primarily in Java/c#/c++ etc. which is really what I had in mind when making the comparison. This makes me even more curious to explore Haskell one day.
I work on Python and the Pandas data frame is one of the core data structure that is used in the implementation. I have been thinking of how I can translate this code in a more robust type checking language such as Haskell, without loosing some of the conveniences Pandas provides. Do you have any suggestions or pointers to this?
I would love to encode the business logic into types. But, I am always afraid I might end up in type hell!
Sadly, that's a misconception about types that languages with poor type systems have caused many to have. There's nothing about pandas that shouldn't work in Haskell.
The type system only goes so far. If your refactored code calls functions that take several different parameters of the same type, you can make mistakes that the compiler won't catch.
There are workarounds for this kind of thing. Haskell uses newtypes to wrap a value with a semantically meaningful tag. Scala has a similar concept called "value types." That means you could turn something like this:
loginUser(String, String)
Into this:
loginUser(Username, Password)
These aren't just type aliases. So far as the language is concerned, Username and Password are incompatible types.
I haven't done large-scale refactoring work before. How does typing help with refactoring? Is it because you know exactly what's being passed into a function?
Often enough, in Haskell, you can boldly make the primary change you want to make, fix all of the ensuing type errors in the rest of the code, and your refactored code will run as expected. The compiler will more or less tell you what you missed.
Refactoring changes how various chunks of code interact with each other. Types usually provide security against a limited set of bad interactions. Abstract or compound types can provide security against even more bad interactions, among other things. So, having simple, compiler-checked mechanisms for enforcing good interactions helps when you make changes that can cause bad interactions between chunks of code.
Typing makes it easier to write a "find all references" function. Types distinguish whether a call to foo.bar() is calling the same bar as baz.bar(), which makes it easier to rename Foo's bar wherever it is used, but leave Baz's bar alone.
The Python PEP 0484 approach, unchecked type hints, is strange, and probably a bad idea. There are good arguments for entirely dynamic typing, or entirely static typing, or optional static typing. Those have all been used successfully in other languages. But optional static typing without checking is new. (It may have been tried in some forgotten language, but it never made it into a mainstream one.)
This is likely to create bugs, rather than fix them. The type you see looking at the code may not be the type being used there. This will confuse maintenance programmers. Worse, the compiler itself can't rely on the type info for optimization purposes. This limits optimizations in PyPy. Type checking is supposed to be performed by third party programs.
The syntax is backwards compatible, and this doesn't fit the language well. Forward references to types are handled via a really tacky kludge: "When a type hint contains names that have not been defined yet, that definition may be expressed as a string literal, to be resolved later." So, sometimes you write
def foo(item: 'Tree'):
instead of
def foo(item: Tree):
It's not stated in what scope 'Tree' is evaluated. So don't reuse type names.
Some type info is in comments:
with frobnicate() as foo: # type: int
# Here foo is an int
Also, Python is getting header files (called "stub" files), like C. In most modern languages, such as Go and Rust, that's all handled automatically by storing type info in the object files. But in future Python, that will be manual.
There's function overloading, sort of. There's "@overload", but it doesn't really do anything at run time yet.
This whole thing is a collection of hacks in search of an architecture. This is the messiest type system of any mainstream language. Well designed type systems are hard enough. This is not one of them.
If this hadn't come from Python's little tin god, it would have been laughed out of the Python community. Stop him before he kills again.
Also, Python is getting header files (called "stub" files), like C. In most modern languages, such as Go and Rust, that's all handled automatically by storing type info in the object files. But in future Python, that will be manual.
Stub files are not mandatory; type annotations can be in the same file as the code. This is perfectly legal, for example:
Stub files exist because the syntax support didn't exist in earlier versions of Python, so if you write a library that supports older versions of Python you ship a stub file rather than causing syntax errors for a subset of your users.
Well, it won't take much to have tools like Numba (http://numba.pydata.org/) taking advantage of such "type info" during optimization phase. On the other hand IDEs are already taking advance of those (see PyCharm).
What does mypy do if you make a call from code with type annotations into code without? Or vice-versa?
Has anyone ever gotten paid to add annotations to code that works?
Personally, I view type systems as like a safety line when doing work on a roof, and optional typing as having a line that might or might not be tied off.
> What does mypy do if you make a call from code with type annotations into code without? Or vice-versa?
It's not able to analyze across such boundaries, but as you expand the set of typed code you get better and better coverage.
> Has anyone ever gotten paid to add annotations to code that works?
At Dropbox, we're starting to add type annotations to code. We're confident it will catch many bugs at lint time.
Source: I work at Dropbox.
> Personally, I view type systems as like a safety line when doing work on a roof, and optional typing as having a line that might or might not be tied off.
At least it has the possibility of being tied off and catching you. Better than definitely not having a line.
Having type annotations that are ignored at runtime would cause problems when you interact with code without type annotations, no?
I seem to be doing this a lot in this thread, but...
This is your obligatory reminder that several statically-typed languages actually run on dynamic-language runtimes, because programmers don't like to think about the cost of polymorphism/generics, but implementers have to think about that.
Without the type annotations, you're expected to handle dynamic type checks yourself. With type annotations, you might think that tooling does it for you, so you might omit the dynamic checks. But the tooling doesn't necessarily check all your callers, and you end up with unexpected input.
This is partly what the Typed Racket writers mean when they say that most gradual typing systems like this one are not sound.
It can make debugging harder because "impossible" things happen - when a variable is annotated as a string you may not think to consider the possibility that it's actually an integer. But yes, optional typing is mostly just useless rather than actively harmful.
This topic has been knocked around in Ruby-land for a while; I remember seeing Michael Edgar's LASER (https://github.com/michaeledgar/laser) static analysis tool including some work around optional type annotations, but seems like development there has stopped.
InfraRuby is a statically-typed Ruby (compiles to the JVM). InfraRuby code runs on Ruby interpreters without modification: http://infraruby.com/blog/why-infraruby
Aren't type annotations in python just documentation that's designed to look like it's not? Seems like it would make more sense to just create a documentation standard than develop a brand new syntax for documentation. Another idea that makes more sense to me is to use @decorators. I really don't understand why this syntactic hack is supposed to be a good idea.
Has anyone ever tried to implement optional typing in python with just generators?
It seems like a generator like @Signature(...input types, output type) would solve this problem with limited language changes, and would work in python 2/3?
Do you mean decorators? A solution like that exists and has been used for a while. It really is only usable for run-time type-checks. See here: https://github.com/dobarkod/typedecorator
With type annotations you can have static guarantees, which decorators will not be able to provide, as decorator methods will be called and resolved only during runtime. Objects imported from `types` hopefully will not.
Since I unit-test the heck out of my code, this doesn't really do much for me. Unit-tests test actual values (which is where the interesting bugs come from IMO) and give me more powerful refactoring capabilities than an IDE.
The real benefit I'd be looking for is the chance to give the compiler hints to speed up execution times.
I don't think this is true at all. You cannot write unit tests for all plausible values sent to your functions. If you omit manual type checking in your code or in the respective unit test, you may miss some subtle failure scenarios. Undergoing a refactor, maintenance, or change from other people (or even yourself at a later point in time) only makes this more possible.
I personally find that type safety cuts down the number of unit tests I write by half and makes refactoring work an order of magnitude easier to perform.
> You cannot write unit tests for all plausible values sent to your functions.
While this is true, static type checks do even less for this problem. Do you even create specific types for value ranges eg IntegerBetweenZeroAnd100? You're in a vast minority if so, and I'd be interested to see how tedious it is to construct all these non-native types everywhere.
> If you omit manual type checking in your code or in the respective unit test, you may miss some subtle failure scenarios.
There's generally not much that's subtle about a wrong type. If the code is executed at all, it will usually blow up. In my experience, the subtlety comes in the values.
> I personally find that type safety cuts down the number of unit tests I write by half
I just don't buy this (but then again my team goes for near total coverage). Nowhere near 50% of our tests are testing anything that would be solved by static types.
Even if you ignore all the other benefits of type annotations, merely being explicit about the types of parameters your function accepts helps people reading your code.
If you're going to make the argument that you can just add comments with the types - well then that's exactly what mypy is doing, but if you do it in the mypy format you get static analysis of your types for free.
Static typing and unit testing aren't mutually exclusive.
Ugh. I have hardly any type problems in my Ruby code. I've gotten very good at recognizing implicit state and capturing it in a properly instantiated object with a well-named class. I daresay that if you don't have this skill, a type system isn't going to help you much and you're going to get nasty bugs anyway.
The problem in Ruby is nils, and you'd have the same problem with the same solution in a static language; creation of a duck type. You can't get away from duck types, whether it's a maybe type or whether you perform nil-checking at the earliest possible opportunity. You learn with time and experience how to deal with inconsistent data. Ruby gives me the flexibility to do it without a lot of boilerplate.
Yes, a type system is very much going to help you, even if you have that skill of "capturing implicit state". Because then a function you call, written by somebody else without that skill, may rely on global state in unexpected ways and still blow up in production.
A good type system doesn't allow you to call impure code from a pure function. It just won't compile. So your skill becomes useless because it's automatically verified by the compiler.
Nils are a problem in languages like C and Java, but more powerful type systems completely eradicate the issue. You might enjoy taking a look at Crystal. It has Ruby-like syntax and a type system that makes nil-checks completely unnecessary.
Nils are only a problem when you're dealing with unclean data. The only way to eradicate the issue is to only deal with clean data. Otherwise you have to code the system to be able to handle it, whether through the type system or in another way. A type system helps, but so does discipline. I'll trade the productivity gains of dynamic typing over a babysitter any day.
A type system is just OOP with added math. The benefits don't outweigh the hassle. I've tried Crystal. Did not like. It was the language that taught me that there's more to Ruby than nice syntax.
However, while optional typing gives you some benefits over purely dynamic typing, the fact that it's all bolted-on causes a variety of problems. First, there's the fact that you'll have code with types interact with code without them. This eventually causes more trouble than it solves.
IMO, the biggest issue though is that what we really see from most of these systems is to add optional type systems that are comparable to very simple type systems, like Java's. But those strongly typed systems are not really that powerful! The real power of static typing doesn't come from being able to make sure we don't mix strings and integers, but in doing type checking for much more complex abstractions. Type systems like Scala's, or Haskell's. Creating an optional typing linter that looks at that high a level, and doesn't cause much of pain, is not something I've ever seen. Type inference with generics, existential types, higher kinded types, algebraic types. That's where the real value is, and where modern typed languages are.
Aiming optional typing at where typed languages were 20 years ago is not going to help bridge the gap. If anything, I think it makes it wider, because then fans of dynamic languages think they understand the position of proponents of type systems when, in fact, they are looking at a strawman from the past.