by ptype 3803 days ago
I hear this argument a lot, and I'm sure static typing is helpful when refactoring, but I have found that nothing is as important as tests when refactoring. I'd rather refactor dynamically typed code with good test coverage than statically typed code without good tests.
7 comments

The point of typing is that types are tests: they ensure logical consistency of your program, and they are checked by the compiler. This has a number of consequences.

- Some error conditions are inexpressible in your program. You cannot write classical tests to verify these, as trying to recreate the erroneous conditions will result in type errors.

- You cannot forget to "write a type" like you can forget to write a test. A type always covers all the guarantees it makes. This is especially useful when there are no tests, as you have at least some kind of helper available to you.

- Types are a form of documentation. Again, even a poorly documented, poorly named project still has types.

- Types do not make tests obsolete, but they greatly reduce the number of them you have to write and maintain.

As a result of this, most Haskell projects have abysmal test coverage compared to what you'd expect in Python or Java. Nobody would (or could!) write tests to check that every "instanceof" case is covered. When you forget to implement a branch in your program, you'll get a compilation warning. When you feed a value to a well-written function, you're guaranteed to get a non-nonsense result out of it (instead of an exception). But still, you can make core changes to programs you do not understand at all - while understanding the type system in general - and have your changes work as expected.
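To make the "forgotten branch" point concrete, here is a minimal sketch in TypeScript rather than Haskell. The Shape type and area function are made up for illustration, and TypeScript reports the missing case as a compile error rather than a warning.

```typescript
// Hypothetical shape type, used only for illustration.
type Shape =
  | { kind: "circle"; radius: number }
  | { kind: "rect"; width: number; height: number };

function area(shape: Shape): number {
  switch (shape.kind) {
    case "circle":
      return Math.PI * shape.radius * shape.radius;
    case "rect":
      return shape.width * shape.height;
    default: {
      // If a new variant is added to Shape and not handled above,
      // `shape` no longer narrows to `never` and this line stops compiling.
      const unreachable: never = shape;
      return unreachable;
    }
  }
}
```

Adding, say, a "triangle" variant to Shape without touching area would fail the build at the `never` assignment, which is the kind of check nobody has to remember to write as a test.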

Types are tests.

In a sufficiently strong type system, you can express constraints stronger than "returns a string" or "takes an integer". For instance, if you want to make sure that a particular function always returns a valid file descriptor, make a structure for file descriptors that just contains a single int, and give it a private constructor that can only be called by the low-level functions that call the actual syscalls, or maybe a public constructor that confirms that the provided integer is actually a file descriptor. Then, as long as your function typechecks, it can only possibly return a valid file descriptor, for all possible inputs.
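A rough sketch of that file descriptor wrapper, in TypeScript. Everything here is an assumption made for illustration: the openDescriptors array stands in for whatever the operating system actually knows about open descriptors, and the names FileDescriptor, fromNumber and standardOutput are hypothetical.

```typescript
// Stand-in for the kernel's table of open descriptors (an assumption for
// this sketch; real code would ask the operating system instead).
const openDescriptors: number[] = [0, 1, 2];

class FileDescriptor {
  // A private field makes the class nominal in practice: a plain object
  // literal cannot be passed off as a FileDescriptor.
  private readonly value: number;

  // Private: outside code cannot wrap an arbitrary integer.
  private constructor(value: number) {
    this.value = value;
  }

  // Checked constructor: returns undefined unless `n` is currently open.
  static fromNumber(n: number): FileDescriptor | undefined {
    return openDescriptors.indexOf(n) !== -1 ? new FileDescriptor(n) : undefined;
  }

  toNumber(): number {
    return this.value;
  }
}

// Any function with this return type can only hand back a value that
// went through the check above.
function standardOutput(): FileDescriptor {
  const fd = FileDescriptor.fromNumber(1);
  if (fd === undefined) {
    throw new Error("stdout is not open");
  }
  return fd;
}
```

The guarantee only covers the moment of construction. A descriptor that is closed later would need a richer design than this sketch.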

If you take this to an extreme, you get the Curry-Howard correspondence (https://en.wikipedia.org/wiki/Curry-Howard_correspondence): there's a direct mapping between mathematical propositions and types, and between their proofs and functions of the corresponding type. This is what all modern proof assistants build on top of, though their type systems are usually extremely complicated.

I completely agree. For example, dependent typing lets you create a CSV type which ensures that each row added has the same number of columns. All of this is checked AT COMPILE TIME! The best part is that it's actually easy to do. Within a day or two of looking at Idris I could figure out how to build the previously mentioned CSV type.
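This is not dependent typing, and it is not the Idris version described above, but a weaker approximation of the same idea can be sketched in TypeScript with tuple types. The column count is fixed by a type parameter when the table is declared, rather than computed from values. Table, PersonRow and people are hypothetical names.

```typescript
// A table whose row shape R is a fixed-length tuple type.
class Table<R extends readonly unknown[]> {
  private readonly rows: R[] = [];

  // Every row must match R, including its length.
  addRow(row: R): this {
    this.rows.push(row);
    return this;
  }

  rowCount(): number {
    return this.rows.length;
  }
}

// Hypothetical three-column row: name, city, age.
type PersonRow = [string, string, number];

const people = new Table<PersonRow>()
  .addRow(["Ada", "London", 36])
  .addRow(["Alan", "Manchester", 41]);

// Rejected by the compiler, because the tuple has two elements, not three:
// people.addRow(["Grace", "New York"]);
```

A dependently typed language can go further and derive the column count from the header row itself. This sketch needs the row type written out by hand.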

Really good type systems are incredibly powerful tools for testing and refactoring code.

Moreover, types are tests you don't have to write and maintain.
In addition to what the others said, I'd much prefer good static typing (e.g. Haskell).

Especially in the prototyping phase, I do a lot of iterating and refactoring - and I mean A LOT, because I've found that this is the best way for a good design to fall out. Sometimes like 20-30% of the code gets refactored, replacing core data structures and control flow.

If I ever had to rely on unit tests for this, I'd shoot myself, mainly because it would make me 1/2 times slower (because I'd have to reimplement all unit tests every time) and this would really break my flow.

Also, relying on unit tests for refactoring means that having anything short of 100% code coverage (i.e. 70-80% isn't enough) defeats the purpose of the whole thing anyway - and I've yet to see 100% test coverage in a real-world project. I realize this isn't an argument in the context of your reply (it was either test coverage or static types), but still, something to consider.

What about tech that generated tests automatically from code and annotations? How do you think that would affect your work?
Well, this is basically what types are :)
No, not imperative types. Tests run specific data through one or more functions for a variety of reasons. Each run will have some successful or flawed effect. Different values of the same type of data can have different results. So, generating test values from specs and types automatically is different from merely typing the data or functions.

And, again, I'm talking traditional types like in Java or C++ rather than dependent types and other esoteric stuff.

Oh, yeah - something like QuickCheck [0] is great! There are Java and C++ implementations as well, though I haven't tried them. Definitely a nice alternative to traditional unit testing whenever possible. I don't know how well it mixes with unrestricted IO like in Java etc though.

[0] https://en.wikipedia.org/wiki/QuickCheck
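For readers who haven't seen the idea, here is a hand-rolled sketch of property-based testing in TypeScript. It is not QuickCheck's API: there is no shrinking, the generator only produces short integer arrays, and every name (makeRng, randomIntArray, findCounterexample and the two properties) is made up for illustration.

```typescript
// Small deterministic pseudo-random generator (a multiplicative linear
// congruential generator), so that a run is reproducible from its seed.
// The seed is assumed to be an integer between 1 and 2147483646.
function makeRng(seed: number): () => number {
  let state = seed;
  return () => {
    state = (state * 48271) % 2147483647;
    return state / 2147483647;
  };
}

// Generates an array of up to 10 integers in [-100, 100].
function randomIntArray(rng: () => number): number[] {
  const length = Math.floor(rng() * 11);
  const out: number[] = [];
  for (let i = 0; i < length; i++) {
    out.push(Math.floor(rng() * 201) - 100);
  }
  return out;
}

// Runs `property` against `runs` generated inputs and returns the first
// counterexample, or undefined if every run passed.
function findCounterexample(
  property: (xs: number[]) => boolean,
  runs: number,
  seed: number
): number[] | undefined {
  const rng = makeRng(seed);
  for (let i = 0; i < runs; i++) {
    const input = randomIntArray(rng);
    if (!property(input)) {
      return input;
    }
  }
  return undefined;
}

// Property: reversing an array twice gives back the original array.
function reverseTwiceIsIdentity(xs: number[]): boolean {
  const twice = xs.slice().reverse().reverse();
  return twice.length === xs.length && twice.every((x, i) => x === xs[i]);
}

// A deliberately false property: every array is already sorted.
function alwaysSorted(xs: number[]): boolean {
  return xs.every((x, i) => i === 0 || xs[i - 1] <= x);
}
```

Calling findCounterexample(reverseTwiceIsIdentity, 200, 42) finds nothing, while the false property should be refuted by some generated unsorted array. The types say what kind of input to generate; the property says what must hold for all of it.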

Now you're getting the idea. QuickCheck is a good example. I was going to drop an example from safety-critical industry but there's too many now for me to find it lol. Anyway, thanks to our discussion, I did find this great page with links to surveys, list of strategies, and listings of tools:

http://mit.bme.hu/~micskeiz/pages/code_based_test_generation...

Enjoy!

As a C# developer I rarely write unit tests, because static types and compilation identify most problems (plus ReSharper). I only use them to test business logic or complex scenarios, which is what I would argue they are meant for. Not replacing a compiler.
I thought this until I started using a good type system. Java-style type systems don't replace many tests, but have you worked in Haskell / Scala / similar?
Thanks, that's an insightful comment. On the statically typed side, my experience is primarily in Java/C#/C++ etc., which is really what I had in mind when making the comparison. This makes me even more curious to explore Haskell one day.
It is better with certain languages, and it is better when you encode business logic into your type system.
I work in Python, and the Pandas data frame is one of the core data structures used in the implementation. I have been thinking about how I can translate this code into a language with more robust type checking, such as Haskell, without losing some of the conveniences Pandas provides. Do you have any suggestions or pointers for this?

I would love to encode the business logic into types. But, I am always afraid I might end up in type hell!

Sadly, that's a misconception about types that languages with poor type systems have caused many to have. There's nothing about pandas that shouldn't work in Haskell.

Edit: In fact, there's this library! https://github.com/acowley/Frames

> but I have found that nothing is as important as tests when refactoring

You don't need those with static types. You only need to write tests for your actual code, not things that the compiler can check for you.

Manually writing code to check things the computer can test for you is a waste of developers' time.

The type system only goes so far. If your refactored code calls functions that take several different parameters of the same type, you can make mistakes that the compiler won't catch.
In some sense, taking several parameters of the same type can be a code smell.

Foo(str, str, str) strikes me as somewhat suspect. I realize this isn't always avoidable, but a lot of the time it is.

There are workarounds for this kind of thing. Haskell uses newtypes to wrap a value with a semantically meaningful tag. Scala has a similar concept called "value classes." That means you could turn something like this:

  loginUser(String, String)
Into this:

  loginUser(Username, Password)
These aren't just type aliases. So far as the language is concerned, Username and Password are incompatible types.
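The same trick can be sketched in TypeScript, which has no built-in newtype. A common idiom is a "branded" string type, which exists only at compile time. The helper functions and the body of loginUser below are hypothetical and only there to make the example self-contained.

```typescript
// Branded string types: the __brand property exists only at the type level.
type Username = string & { readonly __brand: "Username" };
type Password = string & { readonly __brand: "Password" };

// Hypothetical constructors; real ones would validate their input.
function username(raw: string): Username {
  return raw as Username;
}

function password(raw: string): Password {
  return raw as Password;
}

// Hypothetical login function that only reports which user was seen.
function loginUser(user: Username, pass: Password): string {
  return "login attempt for " + user + " (password length " + pass.length + ")";
}

const user = username("alice");
const pass = password("correct horse");

const message = loginUser(user, pass);

// Both of these are rejected by the compiler:
// loginUser(pass, user);
// loginUser("alice", "correct horse");
```

At runtime both values are still plain strings, so the wrapper costs nothing. The protection against swapped arguments comes entirely from the type checker.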