Hacker News new | ask | show | jobs
by bojo 3817 days ago

    app : Vect n a -> Vect m a -> Vect (n + m) a
That is pretty amazing if you ask me. I look forward to the day when we all use languages which save programmers from themselves.
5 comments

As a Python programmer who doesn't understand this notation - what am I looking at, and what's amazing about it?
The "Vector m a" means that it's a vector (like a Python list) that can only be m elements long, and those elements can only be of type a. E.g. "Vector 3 Int" will always be a Vector with three integers in it. If you try to treat it like a vector of any other length it will refuse to compile (e.g. pass it to a function that requires that it have 4 or more elements).

The whole line is the type declaration for a function (app) that takes a Vector of length m, and a Vector of length n, and produces a Vector of length (m+n). Which is to say, the compiler knows at compile time what length the resulting Vector will be, and will fail to compile if the function you've written doesn't - provably - always meet that requirement.

From a Haskeller's perspective it's amazing because container types are famous loopholes for code that produces runtime errors. For instance, if you call the "head" function (returns the first element) on an empty list it will halt the program at run time with an error, despite the fact that it passed the type check (because an empty list and a full one have the same type).

As someone who learned Haskell and subsequently have been writing a lot of Python, I keep a mental tally of how many of my bugs (some of which took ages to track down) would have been caught immediately by a type system like Haskell or Idris'. I'd say it's well over half.

In Haskell those kind of errors are drastically reduced, but a few still slip through. Languages like Idris can catch even more of them, in a clever way, which is awesome.

You have two parameters - a vector of size "m", and a vector of size "n". The function signature then expects to get back a vector of size "m + n".

Contrast that with a language with method signatures that can only return types like "Vector" or "List", without any information that is dependent on the incoming parameters.

So then, the logic is checked - if the method returns a vector that is anything other than the size of the two incoming parameter vector sizes added together, it won't compile.

In other words, even more behavior that would normally be a runtime bug is checked at compile time, which is a good thing if you are working in a domain where correctness is important and want to avoid runtime bugs.

One thing that no one touched on:

In reading Haskell (and apparently Idris) types, a name that starts with a lower case letter is an unbound type variable - sort of like a template parameter in C++. The `a` in each of the Vect's must be the same type but it can be any type (picked at the call site, for any given call).

Basically the length of the vector is encoded at the type level. Lets you do all sorts of cool stuff.

You can have number systems modulo n, which are only compatible with other numbers modulo the same n because they are distinct types.

It's just another level of abstraction that pretty much nothing else has.

    app : Vect n a -> Vect m a -> Vect (n + m) a
is a type signature for a function, `app`, which should append a vector (`Vect`) of length `n` to a vector of length `m`.

What's cool about Idris and other dependently-typed languages is that the length of the resulting vector, `n + m`, can be tracked in the type. This can prevent a whole slew of errors.

I'm not an expert in Idris, but I will try to answer this as best as I can.

That's the Haskell type notation (Idris and Haskell are similar). Here's an example:

  plus5 :: Int -> Int
  plus5 n = n + 5
The first line is the type declaration of the `plus5` function. You can read it as "plus5 takes an Int as argument and returns another Int". The second line is the function declaration. The equivalent in Python would be:

  def plus5(n):
      return n + 5
The syntax is the same when functions have more arguments:

  add :: Int -> Int -> Int
  add a b = a + b
This means that add takes two integers as arguments and returns another integer. This syntax might look strange (you could be asking yourself "How do I distinguish between the argument and return types?") but that's because in Haskell functions are curried[1].

Haskell (and Idris as well) also has parametric polymorphism, so you can define functions which operate on generic types. For example, the identity function:

  id :: a -> a
  id x = x
You can read this as "`id` is a function that takes an argument of type a and returns another argument of type a (so the return type and argument type must be the same)". This means the id function will work for any type of argument (String, Float, Int, etc.). Note that of course this is not dynamic typing, Haskell and Idris are statically typed and therefore all the types are checked at compile-time.

Now, let's move on to Idris. One of the reasons there's a lot of excitement over Idris is because it has dependent types, i.e., you can declare types that depend on values and these are checked at compile-time.

The example GP gave was this:

  app : Vect n a -> Vect m a -> Vect (n + m) a
`app` here stands for append. This is the function that appends two vectors (lists of static length) together. In Idris, `Vect x y` is a vector of size x and type y, so

   Vect 5 Int
is a vector of 5 integers. So, essentially, `app` is a function that takes two vectors as arguments, `Vect n a` and `Vect m a` (this means they can be of different sizes but their content must be of the same type). It returns a `Vect (n + m) a`.

Think about it for a minute. That is amazing! We can be sure that, at compile-time, just by looking at the function signature, our append function, given two vectors of size n and m, will _always_ return a vector of size (n + m).

If we accidentally implement a function that does adhere to the type specification (for example, it appends the first argument to itself, returning a Vect of size (n + n) instead of (n + m)), the code will not compile and the compiler will tell us where is the bug.

[1]: http://learnyouahaskell.com/higher-order-functions#curried-f...

A length n vector of a's, appended to by a length m vector of a's, results in a length (n + m) vector of a's. This is a safety guarantee that is statically checked at compile time rather than dynamically checked at run time.
You're gonna have to read the article to find out.
As a C++ programmer, I'd like to ask how is this different in effect from

    template<typename T, std::size_t X, std::size_t Y>
    std::array<T, X + Y> app(std::array<T, X>, std::array<T, Y>);
Because with dependent typing, the length can be specified at runtime and you still get static checking.

There's also a huge difference between untyped templates and strongly typed generics, but in this particular case the difference is somewhat subtle.

How does it work exactly? Both languages check things at compile time, but of course in c++ the length has to be known statically. Does idris generate code at runtime for the right size?
Think about generics in any language, say Java Stack<String>. The type Stack is parametrized by the type String. What would you do in a language without generic types? To achieve the same effect, you'd need to create a custom StackString type, which itself is trivial. But then, you'd need StackInteger, StackBoolean, StackSomeoneElsesCustomType, etc.

Language with a type system but without generics have this above downside. Usually you'd just give up making a type for each of these, and simply use the Stack type.

Language without Idris-like dependent types suffer the same downside: instead of creating Stack<String, n>, you're forced to create Stack<String, One>, Stack<String, Two>, Stack<String, Three>, where One/Two/Three are unique types you created to conceptually represent a number. If you don't even have generics, you'd need to create StackStringOne, StackStringTwo, ...

Dependently typed languages encode numbers (among others) as unique types, to simplify. Imagine this:

  put: Stack<String, Three> -> String -> Stack<String, Four>
This isn't too hard to imagine is it? No runtime checks needed. Once you get the output you can carry it around, along with its type, to places that accept Stack<String, Four>.

So, asking "how come you don't have to generate these checks at runtime?" is similar to a programmer who's never seen generics asking "how come you can parametrize? Don't you have to generate all the StackString and StackInteger, etc.". It just works. Hope that analogy was clear =).

Dependent types in general blur the line between values and types, as you've seen above. Think of the usual values you use, and imagine having a new type to represent each one. Dependent types let you do this in a sane and logical way.

Thank you, I understand that, but I need a little more detail. Let me explain.

Languages that have generics can either erase the generic argument at runtime (like Java or Scala do) or generate a different concrete type for each instantiation (like C++ or Nim do). In either case, you usually know at compile time which concrete instances of the generic parameter are going to be used where.

That is, even in Java you may have a single class Stack<A>, but at the final use site this will eventually turn into some concrete type, such as Stack<String> (it may remain generic in intermediate steps). In other words, you are not going to use Stack with a generic parameter that you did not anticipate. You cannot read the runtime type of some instance, call it X, and then start writing functions that work on Stack<X>.

Now, let me turn to dependent types. A lot of the techniques that Idris promotes can be simulated in other languages, such as Scala, C++ or Nim. But in each case that I know of, you are required to know at compile time which concrete values you are going to use as generic parameters. That is, in C++ you can write functions that operate on types like Vector<A, N>, where N is the length, but you have to eventually specialize N to a concrete number before compiling.

The comment above states that Idris does not have this restriction, but it is not clear to me how that might work. I concede that by using an erasing technique (as opposed to specialization) one is not required to materialize such functions for all N - this is what one does for instance in Scala.

But the problem is: what do you do at the usage site? Say you are reading two lists off two JSON files, and you want to concatenate them using the above function. Wouldn't you need to provide at compile time a concrete value for N and M? (and consequently cast the unsized vectors to sized ones?).

If this is the case, then there is not much difference from letting the compiler materialize a function for concrete values of M and N that are used (because in either case, you have to encode knowledge about concrete values of M and N at compile time).

Maybe everything would be more concrete if I saw an Idris program that reads two vectors from external files at runtime, and then concatenates them, whatever their size is.

I believe the Idris runtime uses erasure rather than code generation[1]. The key here is that Idris will make call sites prove statically that the conditions hold, either through having static conditions of their own or proving that the programmer has checked (at runtime) that the conditions hold.

(This case of append-n-to-m-vector is a bit weird since it basically just amounts to calculating the length of the output, but my previous statement applies a bit more generally.)

[1] So one plausible representation for Vec n a would be a size + array of pointers to a's. I believe they're actually using linked list for Vec at the moment, though.

Yes, but my question is: even if you use erasure, what are you going to do at the call site? Won't you need to state a concrete size there? For usual generic types, erasure and code generation are equivalent in what they can do, because at some point you will specialize to concrete types.

I would be curious to see an Idris program that reads two vectors out of some file (without knowing the size a priori) and concatenates them

> Yes, but my question is: even if you use erasure, what are you going to do at the call site? Won't you need to state a concrete size there?

No, Idris doesn't actually care what "n" and "m" are (concretely)... it just knows that if you happen to have a (Vec n a) and (Vec m a) lying arond, then you'll get a (Vec (n+m) a) out.

> For usual generic types, erasure and code generation are equivalent in what they can do, because at some point you will specialize to concrete types.

I'm not sure I fully understand what you mean, but I think this is slightly wrong... You can think of the fully erased (Vec n a) representation as (n, void-star) in C-style or Array<Object> in JVM style. Note that "a" doesn't actually appear. So the type is actually never needed at runtime -- all functions which have (Vec n a) as a parameter will just assume that it has been (statically) proven[1] that the pointed-to-array has the correct size. Because of parametricity, functions will not be allowed to actually access any 'a' values directly. So it all works out.

Here's another comment of mine that goes a bit deeper into the interplay between static proof and runtime checks: https://news.ycombinator.com/item?id=10302863

Key being: You either need to prove statically or you need to prove statically that you've checked at runtime.

HTH.

[1] This is valid because call sites must provide such a proof (statically).

I haven't coded in C++ for a very long time, but won't any implementation of that function, and additional templated functions which call it, only get checked at template expansion time? As I understand, buggy templated functions and classes can go unnoticed in a codebase or library simply because no-one has instantiated them in a way which exposes the bug. This is definitely not the case for Idris.

My understanding was that concepts began as an attempt to fix this "problem".

My recollection of C++ (possibly outdated since 11 or 14, though that would - pleasantly - surprise me) is that template parameters must be picked at compile time. With Idris, you can pick X and Y at runtime and still get compile time checking.
operationally, i'm not sure if there is. it's certainly a lot prettier, though. i'd be curious if there's a deeper difference, too. it feels like there is, at least to me...
Actually it's not that easy. Everywhere you have a type you need to proof that the expression really has the right type, and it's not just type annotation, it's a real mathematical proof.

To understand the complexity of the task, try proving that insertion sort is really a sort algorithm. You write more code, and writing it will take more time than you would do in a 'normal' typed language.

You can choose how rigorous you want your proof to be.

I'd highly recommend reading Chapter 1, Section 1.3 from Type-Driven Development. The first chapter is free.

https://www.manning.com/books/type-driven-development-with-i...

Unfortunately, the usefulness is limited – the size of all Vects must be known at compile time. While this does prevent a large class of bugs that arise from incorrect compile-time knowledge, the size of the class of runtime bugs affected by this is a lot smaller, as for each differently-lengthed Vect, a distinct instantiation of the Vect type needs to exist at compile-time.
Per my understanding, what's so exciting about this is precisely that you're wrong - it does not need to be known at compile time, but can prove symbolically that the resulting length will be X + Y. And of course, given that, it's not statically creating a distinct instantiation of the Vect type for every N at compile time!
Really? Color me corrected, then :) Does the compiler need a SAT solver then to make sure all the constraints hold?
I'm not sure the exact approach, but there's definitely some heavy lifting. My understanding is that most (all?) dependently typed languages are essentially theorem provers at heart.
It should be able to use a Hindley-Milner algorithm.
As I understand it, type inference breaks down on dependent types.
I've never actually used such a feature, but my understanding is that this is not strictly true. I think that the compiler can generically understand that this function adds two vectors and their lengths.

I'll see if I can contrive an example. Supposing you have a list L1 of vectors of length 5. You also have a vector V1 with user input, with length x (unknown at compile time). You then make a new list L2 by taking each vector in L1 and append V1 to it. L2 is now a list of vectors of length (5 + x). Even though x is unknown at compile time, the compiler still knows that all vectors in L2 are of the same length. It can make restrictions based on this fact.

It seemed strange to me when I heard this concept, I thought the compiler would need to consider infinite possibilities, but apparently similar things are possible even in Haskell. For instance, you can define a list as a recursive type that holds a value of a Null type (not just a null value of the list type) or another List. Apparently it can still reason about this.

However, these are things I've heard. Hopefully someone with more experience can chime in and let me know if I'm right here.

Not quite. Types can depend on runtime values. This can in effect _force_ the programmer to perform the required validation when accepting foreign data.
Not only that but you can arbitrarily encode constraints. Want to check something silly like whether there's an even number of elements in a list? Yea you can do that, and pass it along. I can do stuff like, okay, I know the max size of the files we're going to accept is 1mb.
Almost arbitrarily - they must be computable... ;)
From what I know, Idris does not have this constraint: you are free to introduce nontermination wherever.
You're correct that Idris doesn't force termination. That doesn't let you compute non-computable functions though (e.g. the busy beaver function).
Working in Haskell, I don't feel like it's "saving me from myself" so much as giving me tools I can use to save myself.