Hacker News new | ask | show | jobs
by wyago 4070 days ago
This actually doesn't have to do with that. They "propose a redesign of ML in which modules are truly first-class values, and core and module layer are unified into one language."

They're not trying to, say, unify SML and OCaml, they're trying to solve a problem inherent to ML itself.

1 comments

Can you elaborate on why these are problems for ML?
One of the classic motivating examples (which is addressed in the 1ML paper) goes like this:

Both trees and hash tables can be used to implement map-like data structures. You could, in ML, have an abstract signature for Map types that gets implemented sometimes by a concrete TreeMap implementation and sometimes by a concrete HashMap implementation. HashMap would be a better choice if the expected number of entries is larger, and the TreeMap would be better if the expected number of entries is smaller, so what we'd like to do is write something like

    module Map = if size > threshold then HashMap else TreeMap
so that we choose what concrete implementation we want at runtime—but we can't really do that, because the language we use to talk about modules in ML is distinct from the language we use to talk about values. That is to say, the size > threshold part can't coexist in the same expression with the HashMap and TreeMap part. Some MLs have added the ability to wrap modules in values, so you can write it this in OCaml:

    module Map = (val (if size > threshold
                         then (module HashMap : MAP)
                         else (module TreeMap : MAP))) : MAP
but it's a bit awkward because of the explicit moving-back-and-forth between value-level and module-level, and the interaction between the two languages has some rough edges (which the paper explains more thoroughly, if you're interested.)

The motivation for 1ML is that we'd like to use the same language to talk about both modules and values. That way, we could write the first, simpler definition without having to worry about the fact that we're manipulating distinct 'things'. Of course, there are other tradeoffs involved in the 1ML solution, but it's an interesting, compelling experiment.

how does that end up working for typechecking? does the Map module end up having all the properties of the intersection of these elements?

The Map example seems pretty easily solvable via Haskell through typeclasses or the more standard OOP-y languages through interfaces, but am I missing something? Does SML not have the tools for this right now?

There are very much ways of expressing that computation in SML—or in Haskell or other languages—but not in the language of modules. Haskell typeclasses would have a similar problem. Consider the following pseudocode:

    class Map m where {- some implementation -}
    
    instance Map HashMap where {- ... -}
    instance Map TreeMap where {- ... -}

    mkMap :: Int -> {- ??? -}
    mkMap size | size > threshold = newHashMap
               | otherwise        = newTreeMap
What is the type of mkMap? It won't actually type-check in Haskell. It can't be (Map m) => m, because it needs to have a concrete type. We could do some kind of type-level work, relect the size variable into the type level, and have a type family that decides whether the resulting type is a HashMap or a TreeMap—but that has the same problem as the OCaml example, where we're using a different language (the language of type-level computation) to talk about something that'd be simple with value-level programming.

One place where you could write this easily is in a dynamically typed language:

    def mkMap(size):
        if size > threshold:
            return HashMap()
        else:
            return TreeMap()
That is exactly why this kind of research is interesting: we'd like to capture the flexibility of idioms that dynamic languages can sometimes afford us, with the extra safety and security guarantees of a static type system!
You don't need a dynamic language for that. In Java:

    Map<K,V> mkMap(int size) {
        return size > threshold ? new HashMap<K,V>() : new TreeMap<K,V>();
    }
Thanks for the explanation, this is illuminating

I'm still kind of unsure of why (Map m) => Int -> m isn't a valid type signature for this though..

That type means something slightly different: (Map m) => Int -> m doesn't mean that it returns some Map type, it means it returns any Map type. We could write a function of that signature, but that would imply that the caller/context would choose what type of Map it returns (and so it would not depend on the integer argument.)

Think about what it means if a type variable has no typeclass constraint: the type t isn't just a specific unknown type, it's literally any type. Similarly, C t => t isn't a specific unknown instance of a typeclass, it's any instance of that typeclass. If we tell Haskell that we're returning an instance C t => t and then try to give some concrete type, Haskell won't allow it, because it wants that type to encompass any possible type that implements C, not just the one we happened to use.

One way of making the above code work in Haskell is to use an existential type, which expresses that we're talking about some (not any) map type, but we'd have to wrap it inside a different constructor:

    -- this expresses that a SomeMap constructor contains some kind
    -- if Map, but we can't tell which one:
    data SomeMap = SomeMap (forall m. Map m => m)

    -- We can now express this by hiding the choice of Map inside the
    -- SomeMap type:
    mkMap :: Int -> SomeMap
    mkMap size | size > threshold = SomeMap newHashMap
               | otheriwse        = SomeMap newTreeMap
which is, again, a solution with some advantages and some disadvantages.
If I remember correctly, isn't this one of the problems Idris and Agda are good at solving?
We could in fact write mkMap in Idris with very little change, and give it a proper type:

    -- This might not be the best/cleanest way of writing it, as my
    -- Idris is a little rusty, but this will work:
    mkMap : (size: Int) -> (if size > threshold then HashMap else TreeMap)
    mkMap size with (size > threshold)
      | True  = newHashMap
      | False = newTreeMap
So yes, although again, other tradeoffs are made to allow this kind of radical expressiveness.
Thanks for pulling this out and explaining it; although I'm not familiar with ML, I get what the problem is here. Interesting!
Erasing that distinction enables some goodies such as first-class type constructors, which makes it possible to manipulate types of higher kind. This in turn, presumably enables things like Generalized Algebraic Data Types.
the monad example in 1ML is:

  type MONAD (m : type ⇒ type) =
  {
    return a : a → m a;
    bind a b : m a → (a → m b) → m b
  };

  map a b (m : type ⇒ type) (M : MONAD m) (f : a → b) (mx : m a) =
    M.bind a b mx (fun (x : a) ⇒ M.return b (f x)) (* : m b *)
not too bad.. though I wonder if things are defined structurally instead of nominally, which monad instance are you going to get for a given expression. Haskell's "one instance per class" rule means the compiler figure out which monad instance to apply.

This looks more flexible, but also looks like it will require more annotations and explicit parameters.

If you combine this with [modular implicits](http://www.meetup.com/NYC-OCaml/events/222026251/), which are hopefully going to be added to OCaml soon, then you'll get something which is nearly as syntactically neat as type classes but more powerful.
More power in terms of flexibility, but the mechanism of implicits makes it less clear which instance is in scope for a given expression.

With true typeclasses, there is no ambiguity which instance will be selected (there can only be one)

This distinction is often lost when comparing true type classes with their emulation via implicts.

Sometimes "more power" is not what you want i.e. this is more of a tradeoff.

The proposed OCaml implementation of modular implicits[1] considers ambiguity a compile-time error, so in that case you'd have to manually indicate which module you're passing in. (This is in contrast to Scala, in which there's an elaborate mechanism for resolving ambiguity in implicits which makes it hard to know which is being selected.)

So the OCaml implementation shouldn't make it any more difficult to discover which implicit is in use in a given context, because if an implicit is used, then it must necessarily be unique and there will be no ambiguity.

[1]: http://www.lpw25.net/ml2014.pdf

You have to explicitly pass the monad "vtable". If 1ML also grabs up modular implicits then you'll have a sensible way of passing implicit arguments not exactly unlike that of Haskell, but modularity and global canonicity are at odds so you have a more complex reasoning task.
ocaml already has generalised algebraic datatypes as of 4.0:

http://caml.inria.fr/pub/docs/manual-ocaml-400/manual021.htm...