Hacker News new | ask | show | jobs
by coldcode 3490 days ago
Having never tried OCaml, what advantages does it have over other languages? Two languages I don't know seem more interesting, Clojure and Rust. Generally I work in Swift these days.
5 comments

IMO, the single most interesting thing about OCaml is its object model. On one hand, you have a fairly conventional setup with classes, methods, inheritance, generics etc. On the other hand, the entire system is structurally typed, with pervasive type inference. Recycling bits from my earlier comment on the subject...

    # let f obj x y = (obj#m x)#n y;;
    val f : < m : 'a -> < n : 'b -> 'c; .. >; .. > -> 'a -> 'b -> 'c = <fun>
Here I defined a function taking 3 arguments, called method m on obj, passing x as argument, then called method n on the result of that, passing y as argument.

Note the inferred type signature for this: the first argument is of type <m : 'a -> <n : 'b -> 'c; ..>; ..> - i.e. any object that has a method named m (and possibly some other unspecified members - that's what ".." means), with said method m having a signature that allows us to pass some 'a, and returning another object of type <n : 'b -> 'c; ..> - i.e. something with a method named n that accepts 'b and returns 'c. 'a and 'b' are then used as types of x and y, respectively, and 'c is the the result of f.

This is actually row polymorphism rather than structural subtyping. The difference is rather small, but important:

Under structural subtyping (or any kind of subtyping, for that matter), the subtyping relation admits a rule called "subsumption", which can lead to information loss. Consider this behavior, under structural subtyping:

    fun f {x : real, y : real} = {x = 2 * x, y = 2 * y};
    (* f : {x : real, y : real} -> {x : real, y : real} *)

    f {x = 2.0, y = 3.0, z = 4.0};
    (* it = {x = 4.0, y = 6.0}; *)
Above, when `f` is applied to a subrecord with an additional `z : real` field, the result is missing the extra field! This is because the subrecord was subsumed by the (super-) record type taken by `f`, leading to information loss. This is the reason why you often see people using up-casts in languages like Java (especially before generics were introduced) and Go (which is lacking generics for some befuddling reason): up-casts allow you to circumvent the type system and force a particular value to take a particular type, thus recovering the lost information, but can be unsafe.

Under row polymorphism, however, you can achieve basically the same effect, but you get to save any unused/irrelevant type information inside of a row variable (the `..` in your example) and include it in the result. Then, no information is lost, and no safety is compromised:

    fun f {x : real, y : real, R} = {x = 2 * x, y = 2 * y, R};
    (* f : {x : real, y : real, _ : ..a} -> {x : real, y : real, _ : ..a} *)

    f {x = 2.0, y = 3.0, z = 4.0};
    (* it = {x = 4.0, y = 6.0, z = 4.0}; *)
Users of languages like Lua or Python might be reminded of "duck typing", and not without reason: row polymorphism enables many of the same programming patterns/techniques/abstractions in typed languages that are enabled by duck "typing" in untyped ones. It's a shame that more typed languages don't incorporate the idea, because it's a really wonderful one, and it can lighten the relative "burden" of writing typed code in some cases :)

The folks at INRIA has come up with some really neat stuff, and, if you haven't already, it's definitely worth it to check out their publications, too!

Thank you for the clarification!

And, indeed, "static duck typing" is exactly what I'm thinking about when I'm dealing with OCaml objects.

FWIW, while it was relatively obscure, TypeScript is currently popularizing this concept (albeit with a lot less type inference). Although they also refer to it as "structural subtyping":

https://www.typescriptlang.org/docs/handbook/type-compatibil...

But if I understand correctly, they're basically doing the same thing that's going on here - the only difference is that the ".." is implicit in any TS interface type. Or am I wrong?

> Thank you for the clarification!

No problem! The difference between the two was something I personally struggled with at first, so now I feel like I've got a pretty good grasp on it. Funny how that works ;)

I don't see anything here in the TypeScript docs that looks like row polymorphism. Without a row variable somewhere in the type (even if it's inferred), I'm not sure it could be considered row polymorphism. However, they could be doing something fancy with their notion of type compatibility that avoids information loss without introducing row variables (which, I think, would similarly imply a lack of subsumption[0], and might not be sound, but the docs admit the system is unsound in places). It's also possible that every record type in TS carries an implicit row variable.

The important part is the (lack of) information loss, and good way to tell would be to construct a test case like the one I wrote above and see if you wind up losing your `z` field (information loss => subsumption => "by-the-book" subtyping).

This all reminds me that I've been meaning to spend some time with TypeScript :)

[0]: I've known type theorists who claim that without subsumption, there is no subtyping. I disagree because I think the fundamental/important part of subtyping is the subtyping relation itself.

I actually didn't fully understand your example originally (I missed the part where ".." now has a name!). I grok it now, and yeah, TS doesn't have it - there's simply no type-safe way to write a function like that (of course, TS being a strict superset of JS, you can still write it, it just won't typecheck).

It appears that this was discussed in the context of supporting the ES7 "spread" operator, since row variable is the obvious typed counterpart to that:

https://github.com/Microsoft/TypeScript/issues/2103

but they ended up implementing just the operator, without a way to reflect it in a function signature:

https://github.com/Microsoft/TypeScript/pull/11150

However: "I expect a spread type or something similar to be available in future versions of TypeScript."

Ah! Good to know :) TypeScript seems like a neat language, and I've got a JavaScript project coming up so I intend to look into it more deeply soon.

Yeah, in my example, `R` is the row variable itself, and it represents all those other, irrelevant values in the record, while I gave it the name `_` in the type itself, since rows themselves don't have field labels, but rather represent sets of labels. In ML, the labels of the fields are part of the type, but the labels of the rows are not part of the type. Then, `..a` is a type variable representing the type of the row (it gets a distinct name because you can imagine a function which takes two separate records with two separate rows[0], or even a function which takes two records but constrains them to the same type with a single row[1]), and is akin to a type variable used for parametric polymorphism, such as `'a`, and, in Standard ML with eqtypes, `''a`.

To the best of my knowledge, the term "row" comes from The Definition of Standard ML, where the grammar given for the core language includes productions called "TyRow", "PatRow", and "ExpRow", which correspond to the syntax of record types, patterns, and expressions, respectively, but only the part in between the `{` and `}`:

    TyRow  <- Label `:` Ty [ `,` TyRow ]
    PatRow <- Label `=` AtPat [ `,` PatRow ]
    ExpRow <- Label `=` AtExp [ `,` ExpRow ]
(The `At` prefix means "atomic", and has a specific meaning in the definition.)

So, then, a row variable is one which literally holds a sequence of label×type or label×value pairs. It sometimes even feels like it's a metavariable which holds a branch of the syntax tree the way a variable in a macro does, which might be another reason why row polymorphism feels so cool ;)

Sorry if I'm boring you at this point, I just find this stuff really fascinating and fun :)

———

[0]: Any two records of any two record types, as long as they each have both an `x : real` and a `y : real`, with intentional information loss:

    f : {x : real, y : real, ..a} -> {x : real, y : real, ..b} -> {x : real, y : real}

    (* f {x = 1.0, y = 1.2, z = 1.3} {x = 1.0, y = 2.0, w = 42}; *)
    (* ^ typechecks okay!
     * row type `..a` can be different from row type `..b`,
     * no problem
     *)
[1]: Any two records which both have the same type, which can be any record type that at least has an `x : real` and a `y : real`:

    f : {x : real, y : real, ..a} -> {x : real, y : real, ..a} -> {x : real, y : real, ..a}

    (* f {x = 1.0, y = 1.2, z = 1.3} {x = 1.0, y = 2.0, z = 0.0}; *)
    (* ^ typechecks okay! *)

    (* f {x = 1.0, y = 1.2, z = 1.3} {x = 1.0, y = 2.0, w = 42}; *)
    (* ^ type error!
     * the second record has a different type than the first,
     * but the type of `f` demands that they be the same type,
     * since the single row `..a` must match itself
     *)
I would say that the most interesting feature (for me) is the module system (although with structural typing as a close second) :)
Over other languages in general? That's hard to say about any language :)

OCaml is a language in the ML family, is backed by INRIA and is the main language of Jane Street Capital (so it definitely has "real world" use!), which also contribute a lot to it. OCaml is pretty good for functional programming, and unlike Clojure, it's statically typed. If you like static typing -- which I personally think is the way to go -- that's a huge reason to prefer it over Clojure. You may also check out Haskell if you go that way.

An interesting case study about going to OCaml from another language (Python, in this case) was posted to HN many times already: http://roscidus.com/blog/blog/2014/06/06/python-to-ocaml-ret.... I recommend you read it if you want to learn more about why would one choose this language.

Why not choose a language in the ML family? Well, if not having C-like syntax is a deal breaker for you, then OCaml is going to frustrate you. A real programmer shouldn't let syntax stop them from learning new languages, though :)

Ultimately one has to try doing something with this kind of language, even if it's just a toy project. Just reading about it won't enlighten you much, and you will end up focusing on its syntax instead and deciding it looks too unfamiliar.

> Well, if not having C-like syntax is a deal breaker for you, then OCaml is going to frustrate you.

There is a new alternative Reason syntax (https://facebook.github.io/reason/mlCompared.html) which replaces some of the unusual syntax with braces and semicolons. Though it still probably looks less C-like than Swift. And I haven't tried writing anything in it, so can't say for sure if it really feels less weird than standard syntax on a "real program".

Oh, and the other neat thing is that OCaml cleanly separates subtyping and inheritance. In other words, in OCaml, class inheritance is for implementation reuse only, and is completely orthogonal to subtyping. In many common cases (similar to Java etc) inheritance does produce a subtyping relationship in practice; but because of this decoupling, you don't get one shoehorned onto your classes when you just want the other.
Ocaml is concise yet comparatively simple (despite various warts), offers good javascript interop and (quickly!) compiles to fast code and has a good type system. It's extensively used for language prototyping and tooling (e.g. at facebook). Rust's compiler was originally written in Ocaml: https://github.com/rust-lang/rust/tree/ef75860a0a72f79f97216....

Compared to Rust it compiles much faster, offers an interactive shell and is more expressive and mature, but doesn't offer similar low-level and parallelism capabilities and eco-system coherence (e.g. there are half a dozen competing build systems). I also believe that Rust, with it's clearly defined niche will win a much bigger total market share.

Compared to Clojure it offers static typing, much better startup times and native code compilation, but doesn't have similarly modern functional datatypes and the standard library is less well thought out.

There are a couple of other things which are rare or nonexistent in other languages:

- Polymorphic variants, the article linked downthread has some use cases

- extensible variants

- Functors (see https://realworldocaml.org/v1/en/html/functors.html)

- a structurally typed OO system, as someone has already mentioned

It also has a few interesting syntactic ideas that I wish more languages would steal:

- Nice sugar for passing keyword arguments: `f ~somearg` is the same as `f ~some_arg: some_arg`. I wish I could do `f(=somearg, y)` instead of `f(somearg=somearg, y)` in python

- It supports new user-defined operators in a way that doesn't render your code unreadable, because the precedence is derived from existing operators.

- A nice syntax for "importing" from a namespace for an expression. Assuming Mod defines `foo` and you can write Mod.(foo x (bar f x)) instead of `Mod.foo x (Mod.bar f x)`

Honestly for me, it's just that it's damn useful. It's not especially new and shiny and hip, but it's easy to think in and easy to reason about. OCaml is a nice tool for writing software. That's about it. It's pretty simple functional programming that can switch to imperative and object-oriented programming where appropriate.