Hacker News new | ask | show | jobs
by vog 3459 days ago
I find it interesting that CakeML, like many other developments in this area, is based on SML (Standard ML) and not OCaml (Objective Caml). Moreover, whenever I read something about ML languages, it seems most people in the academic field talk about SML.

Yet, it seems that OCaml is more popular among programmers and real-world projects. Even though these programmers come from the academic field, given the niche existence that OCaml still is. For example, the astonishing MigrageOS project chose OCaml instead of SML.

So my question is:

How is that? Why is OCaml so much popular, despite having just one implementation and no real spec? Why is SML with its real spec and multiple implementations not as least equally popular?

EDIT: Here are two possible answers that I don't think apply:

1. OCaml may be "good enough", which, combined with network effects, make choosing OCaml over SML a self-fulfilling prophecy. I don't think it is that simple, because OCaml users and projects come mostly from the academic field. They are deeply concerned with correctness of code. Which would mean they should all have favored SML over OCaml. In fact, sometimes correctness seems to be the sole motivation. For example, the author(s) of OCaml-TLS didn't just want to create yet another TLS library in a hip language. They are concerned with the state of the OpenSSL and similar libraries, and wanted to create a 100% correct, bullet-proof, alternative.

2. Although one could attribute this to the "O" in Objective Caml, I don't think it is that simple, because it seems the object-oriented extensions are almost unused, and wherever I saw them being used (e.g. LablGTK, an OCaml wrapper for the GTK UI library) I don't see that much value, and that sticking to plain OCaml Modules and Functors would have led to a better interface.)

6 comments

I work with a bunch of people who were very involved in SML, and I've asked this question many times. They usually say that SML (as a language) stagnated because of personal conflicts in the standardisation committee. A bit of a "worse is better" microcosmos, really, with a focus on on discussing theoretical perfection instead of how to get things into the hands of users. You can partially see this in play today in the Successor-ML[0] discussions, where there's not really any clear vision or leadership driving things forward, just a lot of (often very good) language suggestions.

My own extrapolation is that a multitude of SML implementations meant that programmers couldn't take advantage of implementation-specific extensions, as that would prevent their code from running elsewhere. This hindered real-world experimentation (as opposed to academic investigations) and organic growth of the language.

But, I should say that as starting point for creating new functional languages, SML is an exceedingly clean design. I'm using it myself for my own language design efforts.

[0]: https://github.com/SMLFamily/Successor-ML/issues

You can hear a bunch more about the full history of development of SML and interaction with OCaml here: https://www.youtube.com/watch?v=NVEgyJCTee4

I would definitely agree with the opinion above that the SML language designers fall in the camp of deliberating extensively and fully formalizing all new language features before considering them for inclusion.

Today, there are two challenges right now in the Successor ML space:

- There's some broad agreement on a few things that should obviously be improved, but today many of the implementation owners don't have a huge amount of time, so getting them to also agree to the new features is complicated. Life is way easier when you only have one implementation.

- There's a bunch of fantastic ideas (largely driven by Bob Harper) to integrate stuff like parallelism, cost model, etc. that will be fantastic for a next version of the language and specification. It's just a lot of work, and not really publishable, so it's mainly tenured faculty and "alumni" (like myself and probably the people you work with) driving it forward in our free time.

That said, I still love working on SML and its implementations (I mainly contribute to SML/NJ and Manticore) and it's a wonderful escape from the grim reality of quarterly goals and "get it out the door" release deadlines :-)

> many of the implementation owners don't have a huge amount of time, so getting them to also agree to the new features is complicated.

That's another thing I noticed - people are not really prototyping the new ideas. I know that Bob Harper and John Reppy are also working on other projects, so they probably can't invest much implementation time on SML anymore. I wonder how OCaml and Haskell managed to keep implementers active.

The Successor-ML feature I'm just curious about is modular implicits (or some of the other alternatives mentioned by Harper). I much prefer the SML module system to Haskell's type classes, except that it is notationally more cumbersome in use. Sprinkling a little bit of implicit behaviour (when choice of functor is unambiguous) would help a lot.

> I wonder how OCaml and Haskell managed to keep implementers active.

Members of that community could probably comment better than I, but it doesn't hurt that Microsoft Research supported two core Haskell team members and Inria has supported core OCaml work for most of the lifetimes of the projects. Their better corporate focus has also meant that as some fabulously enlightented BigCos took dependencies on them (most notably Standard Chartered for Haskell & Jane Street for OCaml), they've both hired core developers and supported the communities. I should add that these have been fantastic for the PL community in general and specifically ICFP funding & attendance!

For as much as the members of any one community might have differences of opinion with the choices of the others, success of any benefit us all. Even if we do grumble occasionally with the "can't get into ICFP this year unless you're in language {X}!" comments, where X was Scheme, then ML, then Haskell... :-)

> people are not really prototyping the new ideas

Yeah, John's group plus Matthew Fluet and a little bit of myself are trying to clean up Manticore enough to do a "final" release and journal paper, and that is kinda taking up a lot of our extra cycles. Several other folks have either retired or moved on to new areas (security, compiler verification, etc.).

I think that if we could find some places where the new ideas intersected with interesting novel (read: fundable) research topics, there'd be some additional energy into implementation work around them. Right now, certainly, stuff exists, at best, in private forks of (typically) MLton.

The old ML compilers were more deeply rooted in academia. It was almost popular to write a new one to fiddle with a new idea (like concurrency). Performance and more pragmatic considerations were a bit secondary and that included OS interfaces.

Caml had these, and thus was used to build some software somewhat popular on Unix systems (e.g. Unison, MLDonkey, a flash compiler). And while these days it might seem like we've got plenty of freely available, fast compiling, native languages to choose from, just a few years the situation was very different. So some just chose the language for that reason, never mind syntax or semantics. As long as it wasn't C.

So you just got the feeling of more dedicated, pragmatic language maintainers and actual community use. People who really care about purism went over to Haskelly anyway.

(Personally, I like the idea of languages with more than one compiler, if only the "Standard" part of "SML" would be a bit bigger)

I wonder if the Objective part of it made it more lively in the mainstream eyes too instead of "stagnant/crude" sml.
As far as I can remember, those were almost deprecated even way back when. It's never been much used for GUIs anyway, and I believe most servers and compilers were mostly written in a functional fashion.
Mystery stays intact.
I can add a few more data points. First regarding CakeML:

- CakeML is written in HOL4, which is using ML.

- SML, unlike Ocaml has a well-defined semantics and so there is a good starting point for a verified compiler.

Second, regarding Ocaml, the easiest reason for choosing Ocaml over ML is because there is only one implementation of the language that everybody is using. This makes it a lot more likely that the ocaml compiler will still be around a few decades from now. There are several modern ML compilers, most of which are more or less unmaintained. PolyML, which everybody seems to be using these days, is substantially developed by a single person.

On top of it, SML was literally designed to do proving. Writing provers, feeding algorithms into them, and later extracting algorithms out. Ocaml was designed as a practical language for software development. Unsurprisingly, the provers and verified tools tend to be done in SML or a subset of it while real-world software is done in Ocaml.

That said, the CakeML team welcomes a translator for Ocaml that outputs CakeML programs. Additionally, the Ocaml compiler was clean enough in architecture that Esterel was about to do source-to-object code validation required for its DO-178B-certified, code generator. They said they had to do way less work modifying or analyzing it than they expected. That means the Ocaml compiler itself might be a candidate for verification albeit probably a non-optimizing form of it. I'd also try the K framework that successfully handled the C semantics with KCC compiler. If it can handle C, I'd venture a guess that it should handle a better designed compiler and language. ;)

EDIT to add: Ocaml syntax is being worked on at link below.

https://github.com/CakeML/cakeml/tree/master/unverified/ocam...

> SML was literally designed to do proving.

So was Caml (and its precursor Le-ML), originally :)

That it grew so far beyond that I think is a consequence of it not having a formal definition like SML. This allows the developers to extend the language willy-nilly without going through the grueling task of extending a formal semantics. On top of that, as another commentor pointed out, is the fact that there's a single "reference implementation", which means that users don't need to worry about getting locked-in when opting to use new functionality.

That said, SML is still a fantastic base for new research projects in the PL space, mostly for the very same reasons it hasn't seen nearly as much adoption in the "practical" space as Caml: it's a small, simple, well-understood, and formally-specified language that grants researchers an excellent starting point for their inquiries and experiments.

Didnt know that about Caml. Neat. Perhaps a subset of Ocaml with formal semantics can do well, too.
It would be quite realistic and interesting to have a translation between the Scala subset that Leon project supports (leon.epfl.ch) and ML. A good starting point for this is Stainless, a redesigned version of Leon: https://github.com/epfl-lara/stainless
I can't speak for others, but from a purely practical and non-academic point of view, I chose OCaml over SML because it's painless and easy to produce a standalone binary executable. Most SML implementations seem to be able to do that too, but require me to write compiler-specific glue code, IIRC. I don't want to have to dive into compiler internals to produce an executable. Also, I appreciate that the OCaml maintainers are not afraid to provide useful syntactic sugar although it may not seem “real FP”, like the `for` loop for instance. I seldom use the OO extension, but support for virtual methods can be very handy at times. You can implement the mechanics for those with functors too, but it's just more code to write that way.
It may be similar to the reason for adoption for F# -- it does objects

Doesn't matter if you actually code with objects. The important point for adoption is that it does them.

So for a noob .NET programmer you say, hey, look at F#! You can code it just like C#. Kinda.

But as soon as they start coding, you tell them nope, all of these types of things are actually antipatterns.

I disagree with your premise on why F# was adopted. It isn't because it does objects but it was really the first functional language that was pushed by Microsoft and ran on the .NET platform. Had F# not run on the CLR I don't think it would have nearly as many users as it does now.
That goes into why Microsoft (Research) chose to base F# on OCaml rather than SML (or Haskell, which Microsoft Research also has some fingers in that pie): it's not that useful targeting the CLR if you can't interact with other things running on the CLR (including the Base Class Library [BCL]). Presumably Microsoft could have have tried for something wild like trying to build a "CLR Object monad" for Haskell, but the F# team instead went with the existing, known commodity language family that already straddled that border between object systems and functional programming (OCaml).
I think people wanting to use C# will stick to it, not learning another language to use it the same way as the one they already know.

The real value of objects in F# lies in the CLR compatibility. Access to the BCL and some existing code is a big advantage over OCaml for .neteers.

We use OCaml for the virt tools because it (a) generates efficient, standalone binaries and (b) lets us call out to C libraries really easily. We use OCaml over (say) C or C++ because it allows us to write code with far fewer bugs and headaches. You can find one of these large, real world OCaml programs here: https://github.com/libguestfs/libguestfs/tree/master/v2v

I think network effects do matter much more than you say. There is virtually no community around SML.