Hacker News new | ask | show | jobs
by nv-vn 3356 days ago
The problem with this article is that it's missing an answer to why one would choose ML/OCaml over Haskell. Haskell has many more features, a more advanced type system, arguably superior syntax, and much better library support. However, I believe that OCaml/SML are often a better choice for a number of reasons.

First of all, OCaml/SML are the best choice in terms of example code for compilers. They're historically the choice of many compiler/interpreter/type theory texts (Types and Programming Languages, Modern Compiler Implementation in ML, and an ML is even used as a language to interpret in Essentials of Programming Languages). Andrej Bauer's PLZOO is also written in OCaml. Equally important is the fact that there are a variety of ML implementations, all of which are much more approachable than GHC. The OCaml compiler's codebase is a reasonable size that an individual could get a good idea of how it works in a few weeks or so. SMLNJ, MLKit, MLton, CakeML are all open source and on Github, and all seem to be fairly approachable in comparison to the monolith that is GHC. And that's not even mentioning other compiler in ML (Haxe, Rust's bootstrap compiler, Facebook's Hack compiler, etc.). The fact that there are real-world compilers with perfectly approachable code bases (even without great familiarity with the language; compilers in Haskell might require an in-depth understanding of many of the core type classes and language extensions available) that are open source is highly attractive to novice compiler writers.

Additionally, the feature set in MLs is a good choice for compilers. While they lack some of the cooler features of Haskell, MLs make up for it in simplicity; lots of the features in GHC's type system (especially with language extensions) mean very little for 90% of compiler writers, and getting rid of them from the get-go helps keep the code small and easy to reason about (even if you won't have as much type safety in the compiler itself). This also means that there are a lot less ways to do a single thing, which can be nice when you're not sure exactly how you're going to implement a certain feature. However, one thing I really find incredibly useful is OCaml's polymorphic variants. These are pretty much perfect for statically enforcing a nanopass-like system in your compiler and are a great way of designing your AST/data types in your compiler. I feel like this gets passed up a ton (as far as I know I'm the first person who's used them to create nanopasses), but it's quite convenient and makes OCaml a good competitor for Scheme in this regard.

4 comments

Also, Haskell is something of a soup of DSL-operators that require one spend significant time researching what they mean and what behaviour they induce. Even with such knowledge, it suffers from the Perl-ish woe of write-once and read-never.
Wholly disagree. Yes in Haskell you can define operators yourself (they are just functions but made out of special characters and placed after the first argument, e.g.: "Hi " ++ username); and this is often done by Haskellists. So you sometimes need to learn a few new operators that come with a library to WRITE code using that lib; but in order to READ code I rarely need to ref the docs, it is just evident from the context.

> it suffers from the Perl-ish woe of write-once and read-never.

My experience with Haskell is opposite, I think Haskell yields very maintainable code that is largely self documenting and allows me to confidently hack around old code bases.

My experience with Perl is the same. Very hard to read back, maintain or get productive on old code bases.

I'll just leave [Control.Lens.Operators](http://hackage.haskell.org/package/lens-4.15.1/docs/Control-...) here...

In defense of Haskell, the situation is similar in Scala. I think some people just prefer inventing their own operators instead of using descriptive names.

To be fair, a number of those operators are

> This is an infix alias for cons. > This is an infix alias for snoc. > A convenient infix (flipped) version of toListOf. > An infix version of itoListOf.

And a lot of them are (<+~) and friends.

shrug When I look at Haskell code I see overly terse expressions and nothing resembling the concept of self-documentation.
Are you fluent in Haskell?
I was some years ago, before I gave up on the language; but one shouldn't have to be fluent in a language for it to satisfy the concepts of self-documentation and readability.

Requiring domain-specific knowledge of symbolic notation is not a trait of self-documenting languages. Recommending that the API search engine be open alongside or within your editor is likewise not a trait of self-documenting languages.

Self-documenting languages are coherent at-a-glance to average programmers with little to no knowledge of the language. Haskell is not this.

This article is from 1998, the landscape was much different. Haskell had a handful of the language features it has today, and was lacking many of the innovations in its runtime and libraries.
Very true. I didn't mean this so much as a criticism of the article, but more of an addition to what the article was already saying. I think all the arguments addressed in it are pretty good, but it's not 100% up to date with the current FP world.
Haskell may indeed be one of the most advanced languages out there in terms of raw power, but it is very complex (how many monad tutorials does it seriously take to teach one of the most core pieces of the language) and how much category theory do you need to know to be moderately effective? Also, the ecosystem could use some work. An example is the main string library isn't used in favor of a different one. Using the first and obvious one leads to performance worse than python and perl even after you compile. I'm being nitpicky, but anytime someone writes a blog post comparing a programming language to playing darksouls (game where you die thousands of times) I'd say you have an issue.
Honestly, none. I'm not a category theorist, and have no more understanding of monads than anyone who's used a javascript promise library, and I've been employeed professionally as a Haskell programmer for the past year, contributed libraries back to the community, and given talks in my local area.

Haskell is a language like any other. Many people would like to complicate it, but if you spend the time learning its syntax and semantics, there is very little need to learn the theory.

I started to write a toy compiler in OCaml. I had some previous experience with Haskell, but in no way an expert. I.e. no category theory background, only shallow exposure to monads.

My "problems" with OCaml started, when I wanted to "map" over a data structure I defined. I ended up having to define custom mapping functions for all container-like data structures I wrote and call them in a non-polymorphic fashion (where I would have just used fmap in Haskell).

Sure, in OCAML I needed to use a parser generator where I would have used megaparsec in haskell, but it was also a tolerable inconvenience.

Trouble started when I needed to track state in the compilation process. I.e. I was generating variable names for temporary results and values, and I needed to track a number that increased. In the end I used a mutable state for it, and it turned out nightmarish in my unit tests.

After a while, I just ported the code base to Haskell and never looked back. The State monad was an easy fix for my mutable state issues. Parser combinators made the parser much more elegant. And many code paths improved, became much more concise. It is hard to describe, but in direct comparison, OCaml felt much more procedural and Haskell much more declarative (and actually easier to read).

The only advantage of OCaml to me is the strict evaluation. I don't think lazy evaluation by default ins Haskell is a great idea.

I assume you were just not interested in passing the state around to the functions that needed it, and preferred the fact that the state monad hides that plumbing for you via bind and return. It's worth noting that there exist Ocaml libraries that provide the same operators and even similar do notation syntax that desugars to bind/return operators (via PPX).

Ocaml does tend to be more verbose than Haskell - it's just the nature of the language syntax. E.g., in Ocaml, one says (fun x -> x+1) vs (\x -> x+1). Similarly, ocaml is cursed by the excessive "in"'s that accompany let bindings. "Let .. in let .. in let ...". That can get annoying.

Interestingly, I had the opposite experience with a commercial compiler project. Haskell's syntactic cleverness (monadic syntax, combinator libraries, etc..) eventually got in the way - it became very difficult to understand what a single line of code actually meant since one had to mentally unpack layers of type abstractions. Migrating to ocaml, the verbosity eventually was more tolerable than the opacity of the equivalent Haskell code once the compiler got sufficiently complex.

My experience may vary from yours. I've been doing Haskell/Ocaml in production for many years, so the pain points I've adapted to are likely different than one working on toy compilers or weekend projects. And no, category theory exposure is not and never has been necessary for understanding Haskell or FP unless one is a PL researcher (and even then, only a subset of PL researchers are concerned with those areas). And one can be quite productive and prolific in Haskell without a deep understanding of monads and monad transformers - the blogosphere has given you the wrong impression if you believe otherwise.

I've done a lot of production Haskell and I've had a similar experience.

In our case, we dealt with it by keeping relatively bare, boring code. We avoided point-free style, crazy combinators like lenses, and complex monad transformer stacks except in the 'plumbing' part of the application that didn't need to change very much.

This paid off in spades as we had a lot of engineers who only ever had to work in the 'porcelain' parts of the application. They got a lot of great work done using abstractions that matched their intuition exactly.

Thanks for the thorough reply and it sounds like you're quite experienced here. Any chance going into more detail with what you do for a living? Do you maintain a compiler for something more mainstream?
I cofounded a company recently that is using code transformation and optimization methods to accelerate data analytics code on special purpose hardware. Our compilation toolchain is all ocaml, and the language that is compiled/transformed/optimized is Python. Prior to this venture, I did similar work - code analysis and transformation, but in that case largely around high performance computing for scientific applications. That tooling was mostly ocaml/Haskell, but not production focused - it was mostly research code.
Just want to note that, while not frequently seen, you can use more powerful abstractions (monad transformers, parser combinators, etc) in OCaml.

As an example consider looking at the Angstrom[1] parser combinator library and my Pure[2] functional base library.

[1] https://github.com/inhabitedtype/angstrom

[2] https://github.com/rizo/pure

> (how many monad tutorials does it seriously take to teach one of the most core pieces of the language) and how much category theory do you need to know to be moderately effective?

I believe the answer to both questions is zero. Unfortunately there is pedagogical cruft in the community that makes it appear this way. main :: IO () is comparable to public static void main(args[]) or whatever nonsense in Java.

> These are pretty much perfect for statically enforcing a nanopass-like system in your compiler and are a great way of designing your AST/data types in your compiler.

Nice, I was just asking about this on the nanopass list the other day, do you happen to have a publicly available example of this anywhere?