Hacker News new | ask | show | jobs
by hyperpallium 3354 days ago
How are simple parsers written in ML (or ocaml)?

You can't use the coding style used for recursive descent in the Dragon compiler book, without using mutable variables.

Do you have to use parser combinators, which have their own limitations?

4 comments

Parser combinators tend to work by recursive descent, so cannot handle left recursive grammars [1], and tend to be really slow. The latter is not a problem for many applications, but removing left recursion can be irritating even for small grammars. It is possible to build combinator parsers that can handle all context-free grammars [2], but I'm not sure any of Ocaml's are built that way.

In any case, Ocaml has parser generators that are fast, do bottom-up parsing (hence handle left-recursion without issue) and not based on parser combinators, e.g. ocamlyacc [3].

I'd use parser combinators for quick prototypes, and, if measurement shows a performance problem, replace them with a generated (e.g. by ocamlyacc) parser. As far as I remember the parser in Ocaml's (superfast) compiler is generated by ocamlyacc.

[1] https://en.wikipedia.org/wiki/Left_recursion

[2] T. Ridge, Simple, functional, sound and complete parsing for all context-free grammars. http://www.tom-ridge.com/resources/ridge11parsing-cpp.pdf

[3] https://caml.inria.fr/pub/docs/manual-ocaml/lexyacc.html

In production code we tend to use ocamlyacc or menhir. There is nothing about ocaml/ML that prohibits the use of the kind of parser generators one would expect in any other language.
You can do mutation in both Standard ML and OCaml. If I recall, Robert Harper's book on Standard ML starts off with a recursive descent parser.
The most common way is using a parser generator (ocamlyacc or menhir, both are basically interchangeable 99% of the time). Parser combinators are a choice, but honestly I think recursive descent parsers are more common than combinators in OCaml.