by tomp 4255 days ago

So, I think my ideas about syntax are similar in their goals, but different in some details.

1) I experimented with Pratt parsers, and actually built a great, extensible, full-blown parser, but the devil was in the details - if you parse your whole language with a Pratt parser, you have to get the precedence and associativity of every operator and keyword just right. It's probably possible to get it just right, but the problem with Pratt parsers is that you never know whether you have; in particular, nothing tells you if your syntax has any ambiguities (i.e. things that might not parse the way you want them to).
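
To make the precedence/associativity point concrete, here is a minimal Pratt-style sketch in Python. It is not the parser described above; the operator table, the token set and the tuple-based tree are all assumptions chosen for illustration.

```python
import re

# Hypothetical operator table: token -> (binding power, associativity).
BINARY = {
    "+": (10, "left"),
    "-": (10, "left"),
    "*": (20, "left"),
    "^": (30, "right"),
}

def tokenize(src):
    # Integers, the operators above, and parentheses; whitespace is skipped.
    return re.findall(r"\d+|[-+*^()]", src)

def parse(tokens, min_bp=0):
    """Parse an expression from a mutable token list into nested tuples."""
    tok = tokens.pop(0)
    if tok == "(":
        left = parse(tokens, 0)
        tokens.pop(0)  # consume the closing ")"
    else:
        left = int(tok)
    while tokens and tokens[0] in BINARY:
        bp, assoc = BINARY[tokens[0]]
        if bp < min_bp:
            break
        op = tokens.pop(0)
        # A left-associative operator needs a strictly tighter right operand;
        # a right-associative one accepts the same binding power again.
        right = parse(tokens, bp + 1 if assoc == "left" else bp)
        left = (op, left, right)
    return left
```

With this table, `parse(tokenize("1 - 2 - 3"))` gives `('-', ('-', 1, 2), 3)` and `parse(tokenize("2 ^ 3 ^ 2"))` gives `('^', 2, ('^', 3, 2))`. Changing a single number or flipping one `"left"` to `"right"` changes the resulting tree without any warning, which is exactly the kind of detail that is hard to audit across a whole language.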

2) Because of that, I abandoned Pratt parsers and went back to LALR(1) (yacc). It's tedious and complicated, but it has the nice property that it reports every conflict in the grammar, and therefore every potential syntax ambiguity, a property that I haven't found in any other parsing system (recursive descent/LL, Pratt, PEG). Of course, some people say that PEG is unambiguous, but these people are either stupid or ignorant; PEG just doesn't tell you where the ambiguities exist, and always takes the first choice. LALR is "not ambiguous" in the same way: in case of a shift/reduce conflict it always chooses shift, and in case of a reduce/reduce conflict it chooses the rule that appears first in the grammar. But at least it tells you where those choices were made, so that you can examine and fix them.
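
A small sketch of the "always takes the first choice" behaviour, using hand-rolled PEG-style combinators in Python. The helper names (`lit`, `choice`, `seq`, `matches`) are made up for this example and do not come from any particular PEG library.

```python
def lit(s):
    """Match the literal s at position i; return the new position or None."""
    def p(text, i):
        return i + len(s) if text.startswith(s, i) else None
    return p

def choice(*alts):
    """PEG ordered choice: commit to the first alternative that matches."""
    def p(text, i):
        for alt in alts:
            j = alt(text, i)
            if j is not None:
                return j  # later alternatives are never tried
        return None
    return p

def seq(*parts):
    """Match each part in order, threading the position through."""
    def p(text, i):
        for part in parts:
            i = part(text, i)
            if i is None:
                return None
        return i
    return p

def matches(parser, text):
    """True if the parser consumes the entire input."""
    return parser(text, 0) == len(text)

# Two grammars that differ only in the order of the alternatives.
short_first = seq(choice(lit("a"), lit("ab")), lit("c"))
long_first = seq(choice(lit("ab"), lit("a")), lit("c"))
```

Here `matches(short_first, "abc")` is `False` while `matches(long_first, "abc")` is `True`: the overlap between `"a"` and `"ab"` is resolved purely by ordering, and nothing reports that the second alternative of `short_first` can never succeed. A yacc-style tool would at least print a conflict for the comparable situation in its own formalism.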

3) LALR(1) is also quite stupid and limited, which is why Menhir has been a blessing - it's practically as efficient as ocamlyacc (in theory, at least - ocamlyacc produces tables that are interpreted by an engine written in C, while Menhir produces OCaml code that relies on Obj.magic), but it accepts LR(1) grammars instead of only LALR(1), which makes writing grammars for it much easier, and it produces nicer error messages. I've managed to write a very flexible, Julia-like syntax (except with {} instead of begin/end) that supports tuples, arrow function syntax, and pattern matching.

4) Extensibility is important for me, but I intentionally want to limit it - I don't want programmers to be able to (re)define basic language syntax, like ` = `, as that could fragment the code and make the syntax unpredictable/ambiguous. However, I want to include user-defined operators (with custom precedence/associativity), which could be done using an embedded Pratt parser, and Julia-like macros that are always preceded by `@` and can only be used in a few predetermined forms (function calls/statements/blocks). I think that makes the syntax much more manageable and readable. Also, I don't like Elixir's syntax: too many semicolons/`do:` keywords. I haven't actually implemented the above yet, but I think it could be done within my current Menhir parser infrastructure.
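
One way the limited extensibility could look, sketched in Python rather than inside a Menhir grammar. Everything here is hypothetical: the `RESERVED` set, the `OperatorTable` class and the pre-split token list are stand-ins, not part of any existing implementation.

```python
# Hypothetical core syntax that user code may never redefine.
RESERVED = {"=", "@", "{", "}", "(", ")"}

class OperatorTable:
    """User-extensible table of binary operators."""

    def __init__(self):
        self._ops = {}

    def define(self, symbol, precedence, assoc="left"):
        if symbol in RESERVED:
            raise ValueError(f"{symbol!r} is core syntax and cannot be redefined")
        if assoc not in ("left", "right"):
            raise ValueError("assoc must be 'left' or 'right'")
        self._ops[symbol] = (precedence, assoc)

    def lookup(self, symbol):
        return self._ops.get(symbol)

def parse_expr(tokens, table, min_prec=0):
    """Embedded Pratt loop over a pre-split token list, driven by the table."""
    left = tokens.pop(0)  # operands are plain identifiers in this sketch
    while tokens:
        info = table.lookup(tokens[0])
        if info is None or info[0] < min_prec:
            break
        prec, assoc = info
        op = tokens.pop(0)
        right = parse_expr(tokens, table, prec + 1 if assoc == "left" else prec)
        left = (op, left, right)
    return left

table = OperatorTable()
table.define("+", 10)
table.define("|>", 5)              # a user-defined pipeline operator
table.define("**", 30, "right")    # a user-defined right-associative operator
```

With these definitions, `parse_expr("x |> f + g".split(), table)` gives `('|>', 'x', ('+', 'f', 'g'))`, while `table.define("=", 1)` raises `ValueError`. The outer grammar would stay fixed and conflict-checked; only the expression fragment between known delimiters is handed to the table-driven loop.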

Another thing: I too strongly discourage the name Meta, especially for OCaml, because there is already a project that's called MetaOCaml.