by a0 4256 days ago
ML's syntax is indeed less confusing than OCaml's. And that's the reason I'm working on a compiler front-end replacement for OCaml inspired by Clojure, Haskell, Ruby and Julia. It will be for OCaml what Elixir is for Erlang. I'm finishing the parser right now and will next implement the language primitives as a library in the language itself.
2 comments

Do you share the development anywhere online? Are you using Menhir? Maybe I could help if you're having any trouble with the parser; I've been working on many interesting and hard (LA)LR parser issues in the past few months. What will the syntax be like? Will it resemble any existing language, maybe Julia?
I don't have the code publicly available yet, but all the development will happen openly on GitHub. I expect to have a minimal working compiler ready in the next few weeks.

I've been thinking a lot about how to perform the parsing. I believe it's important to have a complete metaprogramming system (with code macros and quasiquotations) with all the basic language primitives defined in the language itself. So standard parsing with Menhir may not be ideal for this purpose. Instead, I've ported a simple top-down operator precedence parser (also known as a Pratt parser) to OCaml, as described for example here[1]. This is the technique used by Douglas Crockford for the JSLint parser[2].
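
For readers unfamiliar with the technique, here is a minimal sketch of a Pratt parser in OCaml. It is not the Meta parser (that code is not public); the `token` and `expr` types, the binding-power numbers, and the function names are all assumptions made up for illustration. Each infix operator gets a left and a right binding power, and the loop keeps consuming operators as long as they bind at least as tightly as the current minimum.

```ocaml
(* Illustrative Pratt (top-down operator precedence) parser.
   All types, names and binding powers are hypothetical. *)
type token = Int of int | Op of char | LParen | RParen | Eof

type expr =
  | Num of int
  | Neg of expr
  | Bin of char * expr * expr

(* (left, right) binding powers; a higher number binds tighter.
   right > left gives left associativity, right < left gives
   right associativity. *)
let infix_power = function
  | '+' | '-' -> Some (10, 11)
  | '*' | '/' -> Some (20, 21)
  | '^' -> Some (31, 30)
  | _ -> None

(* Unary minus binds tighter than '*' but looser than '^'. *)
let prefix_power = 25

let parse (tokens : token list) : expr =
  let rest = ref tokens in
  let peek () = match !rest with [] -> Eof | t :: _ -> t in
  let advance () = match !rest with [] -> () | _ :: tl -> rest := tl in
  let rec expression min_bp =
    (* Prefix position: literals, unary minus, parentheses. *)
    let left =
      match peek () with
      | Int n -> advance (); Num n
      | Op '-' -> advance (); Neg (expression prefix_power)
      | LParen ->
        advance ();
        let e = expression 0 in
        (match peek () with
         | RParen -> advance (); e
         | _ -> failwith "expected ')'")
      | _ -> failwith "unexpected token"
    in
    loop left min_bp
  and loop left min_bp =
    (* Infix position: keep going while the operator binds tightly enough. *)
    match peek () with
    | Op c ->
      (match infix_power c with
       | Some (lbp, rbp) when lbp >= min_bp ->
         advance ();
         let right = expression rbp in
         loop (Bin (c, left, right)) min_bp
       | _ -> left)
    | _ -> left
  in
  let e = expression 0 in
  match peek () with
  | Eof -> e
  | _ -> failwith "trailing tokens"
```

With these assumed binding powers, `parse [Int 1; Op '+'; Int 2; Op '*'; Int 3]` groups the multiplication first, and `-` chains to the left while `^` chains to the right. Adding an operator is a one-line change to `infix_power`, which is what makes the approach attractive for an extensible syntax.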

The core of the language can be seen as a simple compiler toolkit that parses generic expressions and produces native OCaml AST. So for example even things like assignment (`=`) or function definitions are regular macros. This is one of the reasons I named the language "Meta".

There's another interesting language that inspired me a lot called Magpie[3], which uses an identical approach (described in detail here [4]).

The main goal of the language is simplicity. OCaml is a very powerful language but programming in it requires some writing effort. I want the freedom and natural expressiveness of Clojure or Python combined with the type-safety guarantees and modularity of OCaml.

I'm developing a large system for my startup in OCaml right now and plan to incrementally port it to Meta, which I think is important for dogfooding experiments, for support and commitment.

The syntax will look a lot like Julia (although I considered just adopting s-expressions, languages like Elixir and Julia showed us that it's possible to be homoiconic (in some restricted sense) and still use regular syntax). I am still in the research phase and have only defined basic language constructs like variable bindings, function definitions, type annotations, pattern matching and macros. If you are interested I can show you some examples.

What do you think about it? It would be nice to hear some early feedback.

[1]: http://journal.stuffwithstuff.com/2011/03/19/pratt-parsers-e...

[2]: http://javascript.crockford.com/tdop/tdop.html

[3]: http://magpie.stuffwithstuff.com/

[4]: http://journal.stuffwithstuff.com/2011/02/13/extending-synta...

Please don't name it just "Meta"; it will make it a lot more difficult to find resources for it on the web. Imagine googling for "meta", "meta lang", "meta language", "metalang". Maybe something to go with "meta", like MetaCaml.
Agreed, just meta probably isn't a very good name (neither is standard meta-language for that matter, but at least the abbreviation sml isn't entirely hopeless). Just "meta" would be rather close to OMeta[1] as well.

I tried to see if there might be a more generic animal name to use for inspiration, and so learned that: "The even-toed ungulates (Artiodactyla) are ungulates (hoofed animals) whose weight is borne approximately equally by the third and fourth toes, rather than mostly or entirely by the third as in odd-toed ungulates (perissodactyls), such as horses.". Which while somewhat interesting doesn't really reveal an immediately usable name. But at least Artiodac should be unique...

[1] http://tinlizzie.org/ometa/

The hyrax. A very cute animal that can scale sheer rock faces.
Does seem to be rather distantly related to camels, though?
So, I think my ideas about syntax are similar in their goals, but different in some details.

1) I experimented with Pratt parsers, and actually made a great, extensible, full-blown parser, but the devil was in the details - if you parse your whole language with a Pratt parser, you have to get the operator/keyword precedence and associativity just right. It's probably possible to get it just right, but the problem with Pratt parsers is that you just don't know it; in particular, you don't know if your syntax has any ambiguities (i.e. things that might not parse the way you want them to).

2) Because of that, I abandoned Pratt parsers and went back to LALR(1) (yacc). It's tedious and complicated, but it has the nice property that it notifies you of all syntax ambiguities, a property that I haven't found in any other parsing system (recursive descent/LL, Pratt, PEG). Of course, some people say that PEG is unambiguous, but these people are either stupid or ignorant; PEG just doesn't tell you where the ambiguities exist, and always takes the first choice. LALR is "not ambiguous" in the same way: in case of a shift/reduce conflict it always chooses shift, and in case of a reduce/reduce conflict it chooses the first rule. But at least it tells you where the choices were made, so that you can examine and fix them.

3) LALR(1) is also quite stupid and limited, which is why Menhir has been a blessing - it's practically as efficient as ocamlyacc (in theory, at least - ocamlyacc emits tables that are interpreted by a parsing engine written in C in the OCaml runtime, while Menhir produces OCaml code that uses Obj.magic), but parses LR(1) instead of LALR(1), which makes writing grammars for it much easier, and produces nicer error messages. I've managed to write a very flexible, Julia-like syntax (except with {} instead of begin/end) that supports tuples, arrow function syntax, and pattern matching.

4) Extensibility is important for me, but I intentionally want to limit it - I don't want programmers to be able to (re)define basic language syntax, like ` = `, as that could fragment the code and make the syntax unpredictable/ambiguous. However, I want to include user-defined operators (with custom precedence/associativity), which could be done using an embedded Pratt parser, and Julia-like macros that are always preceded by `@` and can only be used in a few predetermined forms (function calls/statements/blocks). I think that makes the syntax much more manageable and readable. Also, I don't like Elixir's syntax: too many semicolons/`do:` keywords. I haven't actually implemented the above yet, but I think it could be done within my current Menhir parser infrastructure.
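
To illustrate point 2, here is a small sketch of PEG-style ordered choice in OCaml. The `peg` type and the `lit`, `seq`, `</>` and `matches` names are hypothetical helpers written for this example, not part of any real PEG library. It shows the behaviour described above: the first alternative that succeeds wins, the second is never reconsidered, and nothing reports that the two alternatives overlap.

```ocaml
(* A recognizer takes the input and a start position and returns the
   position after the match, or None on failure. Hypothetical type. *)
type peg = string -> int -> int option

(* Match a literal string at the current position. *)
let lit (s : string) : peg = fun input pos ->
  let n = String.length s in
  if pos + n <= String.length input && String.sub input pos n = s
  then Some (pos + n)
  else None

(* PEG ordered choice: try p, and try q only if p fails. *)
let ( </> ) (p : peg) (q : peg) : peg = fun input pos ->
  match p input pos with
  | Some _ as r -> r
  | None -> q input pos

(* Sequence: run p, then q from where p stopped. There is no
   backtracking into an earlier choice once it has succeeded. *)
let seq (p : peg) (q : peg) : peg = fun input pos ->
  match p input pos with
  | Some pos' -> q input pos'
  | None -> None

(* True only if the whole input is consumed. *)
let matches (p : peg) (input : string) : bool =
  p input 0 = Some (String.length input)

(* Same alternatives, different order. *)
let a_first = seq (lit "a" </> lit "ab") (lit "c")
let ab_first = seq (lit "ab" </> lit "a") (lit "c")
```

Here `matches ab_first "abc"` is `true`, while `matches a_first "abc"` is `false`: in `a_first` the alternative `"ab"` can never win, and the grammar gives no hint of that. An LR generator that hits a comparable overlap at least reports a conflict that can be inspected.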

Another thing: I too strongly discourage the name Meta, especially for OCaml, because there is already a project that's called MetaOCaml.

I'm curious about what you find particularly confusing about OCaml. I'm a (relative) newcomer to the language, but I find it extremely readable, even if it's a bit clunky at times.
Good OCaml code is some of the most readable among statically typed languages, I agree. But it's less "writable" than readable in my opinion. Of course you can get used to the syntax, but for me expressiveness and clarity come from both: simplicity of writing and of reading.

Often I feel like there is a barrier between your thoughts and their execution which does not exist in Python or Ruby for example. When I switch from OCaml to Python it's like a breath of fresh air.

In conclusion, I think common tasks must be syntactically abstractable. For example, I find working with standard data structures in OCaml annoying, as if I were programming in C. There are tons of syntax extensions for OCaml that try to fix these defects, but that only supports my claim.

Learning Python was certainly easier for me. I could really hit the ground running. OCaml had a much steeper learning curve. It took me a while to get used to the compiler and the different style of programming.

However, I'm a professional. Some of the powerful tools I use take a serious investment to learn. Here it paid off well.

Now I find that Python programs are easy to start but OCaml programs are easy to finish. The compiler is an invaluable helper that I miss when using other languages. Some syntax is clunkier than others but to me having good types (and a good compiler) makes or breaks a language. I can get my work done in anything but this is an area where the language really helps me out.

What does this code do?

  if a > 0 then
    print_endline "a > 0" ;
    match a with
      | 0 -> print_endline "impossible"
      | 2 ->
        match b with
          | 0 -> print_endline "2, 0"
          | _ -> print_endline "2, not 0"
      | 3 -> print_endline "3, something"
      | _ -> print_endline "something, something"
  ;
  print_endline "done"
Any decent programming editor will be able to indent that properly and you will see the problem. Also, I agree that it is a bit ugly but it's not that complicated to understand how the match syntax works. Adding a "begin" and an "end" in this code is simple enough :).
It's not just the match syntax; it's also the `;` expression separator.

> Any decent programming editor will be able to indent that properly and you will see the problem.

This seems like a weak excuse. In particular, I could turn it around and say, "any decent programming language should be writable without an editor". Also, the issue isn't just reading, it's writing too - it's much harder to foresee/plan all the `begin`/`end`, while you're writing a line of code, that will make the lines that follow work as intended.

> This seems like a weak excuse. (…)

Fair enough, but for the ';', as others explained meanwhile, I think it's pretty simple to understand, it's just that we are not used to it because of C syntax.

EDIT: actually, you are right about ';', it is confusing when I think about it: it does not have the same behavior in a branch of a `match` and in a branch of an `if`… I wonder how I never had a problem with that before.

It depends on each person's learning path.

In the Pascal family of languages, ';' is a separator as well.

OCaml will give you compiler warnings on the incomplete and redundant match cases. F# will silently execute a semantically different version of your code. I prefer OCaml. Indentation sensitivity was one of F#'s design flaws.
As someone without too much OCaml experience, how do you disambiguate this?
1) Ignore indentation (I intentionally wrote it to deceive, but similar examples could easily appear in real code). 2) `match` cases always bind to the innermost `match` statement. 3) Statement separators (`;`) are tricky.

The correct indentation of the above code is:

  if a > 0 then
    print_endline "a > 0" ;
  match a with
    | 0 -> print_endline "impossible"
    | 2 ->
      match b with
        | 0 -> print_endline "2, 0"
        | _ -> print_endline "2, not 0"
        | 3 -> print_endline "3, something"
        | _ -> print_endline "something, something"
            ;
            print_endline "done"
To have the same meaning as the indentation implies, you would need to add `(...)` or `begin ... end` at the appropriate places.
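
As a sketch of that fix, the version below adds `begin ... end` so that the parse matches what the original indentation suggested. To keep it checkable, the hypothetical helper `say` collects lines in a list instead of calling `print_endline`, and the wrapper function `run` is also an assumption of this example.

```ocaml
(* [run a b] returns, in order, the lines the snippet would print. *)
let run a b =
  let out = ref [] in
  let say s = out := s :: !out in
  (* begin ... end keeps the whole sequence inside the "then" branch. *)
  if a > 0 then begin
    say "a > 0";
    match a with
    | 0 -> say "impossible"
    | 2 ->
      (* Delimit the inner match so it does not swallow the outer cases. *)
      begin match b with
        | 0 -> say "2, 0"
        | _ -> say "2, not 0"
      end
    | 3 -> say "3, something"
    | _ -> say "something, something"
  end;
  say "done";
  List.rev !out
```

With the delimiters in place, `run 3 0` yields `["a > 0"; "3, something"; "done"]`, and `"done"` is recorded on every path, including `a <= 0`.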
What would be the corresponding SML? I see how the way SML binds ';' would help here, but would it change anything about the last two badly indented match cases? And it seems to provide you with the equivalent footgun if you attempt to do multiple side-effecting operations in a match branch, which in my experience you want to do more often than you need to separate matches with ';'.
I figured the indentation was a lie, but didn't realize you would have to introduce something like begin...end to make it work the way it appears.
There are two things to remember. The first is the match binding thing that tomp mentions in his reply.

The second one is that in OCaml, semicolon is a separator, and not a terminator. In contrast, in C/C++, semicolon is a terminator. If you have an expression, you end it with a ";" just because.

This is not the case for OCaml. In OCaml, semicolon is used to separate two sequential expressions where only one expression is expected. Thus,

    <expr1>;<expr2> 
is evaluated in sequence, and can be used in a place where only one is expected. For example, if statements have the following syntax:

    if <expr1> then <expr2> else <expr3>
Now if you wanted to do two things (instead of one) in the "then" block, you would simply write

    if <expr1> then <expr2.1>;<expr2.2> else <expr3>
Notice that under these rules

    if <expr1> then <expr2>; else <expr3> 
makes no sense. Separators are not terminators. We are used to thinking of ";" as terminators because of C/C++.

    if <expr1> then <expr2.1>;<expr2.2> else <expr3>
Actually, this will not work, I was also confused by it.

    # if true then (); 1 else 2;;
    Error: Parse error: [str_item] or ";;" expected (in [top_phrase])
    # if true then begin (); 1 end else 2;;
    - : int = 1
    # if true then ((); 1) else 2;;
    - : int = 1
The confusion comes from the fact that it doesn't behave the same way in match expressions:

    # match true with | true -> (); 1 | false -> 2;;
    - : int = 1
I'll add that you need ';'-as-separator (you also use ';' in other places, such as when separating items in a list) only when calling functions which return 'unit', which are typically functions that perform side effects (e.g., closing a file). That's usually a small portion of your program.
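
A small sketch of both uses of ';' mentioned above. The names `log`, `record`, `compute` and `xs` are made up for this example. The sequence `e1; e2` evaluates `e1` for its side effect and returns the value of `e2`; when `e1` is not of type `unit`, the compiler normally emits a warning unless the value is discarded explicitly, for instance with `ignore`.

```ocaml
(* A buffer standing in for some side-effecting target such as a file. *)
let log = Buffer.create 16

(* Returns unit, so calls to it can be chained with ';'. *)
let record (s : string) : unit = Buffer.add_string log s

let compute () =
  record "start;";
  record "work;";
  (* A non-unit result is discarded explicitly to avoid the warning. *)
  ignore (String.length "discarded");
  (* The value of the whole sequence is the last expression. *)
  42

(* The other common use of ';': separating list elements. *)
let xs = [1; 2; 3]
```

Calling `compute ()` returns `42` and leaves `"start;work;"` in `log`. Code that has no unit-returning calls would be chained with `let ... in` instead and would not need the separator at all.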