Hacker News new | ask | show | jobs
by benjaminjosephw 1366 days ago
> langcc is general enough that the tool is self-hosting: that is, one can express the "language of languages" in the "language of languages" itself

There's something deeply satisfying about recursive tools who's inputs can include the definition of the tool itself. Brilliant.

3 comments

That claim is sort misleading. The complexity of the specification's own grammar is largely unrelated to it's expressiveness.

I'd expect most parser generators in this category to be able to parse their own grammars.

Imagine a very weak parser that can only handle LR0. But if it uses a Lisp-like grammar language, it too is self-hosting.

> The complexity of the specification's own grammar is largely unrelated to it's expressiveness.

E.g.:

  bitstring : bistring '1'
            | bitstring '0'
            | /* empty */
            ;
this expresses all that can be expressed. The rest is semantics.
The language of BNF for langcc (https://github.com/jzimmerman/langcc/blob/main/grammars/meta...) provides many syntactic conveniences that are not present in a Lisp-like language, so the fact that langcc supports it is a nontrivial achievement. In particular, an LR(0) parser would not be anywhere near adequate for it.
I wasn't saying langcc was not powerful, just that the shape of the argument doesn't make sense.

That part of the README goes something like this:

1. It can parse Python and Go efficiently.

2. In fact it's so expressive that it can even parse itself, which is a "language of languages".

If you had first shown some hard-to-parse langcc syntax, then sure, _that_ would be evidence of expressiveness. But there's nothing impressive about being able to parse a "language of languages", since a language of languages can be LR(0).

The OG of recursive parser generators whose inputs can include the definition of the parser generator itself is D. Val Schorre's META-II, which compiles itself (or other compilers written in the same grammar language) to an assembly language for a parsing-oriented abstract machine. Thanks to the ACM's enlightened policies, the META-II paper is now freely available (though not yet, as far as I can tell, open access): https://dl.acm.org/doi/10.1145/800257.808896

I wrote an improved version called Meta5ix http://canonical.org/~kragen/sw/dev3/meta5ix.m5 (an interpreter for the virtual machine, with a complete description, is at http://canonical.org/~kragen/sw/dev3/meta5ixrun.py, with the precompiled grammar at http://canonical.org/~kragen/sw/dev3/meta5ix.generated.m5asm) and adapted it to produce C: http://canonical.org/~kragen/sw/dev3/meta5ix2c.m5 (precompiled version at http://canonical.org/~kragen/sw/dev3/meta5ix2c.c).

Meta5ix written in itself is only 18 lines of code. Briefly, "" encloses expected input, {} encloses a line of output, [] encloses repeating constructs, commas separate alternatives (implemented without backtracking), fnord skips leading whitespace, $it copies the last <<>>-enclosed input token to the output, and other $variables interpolate locally-defined assembly labels.

    - program: ["-" name @it ":" terms {return}, "#" [:notin ""]]
    - terms: term ["," {continue $choice} term] @choice
    - term: (factor {else $seq}, output) [factor {assert}, output] @seq
    - factor: string {literal $it}
           , "(" terms ")"
           , "[" @many terms {continue $many} "]"
           , name {call $it}
           , "<<" {begin} terms ">>" {end}
           , ":fnord" {fnord} factor
           , ":notin" string {notin $it}
           , ":between" string {between $it}
    - output: (quasiquote, "@" {dedent} (var, quasiquote)) {writeline}
    - quasiquote: "{" [(<<ch [ch]>>, "\" <<:notin "">>) {say "$it"}, "$" var] "}"
    - ch: :notin "$}\"
    - var: "it" {copyinput}, name {gen $it}
    - string: :fnord <<'"' [:notin '"'] '"', "'" [:notin "'"] "'">>
    - name: :fnord <<letter [letter, :between "09"]>>
    - letter: :between "az", :between "AZ", :between "__"
Meta5ix, like my earlier peg-bootstrap https://github.com/kragen/peg-bootstrap/blob/master/peg.md (66 lines of code, compiles to JavaScript, supports backtracking), is not really something you want to write a parser for your application language in. It's a compiler-compiler you can extend to support the parsing constructs you actually want for your language, then recompile with itself. Dave Long described META-II as "a field-improvised lever", and I think that's true of these as well, but maybe an even better analogy is Archimedes's fixed point.
What does “OG” stand for?
https://www.urbandictionary.com/define.php?term=OG

> OG used to mean Original Gangster allthough some poeple these days use OG as a quicker way of saying Original

Yes, I meant that META-II was the Original Gangster of self-compiling compiler compilers.
I kind of hate systems like this though. What I want is easily bootstrappable compilers: how hard is it to get the system back from nothing but raw x86_64 machine code and some type of data storage.