Hacker News new | ask | show | jobs
by kragen 1370 days ago
The OG of recursive parser generators whose inputs can include the definition of the parser generator itself is D. Val Schorre's META-II, which compiles itself (or other compilers written in the same grammar language) to an assembly language for a parsing-oriented abstract machine. Thanks to the ACM's enlightened policies, the META-II paper is now freely available (though not yet, as far as I can tell, open access): https://dl.acm.org/doi/10.1145/800257.808896

I wrote an improved version called Meta5ix http://canonical.org/~kragen/sw/dev3/meta5ix.m5 (an interpreter for the virtual machine, with a complete description, is at http://canonical.org/~kragen/sw/dev3/meta5ixrun.py, with the precompiled grammar at http://canonical.org/~kragen/sw/dev3/meta5ix.generated.m5asm) and adapted it to produce C: http://canonical.org/~kragen/sw/dev3/meta5ix2c.m5 (precompiled version at http://canonical.org/~kragen/sw/dev3/meta5ix2c.c).

Meta5ix written in itself is only 18 lines of code. Briefly, "" encloses expected input, {} encloses a line of output, [] encloses repeating constructs, commas separate alternatives (implemented without backtracking), fnord skips leading whitespace, $it copies the last <<>>-enclosed input token to the output, and other $variables interpolate locally-defined assembly labels.

    - program: ["-" name @it ":" terms {return}, "#" [:notin ""]]
    - terms: term ["," {continue $choice} term] @choice
    - term: (factor {else $seq}, output) [factor {assert}, output] @seq
    - factor: string {literal $it}
           , "(" terms ")"
           , "[" @many terms {continue $many} "]"
           , name {call $it}
           , "<<" {begin} terms ">>" {end}
           , ":fnord" {fnord} factor
           , ":notin" string {notin $it}
           , ":between" string {between $it}
    - output: (quasiquote, "@" {dedent} (var, quasiquote)) {writeline}
    - quasiquote: "{" [(<<ch [ch]>>, "\" <<:notin "">>) {say "$it"}, "$" var] "}"
    - ch: :notin "$}\"
    - var: "it" {copyinput}, name {gen $it}
    - string: :fnord <<'"' [:notin '"'] '"', "'" [:notin "'"] "'">>
    - name: :fnord <<letter [letter, :between "09"]>>
    - letter: :between "az", :between "AZ", :between "__"
Meta5ix, like my earlier peg-bootstrap https://github.com/kragen/peg-bootstrap/blob/master/peg.md (66 lines of code, compiles to JavaScript, supports backtracking), is not really something you want to write a parser for your application language in. It's a compiler-compiler you can extend to support the parsing constructs you actually want for your language, then recompile with itself. Dave Long described META-II as "a field-improvised lever", and I think that's true of these as well, but maybe an even better analogy is Archimedes's fixed point.
1 comments

What does “OG” stand for?
https://www.urbandictionary.com/define.php?term=OG

> OG used to mean Original Gangster allthough some poeple these days use OG as a quicker way of saying Original

Yes, I meant that META-II was the Original Gangster of self-compiling compiler compilers.