|
The OG of recursive parser generators whose inputs can include the definition of the parser generator itself is D. Val Schorre's META-II, which compiles itself (or other compilers written in the same grammar language) to an assembly language for a parsing-oriented abstract machine. Thanks to the ACM's enlightened policies, the META-II paper is now freely available (though not yet, as far as I can tell, open access): https://dl.acm.org/doi/10.1145/800257.808896 I wrote an improved version called Meta5ix http://canonical.org/~kragen/sw/dev3/meta5ix.m5 (an interpreter for the virtual machine, with a complete description, is at http://canonical.org/~kragen/sw/dev3/meta5ixrun.py, with the precompiled grammar at http://canonical.org/~kragen/sw/dev3/meta5ix.generated.m5asm) and adapted it to produce C: http://canonical.org/~kragen/sw/dev3/meta5ix2c.m5 (precompiled version at http://canonical.org/~kragen/sw/dev3/meta5ix2c.c). Meta5ix written in itself is only 18 lines of code. Briefly, "" encloses expected input, {} encloses a line of output, [] encloses repeating constructs, commas separate alternatives (implemented without backtracking), fnord skips leading whitespace, $it copies the last <<>>-enclosed input token to the output, and other $variables interpolate locally-defined assembly labels. - program: ["-" name @it ":" terms {return}, "#" [:notin ""]]
- terms: term ["," {continue $choice} term] @choice
- term: (factor {else $seq}, output) [factor {assert}, output] @seq
- factor: string {literal $it}
, "(" terms ")"
, "[" @many terms {continue $many} "]"
, name {call $it}
, "<<" {begin} terms ">>" {end}
, ":fnord" {fnord} factor
, ":notin" string {notin $it}
, ":between" string {between $it}
- output: (quasiquote, "@" {dedent} (var, quasiquote)) {writeline}
- quasiquote: "{" [(<<ch [ch]>>, "\" <<:notin "">>) {say "$it"}, "$" var] "}"
- ch: :notin "$}\"
- var: "it" {copyinput}, name {gen $it}
- string: :fnord <<'"' [:notin '"'] '"', "'" [:notin "'"] "'">>
- name: :fnord <<letter [letter, :between "09"]>>
- letter: :between "az", :between "AZ", :between "__"
Meta5ix, like my earlier peg-bootstrap https://github.com/kragen/peg-bootstrap/blob/master/peg.md (66 lines of code, compiles to JavaScript, supports backtracking), is not really something you want to write a parser for your application language in. It's a compiler-compiler you can extend to support the parsing constructs you actually want for your language, then recompile with itself. Dave Long described META-II as "a field-improvised lever", and I think that's true of these as well, but maybe an even better analogy is Archimedes's fixed point. |