| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by ransom_rs 1964 days ago
	This uses [lalrpop](https://github.com/lalrpop/lalrpop) under the hood I believe, which is a pretty awesome parser generator library for Rust.

2 comments

popotamonga 1964 days ago

Nice, long go the days (20 yrs ago) i had to make a JS transpiler with lex+yacc. The pain

link

sitkack 1964 days ago

had to? or got to? Sounds great, please tell us more.

link

popotamonga 1963 days ago

i wanted to add lambdas to JS, this was back in 2001 i think, so i had this compiler from my compiler class, changed it to parse JS (+ my extra features like the ()=>{} syntax) and generate an AST in XML (instead of emitting bytecode), then i made a python script that took the XML and generated minified JS from it.

link

sitkack 1963 days ago

Neat! I did something similar, not quite as big, but I was on a version of Python (Jython) that didn't yet have generator expressions, so I made a generator expression evaluator, gg("x for x in it") that returned a running generator. I was about a week away from shipping an ETL tool to the customer and this allowed me to hit that deadline.

I love the AST in XML approach.

link

pansa2 1964 days ago

> LALRPOP in fact uses LR(1) by default (though you can opt for LALR(1)), and really I hope to eventually move to something general that can handle all CFGs (like GLL, GLR, LL(*), etc).

Seems overkill for a language whose grammar is LL(1)?

link

uranusjr 1964 days ago

Although the CPython implementation contains an LL(1) parser (which is being deprecated for a new PEG parser), the grammar contains bits that are not context-free, which involved a pre-parsing step before being fed into the LL(1) parser. That structure isn’t particularly good and it’d be beneficial for a new implementation to use something else.

link

guerrilla 1964 days ago

> whose grammar is LL(1)?

Just curious, how did you know this?

link

huac 1964 days ago

The Python 3.9 (most recent) release notes:

> Python 3.9 uses a new parser, based on PEG instead of LL(1). The new parser’s performance is roughly comparable to that of the old parser, but the PEG formalism is more flexible than LL(1) when it comes to designing new language features. We’ll start using this flexibility in Python 3.10 and later.

https://docs.python.org/3/whatsnew/3.9.html#new-parser

link

pansa2 1964 days ago

That Python can be described by an LL(1) grammar is frequently mentioned in discussions, but I don't know if it's formally documented anywhere.

More specifically, the grammar [0] is in EBNF form and is ELL(1). That means that if each EBNF production is converted to a set of BNF productions in an LL(1)-compliant way, the BNF grammar as a whole is LL(1). It seems that the Python tools themselves do not check this [1], but I have verified it myself.

However, as another commenter mentioned, the grammar doesn't exactly describe Python, but a superset of it. The compiler needs to perform further checks after parsing that "should" be part of the parser itself. A more advanced parser would allow these checks to be performed in the correct place - this is probably what RustPython does when using LR(1), and was one reason why CPython recently replaced its LL(1) parser with one based on PEG.

[0] https://docs.python.org/3.8/reference/grammar.html

[1] https://discuss.python.org/t/should-there-be-a-check-whether...

link

seanmcdirmid 1964 days ago

Parsing a superset of the language and then using extra checks to detect bad syntax accepted by the superset is a well known and very effective way of being error tolerant and implementing decent error recovery.

link

steveklabnik 1963 days ago

It is a great strategy. One of my favorite examples of this is Rust. When talking about what synatax "await" should use, we decided on a postfix syntax: "foo.await" . But there was concern that because JavaScript and other languages use "await foo", it would be confusing.

Solution?

  async fn foo() {
      
  }
  
  fn main() {
      let x = async { await foo() };
  }

  error: incorrect use of `await`
   --> src/main.rs:6:13
    |
  6 |       let x = async { await foo() };
    |                       ^^^^^^^^^^^ help: `await` is a postfix operation: `foo().await`

Do exactly that: parse the incorrect form and emit a good diagnostic.

link