Hacker News new | ask | show | jobs
by guerrilla 1964 days ago
> whose grammar is LL(1)?

Just curious, how did you know this?

2 comments

The Python 3.9 (most recent) release notes:

> Python 3.9 uses a new parser, based on PEG instead of LL(1). The new parser’s performance is roughly comparable to that of the old parser, but the PEG formalism is more flexible than LL(1) when it comes to designing new language features. We’ll start using this flexibility in Python 3.10 and later.

https://docs.python.org/3/whatsnew/3.9.html#new-parser

That Python can be described by an LL(1) grammar is frequently mentioned in discussions, but I don't know if it's formally documented anywhere.

More specifically, the grammar [0] is in EBNF form and is ELL(1). That means that if each EBNF production is converted to a set of BNF productions in an LL(1)-compliant way, the BNF grammar as a whole is LL(1). It seems that the Python tools themselves do not check this [1], but I have verified it myself.

However, as another commenter mentioned, the grammar doesn't exactly describe Python, but a superset of it. The compiler needs to perform further checks after parsing that "should" be part of the parser itself. A more advanced parser would allow these checks to be performed in the correct place - this is probably what RustPython does when using LR(1), and was one reason why CPython recently replaced its LL(1) parser with one based on PEG.

[0] https://docs.python.org/3.8/reference/grammar.html

[1] https://discuss.python.org/t/should-there-be-a-check-whether...

Parsing a superset of the language and then using extra checks to detect bad syntax accepted by the superset is a well known and very effective way of being error tolerant and implementing decent error recovery.
It is a great strategy. One of my favorite examples of this is Rust. When talking about what synatax "await" should use, we decided on a postfix syntax: "foo.await" . But there was concern that because JavaScript and other languages use "await foo", it would be confusing.

Solution?

  async fn foo() {
      
  }
  
  fn main() {
      let x = async { await foo() };
  }

  error: incorrect use of `await`
   --> src/main.rs:6:13
    |
  6 |       let x = async { await foo() };
    |                       ^^^^^^^^^^^ help: `await` is a postfix operation: `foo().await`
Do exactly that: parse the incorrect form and emit a good diagnostic.