Hacker News new | ask | show | jobs
by pansa2 1965 days ago
That Python can be described by an LL(1) grammar is frequently mentioned in discussions, but I don't know if it's formally documented anywhere.

More specifically, the grammar [0] is in EBNF form and is ELL(1). That means that if each EBNF production is converted to a set of BNF productions in an LL(1)-compliant way, the BNF grammar as a whole is LL(1). It seems that the Python tools themselves do not check this [1], but I have verified it myself.

However, as another commenter mentioned, the grammar doesn't exactly describe Python, but a superset of it. The compiler needs to perform further checks after parsing that "should" be part of the parser itself. A more advanced parser would allow these checks to be performed in the correct place - this is probably what RustPython does when using LR(1), and was one reason why CPython recently replaced its LL(1) parser with one based on PEG.

[0] https://docs.python.org/3.8/reference/grammar.html

[1] https://discuss.python.org/t/should-there-be-a-check-whether...

1 comments

Parsing a superset of the language and then using extra checks to detect bad syntax accepted by the superset is a well known and very effective way of being error tolerant and implementing decent error recovery.
It is a great strategy. One of my favorite examples of this is Rust. When talking about what synatax "await" should use, we decided on a postfix syntax: "foo.await" . But there was concern that because JavaScript and other languages use "await foo", it would be confusing.

Solution?

  async fn foo() {
      
  }
  
  fn main() {
      let x = async { await foo() };
  }

  error: incorrect use of `await`
   --> src/main.rs:6:13
    |
  6 |       let x = async { await foo() };
    |                       ^^^^^^^^^^^ help: `await` is a postfix operation: `foo().await`
Do exactly that: parse the incorrect form and emit a good diagnostic.