i wanted to add lambdas to JS, this was back in 2001 i think, so i had this compiler from my compiler class, changed it to parse JS (+ my extra features like the ()=>{} syntax) and generate an AST in XML (instead of emitting bytecode), then i made a python script that took the XML and generated minified JS from it.
Neat! I did something similar, not quite as big, but I was on a version of Python (Jython) that didn't yet have generator expressions, so I made a generator expression evaluator, gg("x for x in it") that returned a running generator. I was about a week away from shipping an ETL tool to the customer and this allowed me to hit that deadline.
> LALRPOP in fact uses LR(1) by default (though you can opt for LALR(1)), and really I hope to eventually move to something general that can handle all CFGs (like GLL, GLR, LL(*), etc).
Seems overkill for a language whose grammar is LL(1)?
Although the CPython implementation contains an LL(1) parser (which is being deprecated for a new PEG parser), the grammar contains bits that are not context-free, which involved a pre-parsing step before being fed into the LL(1) parser. That structure isn’t particularly good and it’d be beneficial for a new implementation to use something else.
> Python 3.9 uses a new parser, based on PEG instead of LL(1). The new parser’s performance is roughly comparable to that of the old parser, but the PEG formalism is more flexible than LL(1) when it comes to designing new language features. We’ll start using this flexibility in Python 3.10 and later.
That Python can be described by an LL(1) grammar is frequently mentioned in discussions, but I don't know if it's formally documented anywhere.
More specifically, the grammar [0] is in EBNF form and is ELL(1). That means that if each EBNF production is converted to a set of BNF productions in an LL(1)-compliant way, the BNF grammar as a whole is LL(1). It seems that the Python tools themselves do not check this [1], but I have verified it myself.
However, as another commenter mentioned, the grammar doesn't exactly describe Python, but a superset of it. The compiler needs to perform further checks after parsing that "should" be part of the parser itself. A more advanced parser would allow these checks to be performed in the correct place - this is probably what RustPython does when using LR(1), and was one reason why CPython recently replaced its LL(1) parser with one based on PEG.
Parsing a superset of the language and then using extra checks to detect bad syntax accepted by the superset is a well known and very effective way of being error tolerant and implementing decent error recovery.
It is a great strategy. One of my favorite examples of this is Rust. When talking about what synatax "await" should use, we decided on a postfix syntax: "foo.await" . But there was concern that because JavaScript and other languages use "await foo", it would be confusing.
Solution?
async fn foo() {
}
fn main() {
let x = async { await foo() };
}
error: incorrect use of `await`
--> src/main.rs:6:13
|
6 | let x = async { await foo() };
| ^^^^^^^^^^^ help: `await` is a postfix operation: `foo().await`
Do exactly that: parse the incorrect form and emit a good diagnostic.