|
I don't know specifically - but even now, i still end up having to explain to people that incremental parsing/lexing (particularly without error recovery) is not hard, it is not really complicated, and as the author here said, Tim (et al) have made beautiful algorithms that make this stuff easy. Heck, incremental lexing is even easy to explain.
For each token, track where the lexer actually looked in the input stream to make decisions. Any time that part of the input stream changes, every token to actually look at the changed portion of the input stream is re-lexed, and if the result changes, keep re-lexing until the before/after tokenstreams sync up again or you run out of input. That's it. You can also make a dumber version that statically calculates the maximum lookahead (lookbehind if you support that too) of the entire grammer, or the maximum possible lookahead per token, and uses that instead of tracking the actual lookahead used.
In practice, this is often harder than just tracking the actual lookahead used. In an LL system like ANTLR, incremental parsing is very similar - since it generates top-down parsers, it's the same basic theory - track what token ranges were looked at as you parse.
During incremental update, only descend into portions of the parse tree where the token ranges looked at contain modified tokens. Bottom up is trickier.
Error recovery is the meaningfully tricky part in all of this. Before tree-sitter, I was constantly explaining this stuff to people (I followed the projects that these algorithms came out of - ENSEMBLE, HARMONIA, etc).
After more people get that there are ways of doing this, but you still run into people who are re-creating things we solved in pretty great ways many years ago. |