Hacker News new | ask | show | jobs
by ericlippert 2271 days ago
One of the most important things you can do in perf analysis is to know when to stop looking for incremental improvement.

If this subject in particular interests you, we did a lot of work in the C# lexer/parser so that once the file is lexed, it only re-lexes the tokens which changed on every edit. It also does fun stuff like the syntax colourizer only runs on code that's actually on the screen. Getting every operation that depends on the lex/parse of code in the editor down to running in much less than 30ms so that it would not slow down keystrokes was a huge amount of work.

1 comments

Does the C# parser implement incremental parsing by just using a recursive descent parser with caching, or does it do parsing with nonterminals as lookahead?
It does a full lex and parse. Then if there is an edit, it determines based on the edit which tokens need to be re-lexed; maybe we went from "[10,20]" to "[10.20]" and so now the [ and ] tokens are fine but everything in the middle is changed.

So we go from a data structure representing "original text plus edit" to a data structure representing "original lex plus changes". Now we have the information we need to do the same to the parse tree, which has also been stored. Given the set of tokens in the program which changed, and knowledge of where the textual boundaries are of every parse node, we can restrict the re-parse to the affected syntax tree spine. In the example given, we know that we've still got, say, an array index list but the contents of the list need to be re-parsed, so we re-start the parser on the left bracket.

The algorithm that does this is called "the blender", and reading that code makes my brain feel like it is in a blender. The code was written by Neal Gafter and based on his PhD thesis in incremental parser theory.

The source code is available on github; do a search for "roslyn" and you'll find it.