Hacker News
by trishume 3418 days ago
It's harder than it sounds if you want to support many languages. The Sublime syntaxes repo I use has 34,000 lines of grammars, whereas my engine is only 3,000 lines of code. If you count all the tmLanguage files for niche languages available online, it's probably hundreds of thousands of lines, and that's in a pretty dense format. The whole point of using tmLanguage files is that people don't care how fast a highlighter is on other languages if it has no highlighting for their language.

I could get way better performance by rewriting all those grammars using compiled parsers in Rust (like Xi has as an option https://github.com/google/xi-editor/blob/master/rust/lang/Ca...) but it would take an absurd amount of effort.

The speed of highlighting tmLanguage files is limited mostly by the regex library. My code spends 50% of its time in Oniguruma. You can improve on that a bit by using a faster regex engine, which is how Sublime is faster than my engine, and this evening I'm going to try porting my library to a faster engine based on Rust's regex crate. That library (https://github.com/google/fancy-regex) is brand new and almost untested, though. It's the only open source library that is faster than Oniguruma while supporting all the right features. In the future VSCode may be able to gain some speed by switching to it, but they have much more overhead outside the regex library to eat away at first than I do.
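
To give a feel for why the regex library dominates, here is a toy sketch of the inner loop of a regex-driven highlighter. It is not my engine's code and not how tmLanguage grammars are really structured: the `RULES` table and `highlight_line` are made-up names, the patterns are invented for illustration, it uses Python's `re` module rather than Oniguruma, and it leaves out nested contexts entirely. The only point is that every step forward in a line costs one regex search per rule.

```python
import re

# Hypothetical toy rules: (scope name, pattern). Real grammars have
# thousands of patterns plus nested contexts, which this sketch omits.
RULES = [
    ("comment", re.compile(r"#.*")),
    ("string", re.compile(r'"[^"]*"')),
    ("number", re.compile(r"\b\d+\b")),
    ("keyword", re.compile(r"\b(?:def|return|if|else)\b")),
]


def highlight_line(line):
    """Return (tokens, regex_calls) for a single line.

    At each position every rule is searched and the earliest match wins;
    ties go to the rule listed first.
    """
    tokens = []
    regex_calls = 0
    pos = 0
    while pos < len(line):
        best = None
        for scope, pattern in RULES:
            regex_calls += 1
            m = pattern.search(line, pos)
            # Ignore misses and empty matches so the loop always advances.
            if m is None or m.end() == m.start():
                continue
            if best is None or m.start() < best[1].start():
                best = (scope, m)
        if best is None:
            break  # nothing else on this line matches any rule
        scope, m = best
        tokens.append((scope, m.group()))
        pos = m.end()
    return tokens, regex_calls


tokens, regex_calls = highlight_line('return 42 # "x"')
print(tokens)
print(regex_calls)
```

Even on this 15-character line with four rules, the loop makes 12 regex searches to produce 3 tokens. Scale the rule count up to a real grammar and it's easy to see how half the runtime ends up inside the regex engine.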