by soegaard 4223 days ago
GitHub now uses the lexer framework from TextMate and Sublime Text. If your language community happens to use those editors, you are fine.

In the Racket community, about 99% of users work in either DrRacket or Emacs. As a result, the Racket lexer deployed on GitHub is very rudimentary.

Any pointers besides the TextMate documentation for writing lexers are welcome.

Which is sad, because the TextMate lexer design is really, really awful: mostly undocumented, full of Oniguruma-specific regexes in the syntaxes, and inefficient beyond comprehension.

For instance, TM syntaxes can legally have recursion loops in them, which TextMate will cut so that the app doesn't spin into infinite recursion. But the precise way that it does this is a mystery.

The Pygments design is better for static syntax highlighting.

Point of curiosity: Chocolat is compatible with TextMate syntax files, IIRC. Was going with TM syntax purely a pragmatic choice? Is it not so bad for in-editor syntax highlighting? It seems like virtually every text editor that hit the market -- or whatever one would say for free programs like Atom -- after TextMate adopted TM syntax files. (While BBEdit's comparative inflexibility in syntax highlighting, even with the new BBEdit 11 format, irks me, it's hard not to notice that it's a much better performer on giant files.)

There are two types of syntax highlighting: static and dynamic. Static is like GitHub/Gist/Pastebin; dynamic is like Atom/TM/Sublime. Static highlights the file straight through, and the result can be cached indefinitely. Dynamic highlighting in a text editor parses as little of the document as is theoretically possible, in response to an insertion, deletion or replacement.
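
To make the static/dynamic distinction concrete, here is a toy sketch in Python. It is not how TextMate, Atom or Sublime actually work. It assumes a made-up language whose only lexer state is "inside a triple-quoted string or not", caches that state at the end of every line, and after an in-place edit re-lexes forward only until the cached state matches again. The names end_state, lex_all and relex_from are invented for illustration.

```python
TRIPLE = '"""'


def end_state(line, in_string):
    """Return True if we are inside a triple-quoted string after `line`.

    Toy rule (an assumption): every occurrence of the delimiter toggles the
    state. Comments, escapes and other string kinds are ignored.
    """
    if line.count(TRIPLE) % 2 == 1:
        return not in_string
    return in_string


def lex_all(lines):
    """Static pass: walk the whole file once, recording each line's end state."""
    states = []
    in_string = False
    for line in lines:
        in_string = end_state(line, in_string)
        states.append(in_string)
    return states


def relex_from(lines, states, changed):
    """Dynamic pass: `lines[changed]` was edited in place.

    Re-lex from that line and stop as soon as a line's new end state equals
    the cached one, because everything after it is then still valid.
    Updates `states` in place and returns how many lines were re-lexed.
    """
    in_string = states[changed - 1] if changed > 0 else False
    relexed = 0
    for i in range(changed, len(lines)):
        in_string = end_state(lines[i], in_string)
        relexed += 1
        if states[i] == in_string:
            break  # downstream cache is still correct
        states[i] = in_string
    return relexed
```

With this scheme a typical edit re-lexes a single line. An edit that adds an unmatched `"""` changes the end state of every following line, so the early exit never fires and the rest of the file has to be re-lexed.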

For static there are tons of choices: Pygments, prism.js, GeSHi for PHP, etc. Any idiot can write a static highlighting system. But none of these can be used in-editor.

For dynamic highlighting, there is only one game in town, and that's tmbundles. Only TextMate has support for the hundreds of languages in existence, including the new ones that pop up each day.

I would love to replace tmbundles. I know just how to implement it. But the problem is, who is going to write all the long-tail language support? VHDL, Pascal, GAP, AtScript, Julia, ...

- - -

Interesting you mention BBEdit. I have a test file I call "the behemoth" which consists of a python file with 32000 copies of this:

    """ """

The challenge is to insert """ at the top of the file and see how the text editor cries in pain. It's torture to a syntax highlighter.
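
If you want to try this yourself, a file like that can be generated with a few lines of Python. This is a sketch under assumptions: the function name write_behemoth and the file name behemoth.py are made up, and putting one copy per line is a guess at the layout.

```python
from pathlib import Path


def write_behemoth(path, copies=32000):
    """Write `copies` lines, each holding one triple-quoted string: \"\"\" \"\"\"."""
    line = '""" """\n'  # one copy per line (assumed layout)
    target = Path(path)
    target.write_text(line * copies, encoding="utf-8")
    return target


if __name__ == "__main__":
    # Hypothetical output name; open the result in the editor under test.
    write_behemoth("behemoth.py")
```

The generated file is valid Python as written. Typing the extra `"""` at the top is left to be done by hand in the editor, since the point is to watch how it reacts to the edit.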

To pass, the editor must

1. Load the file quickly.

2. Have smooth scrolling inside the file, even after making the change.

3. Color the quotes properly through the end of the file, before and after.

To my knowledge, BBEdit is the only editor to pass the test. Emacs is a good second place.