Hacker News
by greghendershott 4229 days ago
I didn't expect my blog post to be on the front page of HN. Here's a TL;DR summary:

For many languages this is a significant and distracting degradation in the presentation.

I could understand GitHub removing highlighting completely because they feel speed is the overriding priority. That would be even faster than what they're doing now. Languages would look "plain" instead of "wrong". Not my first choice, but a reasonable choice.

The situation now is that they've replaced a library that had been handling highlighting thoroughly with a variety of text-editor lexers that mostly do not. People like me, who already contributed to Pygments, aren't feeling motivated to do this all over again for no good reason. So it seems likely the lexers will remain poor for quite a long time, which is unfortunate.

Finally, at the time I wrote my blog post, I was speculating about the motivation because GitHub hadn't yet explained why. Someone later did explain ("because speed") in the issue thread.


Are the majority of languages now broken, or just a small niche subset that doesn't see much use compared to Ruby, Python, etc.?

If only a few minor languages that hardly anyone uses (relative to the whole site) need some fixing, this still seems like a win from GitHub's side of things, since per the graph the change did in fact significantly improve render times.

GitHub now uses the lexer framework from TextMate and Sublime Text. If your language community happens to use those editors, then you are fine.

For Racket, 99% of users use either DrRacket or Emacs. As a result, the lexer that got deployed is very rudimentary.

Any pointers besides the TextMate documentation for writing lexers are welcome.

Which is sad, because the TextMate lexer design is really, really awful. Mostly undocumented. Lots of Oniguruma-specific regexes used in the syntaxes. Inefficient beyond comprehension.

For instance, TM syntaxes can legally have recursion loops in them, which TextMate will cut so that the app doesn't spin into infinite recursion. But the precise way that it does this is a mystery.
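To make the loop problem concrete, here is a minimal Python sketch of one way a highlighter could cut include cycles. This is not TextMate's actual algorithm (which, as said, is a mystery); the grammar layout, the `("match", ...)` / `("include", ...)` tuples, and the name `expand_includes` are all made up for illustration.

```python
def expand_includes(grammar, rule, _active=None):
    """Flatten `rule` into a list of regex strings, cutting include cycles.

    `grammar` maps a rule name to a list of items. Each item is either
    ("match", regex) or ("include", other_rule_name). This layout is a
    hypothetical simplification, not the real TextMate grammar format.
    """
    if _active is None:
        _active = set()
    if rule in _active:
        # We are already in the middle of expanding this rule:
        # following the include again would recurse forever, so cut here.
        return []
    _active.add(rule)
    patterns = []
    for kind, value in grammar[rule]:
        if kind == "include":
            patterns.extend(expand_includes(grammar, value, _active))
        else:
            patterns.append(value)
    _active.discard(rule)
    return patterns


# Two rules that include each other, which is legal in this toy format.
grammar = {
    "expr": [("match", r"\d+"), ("include", "paren")],
    "paren": [("match", r"\("), ("include", "expr")],
}
print(expand_includes(grammar, "expr"))
```

The point is only that some rule has to decide where the cycle gets cut, and different choices can give different highlighting, which is why an undocumented rule is a problem for anyone writing a compatible implementation.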

The Pygments design is better for static syntax highlighting.

Point of curiosity: Chocolat is compatible with TextMate syntax files, IIRC. Was going with TM syntax purely a pragmatic choice? Is it not so bad for in-editor syntax highlighting? It seems like virtually every text editor that hit the market -- or whatever one would say for free programs like Atom -- after TextMate adopted TM syntax files. (While BBEdit's comparative inflexibility in syntax highlighting, even the new BBEdit 11 format, irks me, it's hard not to notice that it's a much better performer on giant files.)

There are two types of syntax highlighting: static and dynamic. Static is like GitHub/Gist/Pastebin, dynamic is like Atom/TM/Sublime. Static highlights the file straight through, and the result can be cached indefinitely. Dynamic highlighting in a text editor parses as little of the document as is theoretically possible, in response to an insertion, deletion or replacement.
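A rough Python sketch of the difference, using a toy lexer that only tracks whether each line ends inside a triple-quoted string. The names (`end_state`, `highlight_static`, `replace_line`) are hypothetical, and a real editor does far more than this; the sketch only shows the "re-lex until the cached state matches again" idea.

```python
TRIPLE = '"""'


def end_state(line, in_string):
    """Return True if we are inside a triple-quoted string after `line`."""
    # Every delimiter on the line toggles the state once.
    toggles = line.count(TRIPLE)
    return in_string ^ (toggles % 2 == 1)


def highlight_static(lines):
    """Static: lex the whole file straight through, top to bottom."""
    states = []
    state = False
    for line in lines:
        state = end_state(line, state)
        states.append(state)
    return states


def replace_line(lines, states, index, new_text):
    """Dynamic: after an edit, re-lex only as far as the edit's effect reaches.

    `states` is the cache of per-line end states from an earlier pass.
    Returns how many lines had to be re-lexed.
    """
    lines[index] = new_text
    state = states[index - 1] if index > 0 else False
    relexed = 0
    for i in range(index, len(lines)):
        state = end_state(lines[i], state)
        relexed += 1
        if states[i] == state:
            break  # everything below is still valid in the cache
        states[i] = state
    return relexed


lines = ['""" """'] * 32000
states = highlight_static(lines)

# An edit that leaves the string state alone touches a single line.
print(replace_line(lines, states, 100, "x = 1"))        # 1

# An extra opening delimiter at the top flips every line below it.
print(replace_line(lines, states, 0, '""" """ """'))    # 32000
```

In the common case the loop stops after one line, which is what keeps typing cheap. The worst case is an edit that changes the state of every following line, and then the editor has to be clever about doing that work lazily or off the main thread.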

For static there are tons of choices: Pygments, prism.js, GeSHi for PHP, etc. Any idiot can write a static highlighting system. But none of these can be used in-editor.

For dynamic highlighting, there is only one game in town and that's tmbundles. Only TextMate has support for the hundreds of languages in existence, including the new ones that pop up each day.

I would love to replace tmbundles. I know just how to implement it. But the problem is, who is going to write all the long-tail language support? VHDL, Pascal, GAP, AtScript, Julia, ...

- - -

Interesting you mention BBEdit. I have a test file I call "the behemoth" which consists of a Python file with 32000 copies of this:

    """ """

The challenge is to insert """ at the top of the file and see how the text editor cries in pain. It's torture to a syntax highlighter.
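
For anyone who wants to try it, a small Python sketch that builds such a file. It assumes one copy per line (the layout isn't specified above), and the file name `behemoth.py` and the function names are just placeholders.

```python
from pathlib import Path

LINE = '""" """\n'


def behemoth_text(copies=32000):
    """Return the file contents: `copies` lines, each a tiny docstring."""
    return LINE * copies


def with_opening_quotes(text):
    """Simulate the edit: put an extra triple quote on a new first line."""
    return '"""\n' + text


def write_behemoth(path="behemoth.py", copies=32000):
    """Write the test file to `path` (placeholder name) and return the path."""
    target = Path(path)
    target.write_text(behemoth_text(copies), encoding="utf-8")
    return target


if __name__ == "__main__":
    print(write_behemoth())
```

Before the edit every line is a complete string, so the file is valid Python. After the edit the number of delimiters is odd, so every quote in the file swaps between opening and closing a string.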

To pass, the editor must

1. Load the file quickly

2. Have smooth scrolling inside the file, even after making the change.

3. Color the quotes properly through the end of the file, before and after.

To my knowledge, BBEdit is the only editor to pass the test. Emacs comes in a good second.

The rule of thumb is that most languages are rarely used (since there is a finite number of users), so if you support only the most used languages you necessarily drop support for most languages.

Then, depending on exactly how popular a language must be to be supported, you could end up breaking support for quite a lot of them.