by Navarr 4411 days ago
I remember seeing this on /r/PHP, and one of the top comments there was about it using Regex instead of parsing it like a language.

However, I also recall that it's thanks to using regex that it works so quickly. So I figured I'd get this argument out of the way before someone else brought it up.

3 comments

Well, the original markdown.pl heavily uses regexps.

From having tilted at this windmill a little myself, I think:

1. It's tricky enough to handle correctly all the under-specified corner cases of basic markdown -- not to mention the popular extensions to it. The cognitive load of doing it with complex regexps gets heavy, quickly.

2. I'm incredibly impressed with all the work that John MacFarlane has put into the problem, for example in [Pandoc] and [Cheapskate].

[Pandoc]: https://github.com/jgm/pandoc

[Cheapskate]: https://github.com/jgm/cheapskate

I think semantic parsing with a lexer and tokens is better for a lot of things, but it's sometimes overkill when the patterns are predictable and simple.

That said, has there ever really been an issue with speed as it pertains to markdown translation? I can't imagine it's an everyday, practical concern.
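To make the "predictable and simple" case concrete, here is a minimal sketch of regex-only handling for two inline patterns, bold and emphasis. The `render_inline` name and both patterns are assumptions made up for illustration; they are not taken from markdown.pl or any parser mentioned in this thread, and they ignore escaping, nesting, and code spans.

```python
import re

# Hypothetical patterns for illustration only; real Markdown has many more corner cases.
STRONG = re.compile(r"\*\*(.+?)\*\*")  # **bold**
EM = re.compile(r"\*(.+?)\*")          # *emphasis*

def render_inline(text):
    """Convert **bold** and *emphasis* to HTML using two regex passes."""
    # Handle the double-asterisk form first so the single-asterisk
    # pattern does not consume half of a ** pair.
    text = STRONG.sub(r"<strong>\1</strong>", text)
    return EM.sub(r"<em>\1</em>", text)

print(render_inline("a **b** and *c*"))
```

This stays readable for two patterns. The ordering dependency between the passes hints at why the approach gets heavier as more syntax is added.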

> That said, has there ever really been an issue with speed as it pertains to markdown translation?

Yes, speed of translation is a big deal. I tried at least 4 Markdown parsers for Python precisely because I needed the right combination of speed and extensibility. When you are constructing a very large static site, a full rebuild can take a long time.

For those wondering, I went with Mistune (http://mistune.readthedocs.org/en/latest/). It is accelerated by Cython.

As with most tech, if there is such a leap in speed (about 10 times), then a lot of other applications become possible. You could remove a layer of caching because it's not needed anymore, thus reducing your app's complexity. But apart from that, imagine how many places use markdown. If people all move to a 10 times faster implementation, that's an incredible reduction in wasted CPU cycles.

My point was not that we shouldn't work to produce even small efficiencies (which, yes, cascade into larger aggregate ones).

It was more wondering whether speed in markdown parsing is such a concern that this would merit a marquee 'selling' point.

If you're building a static site from markdown files, and your site consists of thousands of pages, speed will definitely be a concern.

Not all that often, unless either a) you're in the habit of frequently making broad changes, or b) your build tool doesn't take account of modification times.
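For anyone unsure what "taking account of modification times" looks like in a build tool, here is a rough sketch. The `needs_rebuild` helper and the file names in the usage line are hypothetical, and this is not how any particular static site generator does it.

```python
import os

def needs_rebuild(src_path, out_path):
    """Return True if the output is missing or older than its source."""
    # Hypothetical helper: compares filesystem modification times only,
    # so it will not notice changes to templates or shared includes.
    try:
        out_mtime = os.path.getmtime(out_path)
    except FileNotFoundError:
        return True  # never built
    return out_mtime < os.path.getmtime(src_path)

# Usage sketch: skip the markdown conversion when the output is current.
# if needs_rebuild("page.md", "page.html"): convert the page
```

With a check like this, a rebuild only pays the parsing cost for pages that changed, which is why parser speed matters mostly for full rebuilds.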
Can you parse HTML with regex?

http://stackoverflow.com/a/1732454