| HN Mirror

	by Navarr 4411 days ago
	I remember seeing this on /r/PHP, and one of the top comments there criticized it for using regex instead of parsing Markdown like a language. However, I also recall that using regex is exactly why it runs so quickly. So I figured I'd get this argument out of the way before someone else brought it up.
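For anyone who hasn't looked at how a regex-based converter works, here is a minimal sketch. It assumes a tiny Markdown subset (ATX headings, `**bold**`, `*italic*` and inline code). `render_line` and `render` are hypothetical names for illustration, not the code of the parser being discussed, and the sketch does no HTML escaping.

```python
import re

# (pattern, replacement) pairs applied in order. Bold must run before italic,
# otherwise the italic rule would consume the asterisks of "**" pairs.
INLINE_RULES = [
    (re.compile(r"`([^`]+)`"), r"<code>\1</code>"),
    (re.compile(r"\*\*(.+?)\*\*"), r"<strong>\1</strong>"),
    (re.compile(r"\*(.+?)\*"), r"<em>\1</em>"),
]
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def render_line(line: str) -> str:
    """Render one line of a tiny Markdown subset to HTML using only regexes."""
    for pattern, replacement in INLINE_RULES:
        line = pattern.sub(replacement, line)
    m = HEADING.match(line)
    if m:
        level = len(m.group(1))
        return f"<h{level}>{m.group(2)}</h{level}>"
    return f"<p>{line}</p>" if line.strip() else ""


def render(text: str) -> str:
    """Render line by line, dropping blank lines."""
    return "\n".join(filter(None, (render_line(ln) for ln in text.splitlines())))
```

Each rule is one pass over the line, which is where the speed comes from. The catch is that later rules see the output of earlier ones, so `` `a*b*c` `` comes out with an `<em>` inside the `<code>` element.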

greghendershott 4411 days ago

Well, the original markdown.pl heavily uses regexps.

From having tilted at this windmill a little myself, I think:

1. It's tricky enough to correctly handle all the under-specified corner cases of basic markdown, not to mention the popular extensions to it. The cognitive load of doing it with complex regexps gets heavy, quickly.

2. I'm incredibly impressed with all the work that John MacFarlane has put into the problem, for example in [Pandoc] and [Cheapskate].

[Pandoc]: https://github.com/jgm/pandoc

[Cheapskate]: https://github.com/jgm/cheapskate
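One concrete example of such a corner case: whether underscores inside a word (think `snake_case` identifiers) start emphasis. Dialects disagree; CommonMark, for one, doesn't treat intraword underscores as emphasis. A minimal sketch of how the regex for this grows, where the "strict" pattern is my own loose approximation of that rule, not any parser's actual code:

```python
import re

# Naive rule: any text between two underscores becomes emphasis.
NAIVE_EM = re.compile(r"_(.+?)_")

# Stricter rule, loosely approximating "no intraword emphasis": the opening
# underscore may not follow a word character, the closing one may not be
# followed by one, and neither may touch whitespace on the inside.
STRICT_EM = re.compile(r"(?<!\w)_(?!\s)(.+?)(?<!\s)_(?!\w)")


def naive_em(text: str) -> str:
    return NAIVE_EM.sub(r"<em>\1</em>", text)


def strict_em(text: str) -> str:
    return STRICT_EM.sub(r"<em>\1</em>", text)
```

`naive_em("call my_var_name now")` gives `call my<em>var</em>name now`, while `strict_em` leaves it alone and still handles `this is _really_ important`. Even the stricter pattern is only an approximation; the CommonMark spec spends a long section on exactly this kind of delimiter question.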

nkozyra 4411 days ago

I think semantic parsing with a lexer/tokens is better for a lot of things, but it's sometimes overkill when the patterns are predictable and simple.
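For comparison, a minimal sketch of the lexer/token approach, assuming a tiny inline subset (`**strong**`, `*em*`, inline code). The token names and `render_inline` are hypothetical. Regexes still do the lexing here; the difference is that a separate pass pairs delimiters with a stack, so code-span contents are never re-scanned for emphasis, and anything it can't pair cleanly stays literal text instead of becoming mis-nested tags.

```python
import re
from typing import Iterator, List, NamedTuple, Tuple


class Token(NamedTuple):
    kind: str
    value: str


# Hypothetical token set; alternatives are tried left to right, so order matters.
TOKEN_SPEC = [
    ("CODE", r"`[^`]*`"),  # a complete inline code span
    ("STRONG", r"\*\*"),
    ("EM", r"\*"),
    ("TEXT", r"[^`*]+"),
    ("TICK", r"`"),  # a backtick with no partner: plain text
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))
TAGS = {"STRONG": "strong", "EM": "em"}


def tokenize(text: str) -> Iterator[Token]:
    # Every character matches some alternative, so the tokens cover the input.
    for m in MASTER.finditer(text):
        yield Token(m.lastgroup, m.group())


def render_inline(text: str) -> str:
    out: List[str] = []
    stack: List[Tuple[str, int]] = []  # (delimiter kind, its index in out)
    for tok in tokenize(text):
        if tok.kind == "CODE":
            out.append(f"<code>{tok.value[1:-1]}</code>")  # contents not re-scanned
        elif tok.kind in TAGS:
            if stack and stack[-1][0] == tok.kind:
                _, idx = stack.pop()
                tag = TAGS[tok.kind]
                out[idx] = f"<{tag}>"  # turn the literal opener into a tag
                out.append(f"</{tag}>")
            else:
                stack.append((tok.kind, len(out)))
                out.append(tok.value)  # stays literal unless later closed
        else:
            out.append(tok.value)
    return "".join(out)
```

It's deliberately simple (no escaping, no CommonMark flanking rules), but it shows the split: a cheap regex lexer plus a small amount of state where the regex alone would get confused.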

That said, has there ever really been an issue with speed as it pertains to markdown translation? I can't imagine it's an everyday, practical concern.

chrismonsanto 4411 days ago

> That said, has there ever really been an issue with speed as it pertains to markdown translation?

Yes, speed of translation is a big deal. I tried at least 4 Markdown parsers for Python precisely because I needed the right combination of speed and extensibility. When you are constructing a very large static site, a full rebuild can take a long time.

For those wondering, I went with Mistune (http://mistune.readthedocs.org/en/latest/). It is accelerated by Cython.

seer 4411 days ago

As with most tech, if there's such a leap in speed (about 10 times), then a lot of other applications become possible. You could remove a layer of caching because it's not needed anymore, thus reducing your app's complexity. But apart from that, imagine how many places use markdown. If people all move to a 10 times faster implementation, that's an incredible reduction in wasted CPU cycles.
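To make the caching point concrete, here's a minimal sketch of the kind of layer you could drop. It assumes rendered HTML is memoized in-process, keyed by a hash of the Markdown source; `RenderCache` is a hypothetical name, and a real one would also need eviction and invalidation (this store grows without bound).

```python
import hashlib
from typing import Callable, Dict


class RenderCache:
    """Hypothetical in-process cache: rendered HTML keyed by a hash of the source."""

    def __init__(self, render: Callable[[str], str]) -> None:
        self._render = render
        self._store: Dict[str, str] = {}
        self.misses = 0

    def get(self, source: str) -> str:
        key = hashlib.sha256(source.encode("utf-8")).hexdigest()
        html = self._store.get(key)
        if html is None:  # miss: pay for a real render once per distinct source
            self.misses += 1
            html = self._render(source)
            self._store[key] = html
        return html
```

If rendering is cheap enough, every `cache.get(src)` can simply become `render(src)`, and the store, the hashing and the invalidation questions all go away.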

nkozyra 4411 days ago

My point was not that we shouldn't work to produce even small efficiencies (which, yes, cascade into larger aggregate ones).

It was more that I was wondering whether speed in markdown parsing is such a concern that it merits being a marquee 'selling' point.

oneeyedpigeon 4411 days ago

If you're building a static site from markdown files, and your site consists of thousands of pages, speed will definitely be a concern.
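Back of the envelope, with made-up numbers: if a parser takes 2 ms per page, a 5,000-page site spends 5,000 × 2 ms = 10 s on Markdown alone for every full rebuild, and a 10× faster parser brings that to 1 s. A minimal timing harness for checking your own numbers, assuming `render` is whatever str-to-str function your parser exposes (the name `time_full_rebuild` is hypothetical):

```python
import time
from typing import Callable, Sequence, Tuple


def time_full_rebuild(render: Callable[[str], str], pages: Sequence[str]) -> Tuple[float, float]:
    """Render every page once; return (total seconds, average milliseconds per page)."""
    start = time.perf_counter()
    for page in pages:
        render(page)
    total = time.perf_counter() - start
    per_page_ms = 1000 * total / len(pages) if pages else 0.0
    return total, per_page_ms
```

For example, `time_full_rebuild(str.upper, ["# Title\n\nSome *text*."] * 5000)`, with `str.upper` standing in for a real parser.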

aaronem 4411 days ago

Not all that often, unless either a) you're in the habit of frequently making broad changes, or b) your build tool doesn't take account of modification times.
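Option b) as a minimal sketch: re-render a page only when its output file is missing or older than its source. It assumes one `.md` source maps to one `.html` output and that pages don't depend on each other. Templates, includes or a shared navigation menu break that assumption, and changing one of those touches every page, which is roughly case a). `build` is a hypothetical helper, not any particular tool's API.

```python
from pathlib import Path
from typing import Callable, List


def build(src_dir: Path, out_dir: Path, render: Callable[[str], str]) -> List[Path]:
    """Render each .md under src_dir to .html under out_dir, skipping
    pages whose output is already at least as new as the source."""
    rebuilt: List[Path] = []
    for src in sorted(src_dir.rglob("*.md")):
        dst = out_dir / src.relative_to(src_dir).with_suffix(".html")
        if dst.exists() and dst.stat().st_mtime >= src.stat().st_mtime:
            continue  # up to date: skip the expensive render
        dst.parent.mkdir(parents=True, exist_ok=True)
        dst.write_text(render(src.read_text(encoding="utf-8")), encoding="utf-8")
        rebuilt.append(src)
    return rebuilt
```

With this in place, parser speed mostly matters for the first build and for the "broad changes" case.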

coolj 4411 days ago

Can you parse HTML with regex?

http://stackoverflow.com/a/1732454
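The usual argument behind that answer is that pairing open and close tags means tracking nesting depth, which a classic regular expression can't do for arbitrary input. A minimal illustration, assuming tag-only input (no comments, self-closing tags or `>` inside attribute values): a lazy regex pairs the outer `<div>` with the first `</div>` it sees, while a loop that uses a regex only to find tags and a depth counter to pair them gets it right. `outer_content` is a hypothetical helper, not a real HTML parser; for real HTML you'd reach for a proper one such as Python's `html.parser`.

```python
import re
from typing import Optional

# Lazy regex: stops at the FIRST closing tag it finds.
NAIVE_DIV = re.compile(r"<div>(.*?)</div>", re.S)

# Used only as a lexer: finds one opening or closing tag at a time.
TAG = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>")


def outer_content(html: str, name: str) -> Optional[str]:
    """Return the inner HTML of the first <name> element, honouring nesting.

    `name` is expected in lowercase; tag names in `html` are compared
    case-insensitively.
    """
    depth = 0
    start = 0
    for m in TAG.finditer(html):
        closing, tag = m.group(1) == "/", m.group(2).lower()
        if tag != name:
            continue
        if not closing:
            if depth == 0:
                start = m.end()  # content begins right after the outermost opener
            depth += 1
        elif depth > 0:
            depth -= 1
            if depth == 0:
                return html[start:m.start()]
    return None  # no element found, or it was never closed
```

On `<div>outer <div>inner</div> tail</div>`, `NAIVE_DIV` captures `outer <div>inner`, while `outer_content(..., "div")` returns `outer <div>inner</div> tail`.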
