| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by roop 4348 days ago

True, we can write a CFG that can accept any input, but not one that can parse Markdown.

Actually, I should have said it's not possible to write an _unambiguous_ CFG for Markdown.

Say we need to parse emphasis in span elements. "_a_" is em and "__a__" is strong, but "_a", "a_", "__a" and "a__" are normal text. If we write the rules for all these, we end up with a grammar than can generate the same string in many different ways. To determine whether an "_" is the syntax qualifier of an em or just part of normal text, we might have to look ahead an arbitrary number of characters, and potentially till the end of the input. This is why it's not possible to write a useful (or unambiguous) CFG for Markdown, and this is because of the requirement to not throw an error on any input.

> Also, isn't there a compromise between HTML's crazy > parsing strategy and a CFG?

PEGs have been written for Markdown and they work because PEGs are inherently unambiguous, but use backtracking instead. But those PEGs don't handle nested blocks cleanly.

My own HTML5-ish Markdown spec (http://www.vfmd.org/vfmd-spec/specification/) is not as crazy as HTML5's, but admittedly, is not trivial to implement either.

1 comments

kaoD 4347 days ago

Interesting reply.

> But those PEGs don't handle nested blocks cleanly.

What's the problem exactly?

roop 4346 days ago

With PEGs, it's not possible to express nested content, so we have to collate for example the blockquoted content of blockquotes and process them separately. Per my understanding (and I might be wrong here), part of the problem is that PEG includes tokenization, so it's not possible to handle blockquote-levels in the tokeniser separately.

More details on PEG for Markdown: - from the author of the PEG grammar, who's also the spec-writer of "Common Markdown": http://talk.standardmarkdown.com/t/standard-markdown-formal-... - from myself: http://www.vfmd.org/introduction/#prior-work

Also, I wrote an expanded explanation of essentially what I said about CFGs and Markdown: http://roopc.net/posts/2014/markdown-cfg/