|
|
|
|
|
by kaoD
4305 days ago
|
|
> But it's not possible to write a CFG for Markdown because of Markdown's requirement that anything is valid input. Honest question here: how do CFGs prevent you from parsing anything as valid input? E.g. AFAICT this CFG in BNF accepts anything as valid input (including no input!) <s> ::= <x> EOF
| EOF
<x> ::= CHAR <x>
| CHAR
Isn't it just a problem of crafting a proper grammar which covers all cases? I can see why HTML needs other parsing strategies, but it should be easy for Markdown since everything that is not a non-paragraph is a paragraph (or a blank line, i.e. a paragraph separator).Also, isn't there a compromise between HTML's crazy parsing strategy and a CFG? A formal grammar, even if not context-free. |
|
Actually, I should have said it's not possible to write an _unambiguous_ CFG for Markdown.
Say we need to parse emphasis in span elements. "_a_" is em and "__a__" is strong, but "_a", "a_", "__a" and "a__" are normal text. If we write the rules for all these, we end up with a grammar than can generate the same string in many different ways. To determine whether an "_" is the syntax qualifier of an em or just part of normal text, we might have to look ahead an arbitrary number of characters, and potentially till the end of the input. This is why it's not possible to write a useful (or unambiguous) CFG for Markdown, and this is because of the requirement to not throw an error on any input.
> Also, isn't there a compromise between HTML's crazy > parsing strategy and a CFG?
PEGs have been written for Markdown and they work because PEGs are inherently unambiguous, but use backtracking instead. But those PEGs don't handle nested blocks cleanly.
My own HTML5-ish Markdown spec (http://www.vfmd.org/vfmd-spec/specification/) is not as crazy as HTML5's, but admittedly, is not trivial to implement either.