True, we can write a CFG that accepts any input, but not one that can actually parse Markdown. Actually, I should have said it's not possible to write an _unambiguous_ CFG for Markdown. Say we need to parse emphasis in span elements. "_a_" is em and "__a__" is strong, but "_a", "a_", "__a" and "a__" are normal text. If we write rules for all of these, we end up with a grammar that can generate the same string in many different ways. To decide whether an "_" is the delimiter of an em or just part of normal text, we might have to look ahead an arbitrary number of characters, potentially all the way to the end of the input. That's why it's not possible to write a useful (or unambiguous) CFG for Markdown, and it all comes from the requirement that no input may be rejected as an error.
> Also, isn't there a compromise between HTML's crazy parsing strategy and a CFG?
PEGs have been written for Markdown, and they work because PEGs are inherently unambiguous: instead of allowing multiple parses, they use ordered choice and backtracking. But those PEGs don't handle nested blocks cleanly. My own HTML5-ish Markdown spec (http://www.vfmd.org/vfmd-spec/specification/) is not as crazy as HTML5's, but admittedly it's not trivial to implement either.
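To make the lookahead and backtracking points concrete, here's a toy sketch in Python of a PEG-style emphasis parser. It's only an illustration under heavy assumptions: the names (`parse_inline`, `_delimited`, `_strong`, `_em`) are hypothetical, bodies are plain text with no nesting, and it covers just the "_a_" / "__a__" / "_a" / "a_" / "__a" / "a__" cases above, ignoring the real flanking, nesting and escaping rules of any Markdown spec. Ordered choice tries strong before em; when a rule can't find its closing delimiter it fails, and the parser backtracks and treats the "_" as literal text.

```python
from typing import List, Optional, Tuple, Union

# A parsed inline node: plain text, or (tag, body) for em/strong.
Node = Union[str, Tuple[str, str]]


def _delimited(text: str, pos: int, delim: str, tag: str) -> Optional[Tuple[Node, int]]:
    """Try to match delim + non-empty body + delim starting at pos.

    Returns (node, new_pos) on success, or None if the rule fails.
    """
    if not text.startswith(delim, pos):
        return None
    start = pos + len(delim)
    # Look ahead for the closing delimiter; this may scan to the end of input.
    end = text.find(delim, start)
    if end == -1 or end == start:
        return None  # no closer, or empty body: the rule fails
    return (tag, text[start:end]), end + len(delim)


def _strong(text: str, pos: int) -> Optional[Tuple[Node, int]]:
    return _delimited(text, pos, "__", "strong")


def _em(text: str, pos: int) -> Optional[Tuple[Node, int]]:
    return _delimited(text, pos, "_", "em")


def _append_text(out: List[Node], ch: str) -> None:
    """Append a literal character, merging it with a preceding text node."""
    if out and isinstance(out[-1], str):
        out[-1] += ch
    else:
        out.append(ch)


def parse_inline(text: str) -> List[Node]:
    """Toy PEG for emphasis (bodies are plain text, no nesting):

        Inline <- (Strong / Em / Char)*
        Strong <- "__" (!"__" .)+ "__"
        Em     <- "_"  (!"_"  .)+ "_"
    """
    out: List[Node] = []
    pos = 0
    while pos < len(text):
        for rule in (_strong, _em):  # ordered choice: first success wins
            result = rule(text, pos)
            if result is not None:
                node, pos = result
                out.append(node)
                break
        else:
            # Every rule failed: backtrack to pos and take one literal char.
            _append_text(out, text[pos])
            pos += 1
    return out


print(parse_inline("x _a_ y __b__"))
# ['x ', ('em', 'a'), ' y ', ('strong', 'b')]
```

Every input parses to something, and "_a", "a_", "__a" and "a__" each come back as a single text node. The `text.find` call is where the unbounded lookahead shows up: for an unclosed "_" it scans to the end of the input before the rule fails and the parser falls back to plain text.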
> But those PEGs don't handle nested blocks cleanly.
What's the problem exactly?