|
|
|
|
|
by mjn
3957 days ago
|
|
There is a conceptual version of HTML that's non-regular, but HTML as used in practice is regular, because the standard allows parsers to impose a maximum nesting depth, and all browsers that I know of do so (e.g. Webkit limits to 512). Every finite language is regular, so HTML with a finite tag-nesting depth is regular. Therefore you could in principle parse the subset of HTML recognized by Webkit using a regular expression. It's impractical and error-prone to do so, of course, but that's not a computability issue. You could write a compiler, if you really wanted to, that took a context-free grammar plus an integer specifying maximum production depth, and mechanically converted it to an FSM or regex. Therefore I think the article is barking up the wrong tree with computability; the issue isn't what's computable by a finite vs. pushdown automaton, but that parsing nested data structures with regexes is virtually impossible to do correctly. |
|