by unlinkr
2475 days ago
That is just a comment, and a regex can scan it easily. The CDATA syntax has no particular meaning inside an XML comment, if that is what you are suggesting. In any case, neither comments nor CDATA sections nest, so there is no problem handling them at the lexical analysis stage. As for "get a premade grammar": what does that mean? Do you mean use a tool like Lex? AFAIK Lex uses regular expressions.
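
A minimal sketch of that point in Python, assuming the standard `re` module. The `TOKEN` pattern and the `scan` helper are made up for illustration, and the comment pattern is simplified (it does not reject `--` inside a comment, which the XML spec forbids). Because neither construct nests, a non-greedy match up to the first terminator is enough:

```python
import re

# Comments and CDATA sections do not nest, so each one ends at the first
# terminator that follows its opener. Simplified: "--" inside a comment
# is not rejected here.
TOKEN = re.compile(
    r"(?P<comment><!--.*?-->)"
    r"|(?P<cdata><!\[CDATA\[.*?\]\]>)",
    re.DOTALL,
)

def scan(text):
    """Return (kind, lexeme) pairs for each comment or CDATA section, in order."""
    return [(m.lastgroup, m.group()) for m in TOKEN.finditer(text)]

# CDATA syntax inside a comment is just comment text, and vice versa.
doc = "<r><!-- <![CDATA[ not cdata ]]> --><![CDATA[ <!-- not a comment --> ]]></r>"
for kind, lexeme in scan(doc):
    print(kind, repr(lexeme))
```

The scanner sees one comment followed by one CDATA section. Whichever opener comes first decides how the enclosed text is read.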
No, I mean a tool like ANTLR or JavaCC, which builds an in-memory syntax tree that you can query or observe.
https://github.com/antlr/grammars-v4/blob/72810b7c59bb481750...
This is different from querying the document as XML. An empty node (<a></a>) is equivalent to an empty element (<a/>) at the XML level, but the two differ in syntax. Querying the syntax tree lets you tell them apart, because the self-closing form produces a TAG_SLASH_CLOSE token.
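
A rough illustration of the difference in Python. This is not ANTLR output: the tiny regex lexer below is a stand-in, and every token name except TAG_SLASH_CLOSE is invented for the sketch. `xml.etree.ElementTree` plays the role of "querying the document as XML".

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical token names; only TAG_SLASH_CLOSE comes from the comment above.
# Longer alternatives are listed first so "/>" and "</" win over "<" and ">".
TOKEN = re.compile(
    r"(?P<TAG_SLASH_CLOSE>/>)"
    r"|(?P<TAG_OPEN_SLASH></)"
    r"|(?P<TAG_OPEN><)"
    r"|(?P<TAG_CLOSE>>)"
    r"|(?P<NAME>[A-Za-z_][\w.-]*)"
)

def tokens(text):
    """Return the sequence of token names for a small, attribute-free tag soup."""
    return [m.lastgroup for m in TOKEN.finditer(text)]

def same_parsed_element(a, b):
    """Compare what an XML parser reports for the two root elements."""
    ea, eb = ET.fromstring(a), ET.fromstring(b)
    return (ea.tag, ea.attrib, ea.text, list(ea)) == (eb.tag, eb.attrib, eb.text, list(eb))

print(same_parsed_element("<a/>", "<a></a>"))  # the parser sees no difference
print(tokens("<a/>"))
print(tokens("<a></a>"))
```

The parsed elements compare equal, while only the token stream for `<a/>` contains TAG_SLASH_CLOSE. That is the information a syntax-tree query can see and an XML-level query cannot.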