| > difference between a recogniser Please note that the term "recogniser" is very fuzzy, it could mean a regex matcher, a parser or even a turing-complete thing. Not very helpful for this discussion. > a parser, which outputs a data structure Please note that a parser is not required to output a data structure. In classic computer science, the parser of a context-free grammar usually has a minimal (boolean) output: it just either accepts or rejects the input. If your "recognizer" is too weak (e.g. regexes), you risk not properly checking the language (see below). If your "recognizer" is too powerful (e.g. turing complete), you risk tons of loopholes which are hard to find and hard to analyze. You probably won't be able to prove the security, and even if you do, it will probably be hard work, and even harder for others to follow and to verify. But if your "recognizer" is a parser, you have a good chance to succeed in a safe way with minimal effort. Proving security is as simple as comparing your grammar with the ECMAscript standard. > you can get away with something like a regex to do the job [...] with the crazy things JS lets you do a recogniser for a relaxed language is likely to still let potentially dangerous code though That's exactly my point: Sure, you can try to build a protection wall based on regexes, but there's no reason to do that. Use a proper parser right away and don't waste your time with repeating well-known anti-patterns. |
All of the literature I remember from my uni days on formal grammers had a recogniser defined as something that accepts/rejects, and a parser as something that builds a data structure.
It's difficult to retrospectively find the literature, because outside of formal grammars recogniser _is_ used more loosely. But a few Wikipedia articles [1] [2] [3] and their referenced literature [4] [5] do agree with me.
> A recognizer is an algorithm which takes as input a string and > either accepts or rejects it depending on whether or not the string > is a sentence of the grammar. A parser is a recognizer which also > outputs the set of all legal derivation trees for the string.
[1] https://en.wikipedia.org/wiki/Parsing#Computer_languages
[2] https://en.wikipedia.org/wiki/Earley_parser#Earley_recognise...
[3] https://en.wikipedia.org/wiki/CYK_algorithm#Generating_a_par...
[4] http://reports-archive.adm.cs.cmu.edu/anon/anon/usr/ftp/scan...
[5] http://bat8.inria.fr/~lang/papers/harder/harder.pdf