|
> and then have the template system HTML-escape when outputting HTML, or properly escape JSON when outputting JSON and JavaScript

Or, stop using stringly-typed template systems, and treat the data as what it is: a structured language with a well-defined grammar.

One of these days I need to write an article titled "Don't play with escaping strings. Serialize output." The core idea is that "escaping your output" still looks too much like "sanitizing input"[0], and one tiny mistake is all it takes to give an attacker the ability to inject arbitrary code into the page (or give an unlucky user the ability to brick the page for themselves). So instead of working in "string space", work in whatever semantics your output has, and treat the string form as a serialization problem.

In the case of HTML, that means constructing a tree of tags as a data structure, and then serializing it. Then, bugs in the serializer notwithstanding, the whole class of injection problems disappears: you can't do "<h1>$text</h1>" -> "<h1></h1><script .... </h1>" when your "template" is made of data structures like [:h1, $text], because $text can't possibly alter the structure here. Etc.

In some sense, "Don't escape, serialize instead" is the complement of "Parse, don't validate". (See also: make invalid states unrepresentable.)

--

[0] - Who ever sanitizes input? I've only ever seen the kind of sanitization the article describes happen on output, within string-gluing templates. |
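To make the [:h1, $text] idea concrete, here is a minimal sketch in Python. The list-based node format and the `serialize` function are hypothetical, invented for this illustration rather than taken from any particular library; only `html.escape` is a real standard-library call. It handles text nodes and plain elements only (no attributes, no void tags), which is enough to show why the text can't alter the structure.

```python
from html import escape


def serialize(node):
    """Serialize a tiny, hypothetical HTML tree to a string.

    A node is either a str (a text node) or a list of the form
    [tag, child, child, ...], where each child is itself a node.
    """
    if isinstance(node, str):
        # Text is always data: it is escaped at the one place where
        # the tree is turned into a string, never by the caller.
        return escape(node)
    tag, *children = node
    # Tag names come from the template author, but check them anyway
    # so that a bad tag name cannot smuggle in markup either.
    if not tag.isalnum():
        raise ValueError(f"invalid tag name: {tag!r}")
    inner = "".join(serialize(child) for child in children)
    return f"<{tag}>{inner}</{tag}>"


# Untrusted input that tries to close the heading and open a script.
text = "</h1><script>alert(1)</script>"

# String gluing: the input becomes part of the page structure.
glued = "<h1>" + text + "</h1>"

# Tree serialization: the input stays a text node inside <h1>.
safe = serialize(["h1", text])

print(glued)
print(safe)
```

In the glued version the payload ends up as live markup; in the serialized version it comes out as escaped text inside a single `<h1>`, because the structure was fixed before the text was ever turned into characters.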