by shawnz 2527 days ago
Exactly, sanitization is a misnomer. If you are concatenating plain text together with HTML then you have an app which is functionally broken when someone with an apostrophe in their name tries to use it -- it's not just a matter of security. The strings must be in the same format (i.e. both valid HTML fragments) before you concatenate them, or the result will be unparsable garbage.

And the idea of "sanitization at input" is especially ridiculous: how can you know what you will be concatenating that input with until you actually do it? Is it being inserted into some HTML? Is it going in an attribute value or a text node? What about outputting JSON?
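To make the apostrophe case concrete, here is a minimal Python sketch. The name and the `<input>` markup are made up for illustration; `html.escape` is the standard-library escaper, used here as one possible way to turn plain text into an HTML fragment before concatenating.

```python
import html

name = "O'Brien"  # plain text, not an HTML fragment

# Naive concatenation into a single-quoted attribute:
# the apostrophe in the name terminates the attribute value early.
broken = "<input value='" + name + "'>"

# Convert the plain text into an HTML fragment first, then concatenate.
# html.escape with quote=True also escapes ' and ".
fixed = "<input value='" + html.escape(name, quote=True) + "'>"

print(broken)  # <input value='O'Brien'>
print(fixed)   # <input value='O&#x27;Brien'>
```

The first result is simply wrong markup even when nobody is attacking the app, which is the point about this being a correctness bug and not only a security one.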

1 comment

Right.

This is why we typically speak about defense in depth. Input sanitization works best when applied to known, expected inputs, like a phone number or a date of birth.
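As a rough sketch of what that looks like for inputs with a known shape: the function names, the phone pattern, and the ISO date format below are all assumptions for illustration, not a standard.

```python
import re
from datetime import date, datetime

# Hypothetical, deliberately strict format: optional "+", then 7 to 15 digits.
PHONE_RE = re.compile(r"\+?[0-9]{7,15}")

def parse_phone(raw: str) -> str:
    # Drop separators people commonly type, then require the whole string to match.
    cleaned = re.sub(r"[\s().-]", "", raw)
    if not PHONE_RE.fullmatch(cleaned):
        raise ValueError("not a phone number")
    return cleaned

def parse_dob(raw: str) -> date:
    # Assumes dates arrive as YYYY-MM-DD; strptime raises ValueError
    # for anything it cannot parse, including impossible dates.
    dob = datetime.strptime(raw, "%Y-%m-%d").date()
    if dob > date.today():
        raise ValueError("date of birth is in the future")
    return dob
```

This only works because the set of valid values is small and well defined. A free-text field such as a name has no such pattern (it may legitimately contain an apostrophe), so it still has to be encoded at output.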

Output encoding is the real solution: at that point we know where the data will end up (that is, how it’s displayed), so we can ensure that it’s in the correct format and that the parser for that format won’t interpret it as code instead of data. The format might be an HTML attribute, HTML text, JSON, JavaScript, etc.
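A small Python sketch of the same value being encoded differently depending on where it ends up. The sample value and markup are invented, and the standard-library `html.escape` and `json.dumps` stand in for whatever encoder your framework provides.

```python
import html
import json

value = 'Tom "T" O\'Brien <tom@example.com>'

# HTML text node: &, < and > must be escaped
# (html.escape also escapes quotes by default, which is harmless here).
as_text = "<p>" + html.escape(value) + "</p>"

# HTML attribute value: quotes must be escaped as well,
# and the attribute itself must be quoted.
as_attr = '<input value="' + html.escape(value, quote=True) + '">'

# JSON: let the serializer build the string literal.
as_json = json.dumps({"name": value})
```

Each result is valid only in its own context. For example, the `json.dumps` output is fine as a JSON response body, but it is not by itself safe to paste into an inline `<script>` block, because a value containing `</script>` would still end the script element.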