Hacker News
by dspillett 4292 days ago
> Never trust user input.

Never trust any input. I think this is a case where people assume that it isn't pure user input because it would have already been parsed/checked/verified.

"Oh, it's in the DNS system so it must be safe" is worse than "well, it came from our database so it should be fine". Don't even trust something coming out of your own database. You never know what various input checking bugs might have accidentally let in over time.

3 comments

Thinking about it as "don't concatenate different data types" leads to even more correct software. Concepts like "trust" and "sanitization" are too often vague and misleading. It might be perfectly valid for TXT records--even trusted and sanitized ones--to contain sequences with left angle brackets that make them look like HTML tags. Either way, that's no excuse for failing to convert the text to HTML (by escaping it) before concatenating it into an HTML page.
This is what I always try to press home to developers I work with. It's not 'sanitization', it's encoding. In order to make a web browser display the string I've retrieved from my database, I have to turn it into an HTML representation that will be displayed as that string. In order to use a string in a JavaScript string literal, I need to turn it into a JavaScript string literal which represents the string.
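To make the "encoding, not sanitization" point concrete, here is a minimal Python sketch. The helper names `to_html_text` and `to_js_string_literal` are made up for illustration, and the TXT record value is invented; only `html.escape` and `json.dumps` are real standard-library calls. It assumes the JavaScript literal may end up inside an inline script block, which is why `<`, `>` and `&` are escaped as well.

```python
import html
import json


def to_html_text(value: str) -> str:
    """Encode a plain string as HTML text (also safe in a quoted attribute)."""
    return html.escape(value, quote=True)


def to_js_string_literal(value: str) -> str:
    """Encode a plain string as a JavaScript string literal.

    json.dumps produces a valid double-quoted literal but leaves '<' alone,
    so "</script>" inside the data could still end an inline script block.
    Escaping '<', '>' and '&' as \\uXXXX keeps the literal's value identical.
    """
    literal = json.dumps(value)  # ensure_ascii=True also escapes U+2028/U+2029
    return (
        literal.replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
    )


# Hypothetical TXT record: perfectly valid data that happens to look like markup.
txt_record = 'v=spf1 <script>alert("hi")</script>'

# Convert text to HTML before concatenating it into an HTML page.
html_cell = "<td>" + to_html_text(txt_record) + "</td>"

# Convert text to a JS literal before concatenating it into JS source.
js_line = "var record = " + to_js_string_literal(txt_record) + ";"
```

The stored string is never changed; each output context gets its own representation of it.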
In addition to "never trust user input":

Never trust your program's output

You should have two sets of sanitization, one that sanitizes incoming data, and one that sanitizes outgoing data.

I disagree. Obviously data should be validated. But passing validation, I prefer to store data exactly as the user supplied it and then sanitize on output. That way you always have a copy of the original data assuming things change.
Definitely. If it is genuinely invalid, refuse it, otherwise store everything as-is. You don't know on the way in what encoding will be needed on the way out: the same string could be output later plain, in HTML, in a JS literal, in SQL if someone is daft enough to use ad-hoc unparameterised queries, and so forth.
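A rough sketch of that flow in Python, using the standard `sqlite3` and `html` modules. The `comments` table, the 500-character limit and the function names are assumptions invented for the example, not anything from a real project: validate, refuse what is genuinely invalid, store the rest untouched via a parameterised query, and encode only at output time.

```python
import html
import sqlite3

# In-memory database and table are placeholders for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (body TEXT NOT NULL)")


def is_valid_comment(text: str) -> bool:
    # Hypothetical validation rule: non-empty, at most 500 characters.
    return 0 < len(text) <= 500


def store_comment(text: str) -> None:
    """Refuse invalid input; otherwise store exactly what was supplied."""
    if not is_valid_comment(text):
        raise ValueError("invalid comment")
    # Parameterised query: the driver passes the value separately from the
    # SQL text, so no SQL-specific encoding of the string is needed here.
    conn.execute("INSERT INTO comments (body) VALUES (?)", (text,))


def render_comments_html() -> str:
    """Encode for HTML only at the point where HTML is produced."""
    rows = conn.execute("SELECT body FROM comments").fetchall()
    return "".join(f"<li>{html.escape(body)}</li>" for (body,) in rows)


store_comment("Robert'); DROP TABLE comments;-- <b>hi</b>")
page_fragment = render_comments_html()
```

The row in the database still holds the original text, so a later plain-text or JS output path can apply its own encoding to the same value.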
This is only too true! At work we do CRUD projects, which means user input gets stored in the database. I almost always break other people's work by adding HTML tags to the inputs, navigating back to the page, and seeing markup that shouldn't be there. Even database output needs to be sanitized.
Database output is application input. All forms of input need to be sanitized, period.
Same here. It is surprising how many times I've done that over the years: people are surprised by how easy it was, yet easily convince themselves that "it'll be all right" somehow and that they'll fix it later...