Hacker News new | ask | show | jobs
by ashearer 2310 days ago
This sounds good in theory, but I'll give a counterexample.

Requirement: Name input box.

Implementation: We'll sanitize the input by rejecting any characters likely to be dangerous if mishandled, like single quotes, or anything else we don't immediately imagine to be useful. If a character turns out to be needed later, that's no problem. We'll just change the list.

Security audit: Passes

Later customer complaint: I can't sign up! — J. O'Brien

Dev team: Sorry, too bad. We'd have to re-audit everything and possibly modify code to allow your last name, because there might be code somewhere that relies on the original sanitization for security. That was the point of sanitizing on input, after all. If you want to sign up, it would be easiest for us if you would just change your name.

1 comments

I think you misunderstood my point. I am not saying that you should reject valid (that is: semantically meaningful) input, but that if you are confronted with semantically meaningless input, you should reject it rather than garble it so that it gains some random meaning.

So:

Name input field, value "J. O'Brien": accept

JSON parameter, value "{foo:bar}": reject

The context was the idea that you should gracefully accept bad input. If your code considers "J. O'Brien" bad input for a name, then that's the problem, not that it doesn't accept bad input.

Yes, I completely agree in the above case. The JSON input has a well-defined format and input validation should reject it outright.

The issue is that when developers hear they should "reject bad input" in order to avoid vulnerabilities, they often interpret it as a call to reject any user input that isn't already known to be good. Since user inputs are often free text, like the name field, they wind up forbidding any input they hadn't specifically imagined, which doesn't align with any particular recipient's actual data requirement. It creates false-negative edge cases while only providing illusory help against vulnerabilities.

I mean, I generally agree, but I think it's already problematic to frame it as "user input that isn't already known to be good". Because "J. O'Brien" is known to be good. The problem is that anyone thinks in the first place that some semantically meaningful input value for some reason is not good.