| > Because we don't know what goodness looks like. You're writing the parser, so you define the set of acceptable input. > The world can be separated into good, bad and unknown. The data your software receives as input can be separated into valid input that your software will correctly interpret, and invalid input that is either an error or an attack. There shouldn't ever be any "unknown" input, as that would imply you don't know how your software parses its input. As the CCC talk in my previous comment [2] explains, this may be true if recognition of input is scattered across your software and thus hard to understand as a complete grammar. Thus the recommendation to put it all in one place using a parser generator (or whatever). > If you classify anything unknown as bad then anything new is DOA. Anything unknown is by definition not properly supported by the software you're writing. |
This seems to be where you're going wrong. There is no god-mode where you can see the whole universe and perfectly predict everything that will happen in the future.
Your code has to do something when it gets a URI for a scheme that didn't exist when you wrote your code. The handler for that URI is third party code. Your code can either pass the URI to the registered handler or not.
And if the answer is "not" then it will be prohibitively difficult for a new URI scheme (or what have you) to gain traction. That means every new thing has to be shoehorned into HTTP, and HTTP becomes an ever larger and more complicated attack surface.
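To make the "pass it to the registered handler" option concrete, here is a rough Python sketch. Everything in it is made up for illustration: the `HANDLERS` registry, `register_handler`, `dispatch`, and the `newscheme` scheme are hypothetical names, not any real platform's API. The dispatcher never needs to know what a scheme means. It only looks up whoever registered for it.

```python
from urllib.parse import urlsplit

# Hypothetical registry: scheme name -> third-party handler callable.
HANDLERS = {}

def register_handler(scheme, handler):
    """Let third-party code claim a scheme, possibly long after dispatch() shipped."""
    HANDLERS[scheme.lower()] = handler

def dispatch(uri):
    """Hand the URI to the handler registered for its scheme, if there is one."""
    scheme = urlsplit(uri).scheme.lower()
    handler = HANDLERS.get(scheme)
    if handler is None:
        # Nobody has registered for this scheme, so there is nowhere to send it.
        return None
    # We do not interpret the rest of the URI; that is the handler's job.
    return handler(uri)

# A scheme that did not exist when dispatch() was written (assumed for illustration).
register_handler("newscheme", lambda uri: f"handled {uri}")
```

With that in place, `dispatch("newscheme://example/thing")` reaches the third-party handler, while a URI whose scheme nobody registered just returns `None`. The alternative, rejecting every scheme the dispatcher's author didn't anticipate, is the "not" case above.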