> They're all unsafe, because you have no clue what context they're going to be used in.

That's correct, but it's the reverse thinking from the escaping one. Because in the escaping one, when you need not to escape you will also not-escape at the last possible moment, and that's a sure-fire way to launder attacker-controlled data. Instead you should escape everything, and opt-out as early as possible.

> But if you're writing a webapp, passing around escaped strings is a bad idea 99% of the time. It creates code highly coupled to one aspect of your system.

That's why you do the reverse: most strings are unsafe to everything, but the strings which are safe are generally safe to one specific subsystem. So you say that.

> Just imagine if you did this with networking. I'm glad we're not in a world where we're passing around TCPString or UDPString or IPString or EthernetString or TokenRingString or CarrierPigeonString because that happens to be a networking stack the app uses sometimes. It sounds like hell.

It sounds like hell because it makes no sense, there's no such thing as a TCPString because TCP is not string-based and TCP messages are not composed that way.
That’s not even remotely workable for any system with more than one kind of “escaping”. What if I want to use a string as:
1. An IDNA-encoded domain name
2. An HTML text snippet
3. A shell command string argument
4. A string literal part of a regular expression
5. A part to be used in an XML CDATA section
6. A JSON string
I can’t escape the string beforehand, since the escaping rules are all different. No, the only sensible alternative is to use the same rule which we all use for character encoding: Encode and decode (and escape) at the edges.
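
To make "escape at the edges" concrete, here is a rough Python sketch using only the standard library. The helper names (`to_idna`, `to_html_text`, and so on) are made up for illustration, and the `]]>` splitting trick for CDATA is the usual workaround rather than anything from this thread. The point is only that the raw string stays raw until it reaches a specific output, and each output applies its own rule:

```python
import html
import json
import re
import shlex


def to_idna(domain: str) -> str:
    # Python's built-in "idna" codec (implements IDNA 2003); yields ASCII bytes.
    return domain.encode("idna").decode("ascii")


def to_html_text(s: str) -> str:
    # Escapes &, <, > and (by default) both quote characters.
    return html.escape(s)


def to_shell_arg(s: str) -> str:
    # Quotes a single argument for a POSIX shell.
    return shlex.quote(s)


def to_regex_literal(s: str) -> str:
    # Makes every regex metacharacter match itself literally.
    return re.escape(s)


def to_cdata(s: str) -> str:
    # "]]>" cannot appear inside CDATA, so split it across two sections.
    return "<![CDATA[" + s.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def to_json_string(s: str) -> str:
    # Produces a quoted JSON string literal.
    return json.dumps(s)


# The same raw value, escaped differently at each edge.
raw = 'a.b <c> & "d" ]]> e'
for edge in (to_html_text, to_shell_arg, to_regex_literal, to_cdata, to_json_string):
    print(f"{edge.__name__}: {edge(raw)}")
print("to_idna:", to_idna("bücher.example"))
```

Six outputs, six incompatible rules, and none of them could have been applied ahead of time without knowing where the string was headed.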