> The point is that even with proper types, this is not easy to manage or fix. In practice, in a typed language, nothing like this ever occurs, because the rule is just: "use string for everything, except the edge". You're thinking of a type like: HtmlString<JsonString<Utf8String>> In practice the type that is "passed around" is almost always just "string", and this is converted at the last moment to a single destination format, such as HtmlString. When writing to databases, there isn't even an escape step at all, because you use parametrised queries, right? Right!? The database stores "string", not "DatabaseEscapedString". This is similar to how instants in time ought to be handled. You store them as UTC and convert to the user's time zone at the last moment. You don't pass around some monstrosity that somehow keeps track of +10-5+3 in order to arrive at +7. That would be absurd. Instead you pass around the "Z" UTC timestamp and add +7 when needed.
That's what happens in practice, of course. The GP was proposing something else, and I was explaining how complicated that gets.
> and this is converted at the last moment to a single destination format, such as HtmlString.
I explained before why this doesn't work unless we're talking about the final destination of this string. Otherwise, if that string is being taken through various encodings (say user input to JSON to sprintf format string to HTTP body), and if you need to combine safe and unsafe input, then what you're saying doesn't work anymore.
Here is a sketch of an example:
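Something along these lines, in Python. The function names are hypothetical, and the chain is compressed to two layers (JSON, then a printf-style template that ends up as the HTTP body):

```python
import json

def naive_body(user_input: str, request_id: int) -> str:
    # Layer 1: JSON. json.dumps makes the value safe *as JSON*.
    payload = json.dumps({"comment": user_input})
    # Layer 2: the JSON is spliced into a printf-style template, so the
    # template now mixes trusted text ("%d") with user-controlled text.
    template = "request %d: " + payload
    return template % request_id

def escape_at_the_end(user_input: str, request_id: int) -> str:
    payload = json.dumps({"comment": user_input})
    template = "request %d: " + payload
    # "Convert at the last moment": escape the combined string for printf.
    # This also escapes our own "%d", so the trusted part stops working.
    return template.replace("%", "%%") % request_id

print(naive_body("hello", 42))   # request 42: {"comment": "hello"}

for build in (naive_body, escape_at_the_end):
    try:
        build("100%", 42)
    except (TypeError, ValueError) as exc:
        print(build.__name__, "failed:", exc)
```

The first version breaks as soon as the input contains a `%`, and the second breaks for every input, because by the time the escaping runs there is no way to tell which `%` came from the user and which came from the template.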
The only solution to get this to work is to keep the user input string entirely separate from any other string, and apply escaping to it individually at every level where it is used. Additionally, you will need to remember what escaping has been applied to it, and in what order, so that it can be un-escaped back to the original value when needed.
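A minimal sketch of that bookkeeping, in Python. The `Tracked` class and the `ESCAPERS` table are invented for illustration, and only two layers (JSON string escaping and printf-style `%` escaping) are modelled:

```python
import json
from dataclasses import dataclass
from typing import Tuple

# Hypothetical table: layer name -> (escape, unescape).
ESCAPERS = {
    "json": (json.dumps, json.loads),
    "printf": (lambda s: s.replace("%", "%%"),
               lambda s: s.replace("%%", "%")),
}

@dataclass(frozen=True)
class Tracked:
    """User input plus the escapes applied to it, in order."""
    value: str
    applied: Tuple[str, ...] = ()

    def escape(self, layer: str) -> "Tracked":
        esc, _ = ESCAPERS[layer]
        return Tracked(esc(self.value), self.applied + (layer,))

    def original(self) -> str:
        # Undo the escapes in reverse order of application.
        value = self.value
        for layer in reversed(self.applied):
            _, unesc = ESCAPERS[layer]
            value = unesc(value)
        return value

user = Tracked('100% "sure"').escape("json").escape("printf")

# Only the user-derived fragment was escaped; the trusted "%d" is untouched.
template = 'request %d: {"comment": ' + user.value + '}'
body = template % 42

print(body)             # request 42: {"comment": "100% \"sure\""}
print(user.original())  # 100% "sure"
```

Note how much machinery this takes for just two layers, and that the escaped fragment has to stay separate from the trusted template text until the moment it is spliced in.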