|
Neither of your examples actually replaces the {0} in "context" with "some context", so they are not achieving the desired output (which would have been `{"context": "some context", "input": "{0} % $0 -- DROP TABLE \\\"USERS\\\""}`). They are equivalent to my "safe but wrong" examples. The point here was that you may want to "template-ize" the produced JSON for whatever reason. Also, DoS is a legitimate concern, and of course using a safe language makes other consequences less dire. I wasn't using sprintf() in my example for nothing. > Naively printing potentially malicious input to places like the console is still quite dangerous, no matter how much you escape it! Logs can be captured into systems that then paste it directly into HTML. This is exactly my point: when you include untrusted input into another string, even if you escape the untrusted input correctly for the desired format of that string, the entire output is now untrusted, and generally can't be further processed safely. Yours is a perfect example: you can escape the user input to make sure it is formatted safely, but you can't at this point tell in what other ways it should be escaped for other systems that may process it (for example, even printing to the actual console like this may be unsafe, as the user input may include terminal control characters). Even this problem is still simple if we can assume that, say, anything printed to the console that later needs to be displayed in HTML should be considered an HTML string - it just becomes a simple responsibility of the log collector to properly escape the log lines as HTML content. The problem is much harder if the intention is to actually control the HTML output through log lines (say, adding new-lines through br in your log statements, or emphasis or whatever). If that is a necessary component of your system, you need to re-architect this so that log lines themselves are no longer simple strings, but are structured so that any user-controlled input is kept separate from the trusted application-control formatting Say, instead of logging error: user entered "some|||string<script\>alert('pwned')</script>" which is not a valid number<br>starting over
you would log error: user entered %s, which is not a valid number<br>starting over ||| some\|\|\|string<script\>alert('pwned')</script>`
and the log collector would need to know to recombine it into the original string, escaping the untrusted part as needed, before outputting it to HTML as error: user entered "some|||string<script>alert('pwned')</script>", which is not a valid number<br>starting over
Edit: interestingly, I had to use <script\> instead of a normal opening script tag, as HN would give me a TLS error if the comment contains the normal opening script tag...Wonder if there is some input sanitization going on here as well. |
Your example where you use a sprintf format-string placeholder inside an incomplete JSON snippet ought never be used! Ever.
It's not needed in practice. Construct the object graph and insert the parameters there (unescaped!) and then serialize the whole thing.
E.g.:
This is fine.But as I was saying, the "even better" solution is to not serialize this into JSON and then "work with the string representation". The use of JSON[1] should be a detail transparent to 99% of the application. You should be able -- safely -- to switch it out for XML, gRPC, Cap'n Proto, or whatever.
Going back to my logging example, it ought not to matter what wire format something like OpenTelemetry uses. You should be able to use "rich" object graphs in logging calls, and then let the library figure it out. E.g.:
Ideally, everything should treat this as the native "object" graph whenever the developer interacts with it. Only the "edges", such as RPC serialization or deserialization needs to deal with encoding, at which point it'll need to use exactly one escaping/encoding format and not have worry about nested encodings at all.[1] A mistake in the design of JSON is that it appears to be "simple", so beginner programmers think they understand it and can work with it "directly" using string manipulation. This is ill-defined at the best of times, and downright unsafe in surprisingly common scenarios: http://seriot.ch/projects/parsing_json.html