Hacker News new | ask | show | jobs
by simiones 1326 days ago
The biggest problem appears once you start wanting to combine such strings.

Say a user inputs some raw text in a form that is intended to be the title of a button in a HTML form that will be sent in a JSON file to be stored in a SQL db. The expectation is that you can later retrieve this HTML snippet from the DB and display it on the screen.

You have to first escape the raw text from the user so that it can be safely used in HTML - so you will go from user_input_string to html_pcdata_escaped_ user_input_string. Then you compose a bit of HTML that contains the button and this part; let's say you store it in some HTML DOM object. Then you want to send this HTML object as JSON, so you have to know to convert html_pcdata_escaped_ user_input_string into json_string_escaped_user_input_string - but that loses type information which may hurt us later, so maybe we want to actually store it as json_string_escaped_html_pcdata_escaped_user_input_string. Then, if we want to use this as part of an SQL query string, by the same considerations, we want to put it in a mysql_like_filter_escaped_json_string_escaped_html_pcdata_escaped_user_input_string - which is getting really ugly, and easy to mess up.

Of course, the order of escaping matters, so an mysql_like_filter_escaped_json_string_escaped_html_pcdata_escaped_user_input_string and a json_string_escaped_mysql_like_filter_escaped_html_pcdata_escaped_user_input_string are different things that need to be decoded differently (of course, for SQL in particular we could use prepared queries instead).

Also, we can't ever concatenate this with any other string-like type until perhaps the final use point (such as sending a query string to the DB), since we need to remember which part of the string is escaped in which way, and for what types of uses it is safe (an HTML-escaped string may still contain SQLi or JSON injection).

The point is that even with proper types, this is not easy to manage or fix.

It also requires quite advanced type systems to be able to use these in normal contexts - say, you want to store several such strings with different provenances in a Map or Set or even List, without "forgetting" the provenance.

2 comments

As far as I'm concerned, "the point is" that what we're describing here -- keeping track of the type of each piece of data -- is the best way to think about this class of problems (as opposed to talking about "sanitising" or "safe data", for example).

It's then up to us to decide how to best make use of the type system of whatever language we end up implementing it in (or, indeed, to treat the ability to deal with this well as a requirement when we're choosing a language).

For me, effects like "we can't ever concatenate this with any other string-like type" are desirable features, not problems with this approach: either it's possible to convert both strings to a common form, or I shouldn't be trying to combine them.

Sure - I'm just saying that the article is right that this problem is difficult, not easy, and that it doesn't get significantly easier if we accurately keep track.
> The point is that even with proper types, this is not easy to manage or fix.

In practice, in a typed language, nothing like this ever occurs, because the rule is just: "use string for everything, except the edge".

You're thinking of a type like: HtmlString<JsonString<Utf8String>>>

In practice the type that is "passed around" is almost always just "string", and this is converted at the last moment to a single destination format, such as HtmlString.

When writing to databases, there isn't even an escape step at all, because you use parametrised queries, right? Right!?

The database stores "string", not "DatabaseEscapedString".

This is similar to how instants in time ought to be handled. You store them as UTC and convert to the user's time zone at the last moment. You don't pass around some monstrosity that somehow keeps track of +10-5+3 in order to arrive at +7. That would be absurd. Instead you pass around the "Z" UTC timestamp and add +7 when needed.

> In practice the type that is "passed around" is almost always just "string"

That's what happens in practice, of course. The GP was proposing something else, and I was explaining how complicated that gets.

> and this is converted at the last moment to a single destination format, such as HtmlString.

I explained before why this doesn't work unless we're talking about the final destination of this string. Otherwise, if that string is being taken through various encodings (say user input to JSON to sprintf format string to HTTP body), and if you need to combine safe and unsafe input, then what you're saying doesn't work anymore.

Here is a sketch of an example:

  userInput := read()
  jsonFormat := "{\"context\": \"%s\", \"input\": \"" + json.escape(userInput) + "\"}" //easy and safe
  finalJson := ""
  sprintf(finalJson, jsonFormat, "some context") 
  // oops - unsafe if original input was "%s"
  sprintf(finalJson, printf.escape(jsonFormat), "some context") 
  // oops - does the wrong thing - it will output "{\"context\": \"%s\", \"input\": \"%s"\"}
  
  //let's try the other way around?

  userInput := read()
  formatStr := "{\"context\": \"%s\", \"input\": \"" + printf.escape(userInput) + "\"}" //easy and safe
  finalJsonStr := ""
  sprintf(finalJson, formatStr, "some context") 
  // oops - unsafe if userInput was "safe-looking\", \"bypassAuth\": \"true\"}" 
  
  sprintf(finalJson, json.escape(formatStr), "some context") 
  // oops - does the wrong thing - it will output 
  // "\"{\\\"context\\\":\\\"some context\\\", \\\"input\\\": \\\"safe-looking\\\", \\\"bypassAuth\\\": \\\"true\\\"}\"" - that is, a JSON string instead of a JSON object
The only solution to get this to work is to keep the user input string entirely separate from any other string, and apply escaping to it individually at every level where it is used.

Additionally, you will need to remember what escaping has been applied to it, and in what order, so that it can be un-escaped back to the original value when needed.

You're basically running around with your finger on the trigger and suggesting that everyone everywhere ought to wear ballistic armour to compensate.

This is how you put the safety on and return the gun into its holster:

    using System;
    using System.Text.Json;
    
    string maliciousInput = "{0} % $0 -- DROP TABLE \"USERS\"";
    
    // Always, always, ALWAYS use a proper serializer for assembling formats like JSON.
    // The malicious input can include actual JavaScript, and it'll be correctly encoded with 100% safety.
    string encoded = JsonSerializer.Serialize( new {
        context= "{0}",      // .NET format string placeholder
        input=maliciousInput
    });
    
    // This will just work, formatting placeholders are ignored if no parameters are specified
    Console.WriteLine(encoded);
    
    // A safe FormatException is thrown if you mis-use the string formmating code. 
    // No vulnerability other than DDoS.
    Console.WriteLine(encoded, "adfasfd");

Test here: https://dotnetfiddle.net/p8P1fO

Fundamentally, putting any format like JSON or any user-controlled input into the first parameter of sprintf or any similar function in any language is Wrong with a capital W. It ought to be picked up in code review.

Ideally, sprintf-like functions in strongly typed languages should use a special "FormatString" type instead of a plain string as the first input. This would automatically fix any such issues, but relying on this is still problematic. Naively printing potentially malicious input to places like the console is still quite dangerous, no matter how much you escape it! Logs can be captured into systems that then paste it directly into HTML. Similarly, console control codes can be used by attackers as a nuisance. Etc... Structured logging, along the lines of OpenTelemetry is safer.

See: https://owasp.org/www-community/attacks/Log_Injection

This is the safe equivalent of your second example. Both format strings and JSON are correctly handled:

    Console.WriteLine( "{0}", JsonSerializer.Serialize( new {
     context="{0}", // if sprintf/WriteLine is not misused delibaretely, this is safe!
     input="safe-looking\", \"bypassAuth\": \"true\"}" 
    }));
This outputs:

    {"context":"{0}","input":"safe-looking\u0022, \u0022bypassAuth\u0022: \u0022true\u0022}"}
Link: https://dotnetfiddle.net/Lm8jkR
Neither of your examples actually replaces the {0} in "context" with "some context", so they are not achieving the desired output (which would have been `{"context": "some context", "input": "{0} % $0 -- DROP TABLE \\\"USERS\\\""}`). They are equivalent to my "safe but wrong" examples. The point here was that you may want to "template-ize" the produced JSON for whatever reason.

Also, DoS is a legitimate concern, and of course using a safe language makes other consequences less dire. I wasn't using sprintf() in my example for nothing.

> Naively printing potentially malicious input to places like the console is still quite dangerous, no matter how much you escape it! Logs can be captured into systems that then paste it directly into HTML.

This is exactly my point: when you include untrusted input into another string, even if you escape the untrusted input correctly for the desired format of that string, the entire output is now untrusted, and generally can't be further processed safely. Yours is a perfect example: you can escape the user input to make sure it is formatted safely, but you can't at this point tell in what other ways it should be escaped for other systems that may process it (for example, even printing to the actual console like this may be unsafe, as the user input may include terminal control characters).

Even this problem is still simple if we can assume that, say, anything printed to the console that later needs to be displayed in HTML should be considered an HTML string - it just becomes a simple responsibility of the log collector to properly escape the log lines as HTML content.

The problem is much harder if the intention is to actually control the HTML output through log lines (say, adding new-lines through br in your log statements, or emphasis or whatever). If that is a necessary component of your system, you need to re-architect this so that log lines themselves are no longer simple strings, but are structured so that any user-controlled input is kept separate from the trusted application-control formatting

Say, instead of logging

  error: user entered "some|||string<script\>alert('pwned')</script>" which is not a valid number<br>starting over
you would log

  error: user entered %s, which is not a valid number<br>starting over ||| some\|\|\|string<script\>alert('pwned')</script>`
and the log collector would need to know to recombine it into the original string, escaping the untrusted part as needed, before outputting it to HTML as

  error: user entered "some|||string&lt;script&gt;alert('pwned')&lt;/script&gt;",  which is not a valid number<br>starting over
Edit: interestingly, I had to use <script\> instead of a normal opening script tag, as HN would give me a TLS error if the comment contains the normal opening script tag...

Wonder if there is some input sanitization going on here as well.

Formatting a string twice is what lead to the Log4j security vulnerability. It had a macro language that allowed user-controlled input to have macros expanded in an unexpected place. Essentially the macro input itself could contain macros.

Your example where you use a sprintf format-string placeholder inside an incomplete JSON snippet ought never be used! Ever.

It's not needed in practice. Construct the object graph and insert the parameters there (unescaped!) and then serialize the whole thing.

E.g.:

    JsonConverter.Serialize( new {
        // Use a static format string! Never let users control this...
        context = string.Format( "{0:N1}", userControlledMaliciosInput ),
        alsoThis = "... json snippet...",
    });
This is fine.

But as I was saying, the "even better" solution is to not serialize this into JSON and then "work with the string representation". The use of JSON[1] should be a detail transparent to 99% of the application. You should be able -- safely -- to switch it out for XML, gRPC, Cap'n Proto, or whatever.

Going back to my logging example, it ought not to matter what wire format something like OpenTelemetry uses. You should be able to use "rich" object graphs in logging calls, and then let the library figure it out. E.g.:

    Log.Information( "user submitted a form", new {
        context = string.Format( "{0:N1}", userControlledMaliciosInput ),
        alsoThis = "... json snippet...",
    });
Ideally, everything should treat this as the native "object" graph whenever the developer interacts with it. Only the "edges", such as RPC serialization or deserialization needs to deal with encoding, at which point it'll need to use exactly one escaping/encoding format and not have worry about nested encodings at all.

[1] A mistake in the design of JSON is that it appears to be "simple", so beginner programmers think they understand it and can work with it "directly" using string manipulation. This is ill-defined at the best of times, and downright unsafe in surprisingly common scenarios: http://seriot.ch/projects/parsing_json.html

We're stuck discussing JSON and other data transfer encodings, which is partly my fault as I brought it up, but there are far more scenarios for using combined text encodings.

It is very common to have templating languages which include their own syntax + the syntax of a target output language (e.g. Markdown supports HTML snippets that should get output to the final HTML as is; C macros support C code snippets, and C itself supports Assembler snippets that should end up in the final binary etc). When generating/processing the mixed format from your own code, you may often hit the problems above.

Even for JSON, there are legitimate reasons for processing stored JSON documents as text, or at least situations where it seems a safe enough approach - because people tend to forget that a string representation of a JSON document that has user-controlled input should be itself considered untrusted user input in its entirety, at least unless it is parsed by a JSON parser.

Additionally, data often has to be stored to unstructured storage (e.g. disk) between the moment you receive untrusted user input and the moment you output the final format to the user - again, doing the easy thing of storing in the intermediate format with the first level of escaping of untrusted input is extremely tempting, and the alternative is significantly more difficult.