Hacker News new | ask | show | jobs
by tie_ 1652 days ago
Please stop suggesting that SQL Injections are a sanitization problem. They are a problem of escaping in the application constructing the query, not a problem with whatever the user has typed in.

The log4j vulnerability shares some common traits, but it's completely different at the core. It's not a problem of sanitization, and I'd argue it's not a problem of escaping either (though escaping could fix it). It's a problem with an obscure feature that is left with insecure defaults, and is (was?) unknown to the vast majority of developers who integrate the library in their apps.

6 comments

Part of the problem is (AIUI) that expansion of ${} type constructs is carried out after parameters are inserted into the log message's format string. I don't believe the use case requires that, i think it was just easier to implement. If that expansion was done on the format string, before parameters were inserted, this attack would not be possible. Injecting dodgy strings into log messages as archi42 suggests would also not be a problem if the insertion was done safely.

Agreed that this is not a sanitisation attack, but i think that means it is a format string attack:

https://owasp.org/www-community/attacks/Format_string_attack

It requires a couple of other loopholes, related to JNDI, to work, but that is the first step which goes wrong.

Hah, linking to OWASP is a good point. I have now put some more thought into "how would I EXACTLY classify this, if I had to list it as a finding for a customer".

Now, I think format string is not entirely right, either. That's more related to `printf`, which splits code (the format string) and data (the varargs). But log4j actually mixes code and data into one string. In the OWASP frame work, I think the more generic variant of format string attacks would be https://owasp.org/www-community/attacks/Code_Injection (that's actually linked under "related" for the format string attacks).

OWASP lives mostly in the web world, but links to CWE-77 (Command Injection, https://cwe.mitre.org/data/definitions/77.html), which is pretty generic. And log4shell matches the description just nice: 1. Data from untrusted source: yes; 2. data is part of a string that's executed: interpreted, which I think qualifies as a yes; 3. the execution gives capabilities the attacker would not have other wise: oh yes!

So it's probably safe to claim that this could fall under CWE-77 (too bad CVEs rarely use CWEs).

Now, CWE-77 (Command Injection) is a child of "CWE-74: Improper Neutralization of Special Elements in Output Used by a Downstream Component ('Injection')" https://cwe.mitre.org/data/definitions/74.html. Which states "The most classic instantiations of this category of weakness are SQL injection and format string vulnerabilities." - that's probably why the both of us thought of the the two of these! :) And even if CWE-77 is to specialized (this could be argued) CWE-74 should be a good match, quoting CWE-74: "The software constructs all or part of a command, data structure, or record using externally-influenced input from an upstream component, but it does not neutralize or incorrectly neutralizes special elements that could modify how it is parsed or interpreted when it is sent to a downstream component" [end quote].

Funnily, CWE-74 is a child of https://cwe.mitre.org/data/definitions/707.html - "Improper Neutralization". Which says neutralization can be done by (among others): "[...] transformation of the input/output to be "safe" using techniques such as filtering, encoding/decoding, escaping/unescaping, quoting/unquoting, or canonicalization [...]"

:) Thanks for triggering me on this.

I disagree with you: IMHO "escaping user input in the application constructing the query" is one possible form of sanitization. Why do you think it is not?

Also I said "similar class", not "strictly equivalent". Both instances are a case of "user input handled as code, not as data". Now with SQLi the problem is widely known (but sadly still happens) and we have techniques like parameter binding to outright avoid the problem. The same techniques are among the <edit>theoretical</edit> possibilities to avoid the problem with log4j.

Though I agree, the problem are the insecure defaults (JNDI enabled) and that seemingly every java dev used log4j but wasn't aware of the "${}"-expressions. I just pointed out that this "trend of neglect" [sry, non-native speaker here<edit>, but I hope you get what I mean?</edit>] seems to continue: Everyone and their CISO is now aware that attackers can abuse JNDI and the whole mitigation recommendations (<edit>e.g.</edit> remove JDNI classes) so far was primarily focused on that. Still, even if JNDI wasn't a thing, the "${}"-expressions can still be abused (it's just more specialized now and not a huge pwn fest), requiring mitigations beyond disabling JNDI. Just look at the reporting for more examples of ignorance: When this started to explode, there were some who thought blocking the string "${jdni" in their WAF would be enough (and maybe it helped against script kiddies), but a smart attacker just used/uses a string like "${${env:randomstring:-j}dni[...]" defeating the simple filters...

Yeah, people should stop talking about sanitation in relation to SQL, because that is how we end up with data like "OConnor". The two correct solutions are escaping and parametrization/prepared statements (i.e. sending the parameters out of band).
Huh? I understand "sanitation" to mean "a transformation of data that makes the data safe for use in the subsequent program". Escaping is one way of doing that transformation, and it's good because it's not lossy.

Another example: A "transformation" that's a "sanitation" but not "escaping" would be replacing all occurrences of "<" with "&lt;" (among others!). It surely doesn't add escape characters (e.g. \), but instead replaces the problematic substring with a replacement string that makes the string safe to display on a website. Of course you'll want to replace user-supplied "&lt;" with "&amp;lt;".

(btw, thanks for that it's "sanitation" and not "sanitization" ^^).

I've seen "sanitization" used to often mean filtering or removal of some of the input
Not sure I agree, for me sanitation has a strong connotation with removing things. And sending data out of band from code on the other hand cannot be seen as a form of sanitation. And is my preferred method of solving this issue.
You're essentially saying "sanitation" equals "filtering". Looking at CWE-707 (https://cwe.mitre.org/data/definitions/707.html) I'd rather say that "sanitation" is what MITRE calls [begin quote]transformation of the input/output to be "safe" using techniques such as filtering, encoding/decoding, escaping/unescaping, quoting/unquoting, or canonicalization[end quote] (well, I'm repeating myself here).

Now a Google search finds instances where sanitization/sanitation also includes techniques beyond filtering: https://www.webopedia.com/definitions/input-sanitization/ https://hack.technoherder.com/input-sanitization/ https://developer.wordpress.org/plugins/security/securing-in... https://stackoverflow.com/questions/129677/how-can-i-sanitiz...

But there are also results where it isn't really clear, or where the only sanitation technique considered is filtering. So I'd say "yeah, it's unclear and poorly defined".

Buuuuut: I still like my definition more, as I have a word for "all techniques that aim to make an input safe for processing" (sanitation/sanitization) while I can still refer to "destructive elimination of substrings" as just "filtering", which is a again different from outright "rejection of input" by using an "allow list" or "deny list". :P

I agree that splitting data and code is the way to go, if that's an option. But I didn't talk about that in the post you're answering to, so I'll ignore that ;-)

I think, generally, sanitization means protecting against potentially malicious input. Whether it takes form via escaping or removal or some other remediation is beside the point.
I'd say sanitization implies that there's something wrong with the data and you need to clean/fix it. But the former isn't true and the latter sounds like removing or irreversibly remapping characters.
> They are a problem of escaping in the application constructing the query

This is a form of sanitization.

> Please stop suggesting that SQL Injections are a sanitization problem. They are a problem of escaping in the application constructing the query, not a problem with whatever the user has typed in.

How are they different?

I would normally assume "sanitization" means "escaping"... as far as I'm concerned, anything other than escaping is incorrect sanitization!