Hacker News
by danielparks 4277 days ago
I agree with all your points.

I think the real bug is that all this stuff calls out to a shell at all. Sure, it's convenient, but it's basically eval().

2 comments

There are two things to differentiate, in my opinion.

In most cases, the shell is just used to find programs in the PATH when a C programmer uses system(). And for that case, which is probably 99% of the time when /bin/sh is being invoked, it would make perfect sense to implement this with something that exhibits less attack surface.
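
To make the "just find the program in the PATH" case concrete, here is a minimal Python sketch that does the lookup without a shell. The names `program`, `search_path`, `resolved` and `lookup_output` are made up for illustration, and the search path is restricted to the running interpreter's directory only so that the example is self-contained; normally you would leave `path` out and let the real PATH be used.

```python
import os
import shutil
import subprocess
import sys

# Resolve a program name against a search path, the way a shell would,
# but without starting a shell. Assumption for this sketch: we look for
# the running interpreter inside its own directory, so the lookup
# succeeds on any machine.
program = os.path.basename(sys.executable)
search_path = os.path.dirname(sys.executable)
resolved = shutil.which(program, path=search_path)

# Run the resolved program directly with an argument vector (no /bin/sh).
completed = subprocess.run(
    [resolved, "-c", "print('ok')"],
    capture_output=True,
    text=True,
)
lookup_output = completed.stdout.strip()
print(resolved, lookup_output)
```

Passing a bare program name in the argument list would also trigger a PATH search on POSIX systems; the explicit `shutil.which` call just makes the lookup step visible.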

Taking the "dhcp-exploit" as an example (set a DHCP option on your server to "(){...}; exploit;"), I think it's less clear: Implementing the functionality of updating configuration files according to the DHCP options sent is a perfectly reasonable place to use a script written in sh/ksh/bash! It's easy to implement by any sysadmin, works very reliably with a little care, and performance-wise it's not critical at all.

And regardless of the language you implement it in: there's some place where user input has to be sanitized, but up to now, it was considered common knowledge that arbitrary data in an environment variable is safe as long as the variable's name adheres to some convention (prefix them all with PROGNAME_...). And bash doesn't respect this convention, because it looks at variable CONTENT, even though I'm pretty sure that the convention was already established when the bash project started... (see, for example, the handling of "special" variables like LD_xxx in suid programs or the dynamic linker)

> when a C programmer uses system()

I said it in another thread but this is almost always a mistake. The execve family is much less ambiguous about what gets passed to the program. Using it avoids this type of bug by not putting the shell where it doesn't need to be.
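
A rough Python illustration of the point about argument passing, as a sketch only. The `untrusted` string is a hypothetical piece of attacker-controlled input, and `child` is a throwaway program that just prints the arguments it received. With an argument vector (which is what the exec* family takes), nothing re-parses the string, so it reaches the program as exactly one argument.

```python
import subprocess
import sys

# Hypothetical untrusted input containing shell metacharacters.
untrusted = "notes.txt; echo INJECTED"

# A tiny child program that only reports the arguments it was given.
child = [sys.executable, "-c", "import sys; print(sys.argv[1:])"]

# Argument-vector style: no shell is involved, so the ';' is never
# interpreted and the whole string arrives as a single argument.
direct = subprocess.run(child + [untrusted], capture_output=True, text=True)
direct_output = direct.stdout.strip()
print(direct_output)
```

Had the same string been pasted into a command line for system(), the shell would have treated everything after the semicolon as a second command.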

And it's not limited to C. E.g. I would be in favor of removing os.system from Python (in favor of subprocess.call). The `-syntax (backtick-syntax) in Ruby is particularly evil. It's so convenient because it is so concise, but I guarantee you that it is the source of a lot of vulnerabilities. It should be removed ASAP. I think that's kind of a theme in Ruby: is it convenient? Then put it in. But I would have expected more from Python.

subprocess.call is also vulnerable to this, though. It calls out to bash.

Vulnerable to what? The environment variable problem? I was talking about program argument parsing. os.system("ls %s" % foo) != subprocess.call(["ls",foo])

Ah, I misunderstood then. I agree with you on that point. I assumed you were talking about "Shellshock".

I believe you would need to explicitly pass shell=True for that though.

Nope, it's not necessary. Test it with a vulnerable CGI app and call:

subprocess.call(["date"])

Or if bash is not your default shell:

subprocess.call(["bash", "-c", "date"])
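
Since the child inherits the parent's environment in both calls above, one way to see what actually reaches it is to pass an explicit environment. This is only a sketch of that idea, not something anyone in this thread proposed: the allow-list `ALLOWED`, the helper `minimal_env` and the fake `HTTP_USER_AGENT` value are all made up for illustration.

```python
import os
import subprocess
import sys

# Hypothetical allow-list of the variables the child really needs.
ALLOWED = ("PATH", "LANG", "TZ")

def minimal_env(source):
    """Return a copy of `source` containing only allow-listed variables."""
    return {name: value for name, value in source.items() if name in ALLOWED}

# Simulate a CGI-style parent environment carrying attacker-chosen content.
parent_env = dict(os.environ, HTTP_USER_AGENT="() { :; }; echo exploit")
child_env = minimal_env(parent_env)

# The child reports whether the attacker-controlled variable reached it.
check = subprocess.run(
    [sys.executable, "-c", "import os; print('HTTP_USER_AGENT' in os.environ)"],
    env=child_env,
    capture_output=True,
    text=True,
)
env_check_output = check.stdout.strip()
print(env_check_output)
```

On Windows the allow-list would probably also need SYSTEMROOT for the child to start; that detail is outside what this sketch covers.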

Did you read the next sentence?

> And for that case, which is probably 99% of the time when /bin/sh is being invoked, it would make perfect sense to implement this with something that exhibits less attack surface.

I did. I did not find it explicit enough. There was no specific recommendation, for example. Moreover seeing the phrase "when a C programmer uses system()" is pretty jarring. There aren't enough warnings you can add to that to convey how much this gets misused and what a bad idea it usually is.

To me, use of system() is very indicative that you need to find another C programmer. There are few other answers to complete the phrase "when a C programmer uses system()".

Well... that's pretty drastic reasoning, leaving aside all weighing of facts. Does it also apply to a Haskell programmer running System.Process? ;-)

The fact is: system() and all its relatives (popen comes immediately to mind, there are doubtlessly 100 others) have been used, will be used, by 'incompetent' programmers[+] and as long as no other method is as widely established (and: even taught in introductory textbooks), we had better provide a workaround that closes most of the holes.

[+] or just programmers weighing the merits of having a parser that supports variable and home-directory expansion built right in, courtesy of /bin/sh -c, which is completely adequate for many tasks. And yes, I know its limitations, and would not use it myself most of the time.

Yes, it isn't that hard to use exec*() to execute a single program, but it gets rather messy if you want to execute a series of piped commands.
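
For what it's worth, here is roughly what a two-stage pipeline without a shell looks like in Python, as a sketch. Both stages are throwaway Python one-liners chosen so the example runs anywhere (they stand in for something like `printf ... | sort`), and the names `producer`, `consumer` and `pipeline_lines` are made up.

```python
import subprocess
import sys

# Stage 1: emits three lines in unsorted order.
producer = subprocess.Popen(
    [sys.executable, "-c", "print('b'); print('a'); print('c')"],
    stdout=subprocess.PIPE,
)

# Stage 2: reads stage 1's output from stdin and writes it back sorted.
consumer = subprocess.Popen(
    [sys.executable, "-c", "import sys; sys.stdout.writelines(sorted(sys.stdin))"],
    stdin=producer.stdout,
    stdout=subprocess.PIPE,
    text=True,
)

# Close our copy of the pipe so the producer sees a broken pipe if the
# consumer exits early.
producer.stdout.close()

output, _ = consumer.communicate()
producer.wait()
pipeline_lines = output.split()
print(pipeline_lines)
```

It works, but it is clearly more ceremony than `a | b` in a shell, and each extra stage adds another Popen plus its pipe bookkeeping.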

Also another function to worry about is popen().

Look what I made you: https://github.com/panzi/pipes

How is it different to set some environment variables and then call out to a shell script, versus to set some environment variables and then call out to a perl script, or a binary compiled from C?

It's not. It's the "calling out" part that is wrong.

You should never call out to anything by passing untrusted user input directly. Any information that came from the outside must be explicitly passed as data through proper serialization mechanisms.

For instance, you don't piece your SQL queries together by concatenating strings. You use an abstraction layer, in which you code the query structure and pass user input as data. There is this extra step of saying "this is data, not code" that strips the external input of executability.
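
A minimal sketch of that "this is data, not code" step, using Python's built-in sqlite3 module with an in-memory database. The table, the `alice` row and the `untrusted` string are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

# Hypothetical hostile input that would change the query's meaning if it
# were concatenated into the SQL text.
untrusted = "alice' OR '1'='1"

# The query structure is fixed; the input is bound to the ? placeholder
# and is only ever compared as a value.
query = "SELECT username FROM users WHERE username = ?"
hostile_rows = conn.execute(query, (untrusted,)).fetchall()
honest_rows = conn.execute(query, ("alice",)).fetchall()

print(hostile_rows)  # no user has that literal name
print(honest_rows)
conn.close()
```

The hostile string matches nothing because it is treated as a literal username, quote characters and all.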

(for the same reasons, if your templating engine is just concatenating strings and not building the page out of trees, you're doing it wrong, but it's a topic for another day)

It's a problem you get when you believe in "the Unix way" a bit too much. Yes, everything is text, but no, not everything has the same semantics.

> You should never call out to anything by passing untrusted user input directly.

So if I call a CGI script with parameters foo=bar, what data should apache pass to the handler, if not something along the lines of the string "foo=bar"? When I pass the header "User-Agent: baz" and the handler asks for the user-agent, what should it be told if not "baz"?

Environment variables are data, not code. When apache executes a cgi script, whether it's C or perl or shell, it makes the user input available as data in defined locations.

There's a bug in bash which causes some of that data to be executed, but there's no way to protect against that class of bug.

This isn't a case of "you should have protected against SQL injection attacks". It's a case of: there is a bug in your SQL server, such that the query "select * from Users where username='rm -rf /'" will execute "rm -rf /".

The point of the OP is that if a program has chosen bash to be the handler of untrusted user data, then the program has made the wrong choice, because bash is clearly (hindsight! I'm not claiming I wouldn't have made the same choice) not designed for that purpose. A handler for untrusted user data should be a program specifically designed for that purpose, which should receive the data directly.

Similarly, if a Ruby or Perl script decides to call out to bash with untrusted user data, it's their mistake to trust bash with it, not bash's mistake that it wasn't designed for that use case.

It's perfectly possible to protect against this attack: don't call a generic program with untrusted user data.

So ruby and perl are specifically designed to be a handler of untrusted data?

How do I know what other programs are designed for such a task? What's a "generic program"? In this day and age, it is expected that pretty much all software ought to be designed with security in mind (not that it always is). Because any piece of "generic software" (or just software) is otherwise going to be exploited. Especially on platforms where double-clicking a file is the expected way to open it.

More importantly, the point we are making is that we're not expecting bash to "handle" anything. It gets some data. It's not supposed to do anything with it on its own. Period.

> So ruby and perl are specifically designed to be a handler of untrusted data?

Perl actually is when used in taint mode. http://perldoc.perl.org/perlsec.html