Hacker News new | ask | show | jobs
by icebraining 4276 days ago
Please describe specifically how would that have helped.

This has nothing to do with parsing text. The problem here is that Apache et all send untrusted data to a process that treats it as code. It wouldn't matter if HTTP was a binary protocol and if bash read a well defined bytecode instead. I mean, look at shellcodes.

2 comments

You are describing the problem exactly: it all too tempting to pass text around from user input to command line arguments without any way to validate the text data and assume it's ok because it's easy. It's exactly the same arguments that goes in between static and dynamic typing in programming languages: static typing ensures some sort of semantics is respected. If you pass text around, because it's easy and fast, most of the time you will never validate the data and you have no way to ensure that you are not actually handling a bomb. If the protocol was binary there is no way in hell you would be tempted to pass it's data without validation to an external program because you'd have to respect the API and because there would be not way to just send a bunch of commands. The same goes for sql injections, url buffer overflows, etc. Free form text should only be used for actual human textual data and should NEVER be the interface in between programs. It's way too fuzzily defined to serve as a protocol.
If the protocol was binary there is no way in hell you would be tempted to pass it's data without validation to an external program because you'd have to respect the API and because there would be not way to just send a bunch of commands

I assume you're talking about Apache - but Apache had no way of validating the data. The protocol just said "this is a blob from the client", which any binary protocol for the task must be able to handle. Apache had no business validating it, anymore than it should validate any other content - how should it know what makes it valid?

Bash, on the other hand, just received that blob and treated it as an executable. It wouldn't matter if the protocol between the server and bash was binary, since it was a valid value as far as the protocol was concerned.

The problem here is the hidden channel between Apache and bash, which never actually talk directly to each other (it's through the CGI binary) but still pass data. It has nothing to do with text protocols.

no, the problem is that you can treat any kind of text data as an executable. You can try to fix this by adding mountains of complexities and excuses but would still be true: as soon as you have text enter the equation you need to escape/encode/decode and parse. Every time you do that you add more complexity than is needed, and also you add many ways to abuse the programs and create "interesting bugs".
I can craft malicious binary data just as easily to execute a function if you execute binaries that begin with a few magic bytes when you're reading input into a buffer.

You seem to be relying on some assumption that you have about human psychology for your security gain. Somehow people would never do that with a binary protocol, and text protocols make them more comfortable and trusting. At least they can read text protocols directly; binary protocols involve me trusting a bunch of middleware I'm using to read them, too, or writing my own (always great for security.)

no, I rely on the fact that any version of an "eval" function should just no exist and that any text based protocol encourages the existence of such functions that can execute whatever is thrown at them, just because it sounds so easy and a quick shortcut in API design.
If the protocol was binary, exporting variables to subprocesses and exporting functions to subprocesses would go in different places, and Apache would know to send the one but not the other
How, if the protocol in question - the environment variables - has no concept of functions?

The matter is that Apache and the protocols (HTTP and environment vars) are just being used as a tunnel between the attacker and bash. They can't pass functions via another channel because they don't know what functions are. All they know is they're passing blobs of data - which any protocol would do, binary or not.

Bash happens to recognize a text value as functions, but it could just as easily recognize the magic value of an ELF binary and execute that, or any other binary format used to encode functions.

The problem is that Bash is using the same channel for two quite different things - values and functions. It's doing that because the channel is a string; if there were a proper protocol for passing environment to subprocesses, that protocol would make a distinction between the two.
if there were a proper protocol for passing environment to subprocesses, that protocol would make a distinction between the two.

TCP is a binary protocol, how does it distinguish between executable and plain text formats? Answer: it doesn't, because TCP doesn't know or care about that, that's left to the layers above to handle.

Likewise, environment variables don't know or care about "functions", that's a concept that doesn't enter into the protocol, since it's not a shell specific protocol. All it transmits are keys and values, which are generic blobs of data. That bash uses the protocol to transmit code mixed up with data is no more the protocol's fault than the fact that TCP was used to transmit those same functions on HTTP requests.

In principle, a distinguishing protocol could be embedded within the undistinguished one. If the actual environment variables were preceded with a sequence indicating the type of the contents in all cases, this would not be an issue.