Hacker News
by meelooo 4277 days ago
You are describing the problem exactly: it's all too tempting to pass text straight from user input to command-line arguments without any way to validate it, and to assume it's OK because it's easy. It's exactly the same argument as the one between static and dynamic typing in programming languages: static typing ensures that some sort of semantics is respected. If you pass text around because it's easy and fast, most of the time you will never validate the data, and you have no way to ensure that you are not actually handling a bomb. If the protocol was binary there is no way in hell you would be tempted to pass its data without validation to an external program, because you'd have to respect the API and because there would be no way to just send a bunch of commands. The same goes for SQL injections, URL buffer overflows, etc. Free-form text should only be used for actual human textual data and should NEVER be the interface between programs. It's way too fuzzily defined to serve as a protocol.
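To make the "user input to command line" case concrete, here is a minimal Python sketch. The function names are made up for illustration and a POSIX shell is assumed; it only shows the difference between letting a shell re-parse the text and handing it over as a single argument.

```python
import subprocess


def echo_unsafe(user_text: str) -> str:
    # The text is glued into a command string, so the shell parses it again
    # and any ";" or "$(...)" inside user_text becomes shell syntax.
    done = subprocess.run("echo " + user_text, shell=True,
                          capture_output=True, text=True)
    return done.stdout


def echo_safe(user_text: str) -> str:
    # The text travels as one element of the argument vector; no shell runs,
    # so it stays plain data whatever characters it contains.
    done = subprocess.run(["echo", user_text],
                          capture_output=True, text=True)
    return done.stdout


if __name__ == "__main__":
    payload = "hi; echo INJECTED"
    print(echo_unsafe(payload))  # the shell runs two commands
    print(echo_safe(payload))    # echo prints the payload verbatim
```

The second form is not validation, it just removes the step where text gets reinterpreted as commands.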
2 comments

If the protocol was binary there is no way in hell you would be tempted to pass its data without validation to an external program because you'd have to respect the API and because there would be no way to just send a bunch of commands

I assume you're talking about Apache - but Apache had no way of validating the data. The protocol just said "this is a blob from the client", which any binary protocol for the task must be able to handle. Apache had no business validating it, any more than it should validate any other content - how should it know what makes it valid?

Bash, on the other hand, just received that blob and treated it as an executable. It wouldn't matter if the protocol between the server and bash was binary, since it was a valid value as far as the protocol was concerned.

The problem here is the hidden channel between Apache and bash, which never actually talk directly to each other (it's through the CGI binary) but still pass data. It has nothing to do with text protocols.
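A rough sketch of that hidden channel, with a Python child process standing in for bash (this is an analogy, not the actual bash behaviour; `run_child` and both child snippets are hypothetical). The parent forwards an opaque client header through the environment, as CGI does, and what happens next depends entirely on the child.

```python
import os
import subprocess
import sys

# Child that treats the variable as opaque data.
INERT_CHILD = "import os; print(len(os.environ['HTTP_USER_AGENT']))"
# Child that interprets the variable as code (the dangerous stand-in).
INTERPRETING_CHILD = "import os; exec(os.environ['HTTP_USER_AGENT'])"


def run_child(header_value: str, child_code: str) -> str:
    # The "server" side: copy the client's blob into the environment
    # without looking at it, then start the child program.
    env = dict(os.environ, HTTP_USER_AGENT=header_value)
    done = subprocess.run([sys.executable, "-c", child_code],
                          env=env, capture_output=True, text=True)
    return done.stdout


if __name__ == "__main__":
    blob = "print('owned')"
    print(run_child(blob, INERT_CHILD))         # just reports a length
    print(run_child(blob, INTERPRETING_CHILD))  # runs the blob
```

The value handed over is identical in both runs, and it would be just as valid if the parent and child exchanged it in a length-prefixed binary frame.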

No, the problem is that you can treat any kind of text data as an executable. You can try to fix this by adding mountains of complexity and excuses, but it would still be true: as soon as text enters the equation you need to escape/encode/decode and parse. Every time you do that you add more complexity than is needed, and you also add many ways to abuse the programs and create "interesting bugs".
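For the SQL case, a small sketch using Python's standard `sqlite3` module (the table and the helper names are invented for the example). The first lookup splices text into the statement and would need correct escaping; the second passes the value as a bound parameter, so no escaping step exists to get wrong.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # Throwaway in-memory database with two example rows.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])
    return conn


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # The value becomes part of the SQL text, so quotes in it change the query.
    query = "SELECT name FROM users WHERE name = '%s'" % name
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # The value is bound separately from the statement and is never parsed as SQL.
    return conn.execute("SELECT name FROM users WHERE name = ?",
                        (name,)).fetchall()


if __name__ == "__main__":
    conn = make_db()
    payload = "x' OR '1'='1"
    print(find_user_unsafe(conn, payload))  # matches every row
    print(find_user_safe(conn, payload))    # matches nothing
```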
I can just as easily craft malicious binary data that executes a function, if you execute any input that begins with a few magic bytes while you're reading it into a buffer.
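A toy version of that, assuming a made-up wire format (the `MAGIC` marker, the one-byte index and `handle` are all hypothetical): a receiver that dispatches on magic bytes is handing control to whoever builds the buffer, binary or not.

```python
import struct

MAGIC = b"\x7fRUN"  # hypothetical 4-byte marker, not from any real protocol


def handle(buffer: bytes, handlers: list) -> str:
    # Input starting with the marker is "executed": the next byte selects
    # which function runs, and the sender chooses that byte.
    if buffer[:4] == MAGIC and len(buffer) >= 5:
        (index,) = struct.unpack_from("<B", buffer, 4)
        if index < len(handlers):
            return handlers[index]()
    # Anything else is treated as plain data.
    return "stored %d bytes" % len(buffer)


if __name__ == "__main__":
    handlers = [lambda: "harmless", lambda: "dangerous"]
    print(handle(b"hello", handlers))             # plain data path
    print(handle(MAGIC + bytes([1]), handlers))   # sender-selected function
```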

You seem to be relying on some assumption that you have about human psychology for your security gain. Somehow people would never do that with a binary protocol, and text protocols make them more comfortable and trusting. At least they can read text protocols directly; binary protocols involve me trusting a bunch of middleware I'm using to read them, too, or writing my own (always great for security.)

No, I rely on the fact that any version of an "eval" function should just not exist, and that any text-based protocol encourages the existence of such functions, which can execute whatever is thrown at them, just because it sounds like such an easy, quick shortcut in API design.
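As an illustration of the "eval" shortcut versus a parser that cannot execute anything, here is a short Python sketch. `parse_setting` is a hypothetical helper; `ast.literal_eval` is the standard-library function that accepts only literals such as numbers, strings, lists and dicts.

```python
import ast


def parse_setting(text: str):
    # Accept only plain Python literals; function calls, attribute access
    # and other expressions are rejected instead of being evaluated.
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        raise ValueError("rejected: not a plain literal") from None


if __name__ == "__main__":
    print(parse_setting("[1, 2, 3]"))
    try:
        parse_setting("__import__('os').getcwd()")
    except ValueError as err:
        print(err)
    # eval() on the same second string would have run the call.
```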