| Text isn't the universal interface in Unix; byte streams are. You can quite happily send non-textual control characters and such around in Unix, or pipe data containing NUL bytes from one process to another. 'Text' is a very seductive abstraction, but it's one of the most brutal to work with once you start interacting with the real world and have to give up on ASCII and deal with encodings, Unicode and so on. Putting commands and data inline is a recipe for disaster and a million command injection exploits. The Unix philosophy has broken the minds of generations of programmers. It leads them to do things like concatenating strings to build SQL queries, doing IPC with ad-hoc regex-parsed protocols, or using a couple of magical characters to indicate that the contents of a variable should be parsed and executed instead of just stored. Take a read of some of the earlier threads on HN about Shellshock, and you will find numerous people blaming Apache for not "escaping" the data it was putting in a shell variable. As if it even could. Even Unix nerds have at least partially internalised the dangerousness of the paradigm -- "don't parse the output of ls" and so on. The fact that the Unix paradigm (passing everything as strings with magical characters and escape sequences) is broken for the most fundamental computing tasks, like working with file names, ought to be a damning indictment of the paradigm. Sadly, people merely parrot the rote-learned lesson "don't parse ls because file names can't be trusted", without thinking about all the other untrusted data they expose to Unix shells all the time. Just this week Yahoo got exploited. At first people thought it was Shellshock, but no, it was just a routine command injection vulnerability in their log processing shell scripts, a problem blighting just about every non-trivial shell script ever written. The usual reply is "don't use shells with untrusted data".
But auditing where any particular bit of data came from can be just about impossible once it has been across several systems, through programs written in a variety of languages, stored on a file system, read back and so on. The only sane solution is to never use shell scripts. Just as the C memory and integer model makes writing secure C code borderline impossible, the Unix "single pipe of bytes that defaults to being commands" paradigm makes writing secure shell scripts borderline impossible. Unix needs to be taken out back and shot. |
Yes! And then to compensate, they have to "sanitize" untrusted input to their systems. I had a meeting yesterday with a developer and a project manager at an organization that wants to work with my company to integrate one of our products with one of theirs. I mentioned the possibility of submitting some data in JSON format to a web API on their end, and the project manager asked about the risk of code injection attacks, by which he apparently meant SQL injection. I had to assure him, based on my knowledge of their tech stack (Node.js, CouchDB and, naturally, JSON), that code injection wouldn't be an issue. My point is that the common abuses of strings by Unix and web developers have led to well-known and widely feared security vulnerabilities which just don't exist in software that's built on a foundation of properly structured data.
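To make the "properly structured data" point concrete, here is a rough sketch in Python with the standard-library `sqlite3` module. It is not the stack from the meeting; the `users` table, its rows and the `untrusted` value are all made up for illustration. The first query splices the input into the SQL text, so the input gets parsed as SQL. The second passes it as a bound parameter, so it only ever gets compared as a value.

```python
import sqlite3

# Throwaway in-memory database; table and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a1"), ("bob", "b2")],
)

# Hypothetical attacker-controlled input.
untrusted = "nobody' OR '1'='1"

# String concatenation: the quote characters in the input become part of
# the query's syntax, so the WHERE clause is true for every row.
unsafe_rows = conn.execute(
    "SELECT name FROM users WHERE name = '" + untrusted + "'"
).fetchall()

# Bound parameter: the query text is fixed and the input travels separately
# as a value, so it is only compared against the name column.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (untrusted,)
).fetchall()

print(unsafe_rows)  # every user matches
print(safe_rows)    # no user has that literal name
```

No "sanitizing" happens in the second query. There is simply no point at which the input is treated as code.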
See also this classic by Glyph Lefkowitz:
https://glyph.twistedmatrix.com/2008/06/data-in-garbage-out....
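The parent comment's complaint about commands and data sharing one string can be sketched the same way. This is a minimal illustration, assuming a POSIX system where `shell=True` runs `/bin/sh`; the `untrusted` value is a made-up stand-in for something like a field read out of a log file.

```python
import subprocess
import sys

# Hypothetical untrusted value, e.g. a field pulled from a log line.
untrusted = "x; echo INJECTED"

# Command built by concatenation and handed to the shell: the shell parses
# the whole string, so "; echo INJECTED" runs as a second command.
unsafe = subprocess.run(
    "echo " + untrusted, shell=True, capture_output=True, text=True
).stdout

# Arguments passed as a list (an argv array): no shell is involved, so the
# value reaches the child process as a single argument and stays data.
safe = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", untrusted],
    capture_output=True,
    text=True,
).stdout

print(repr(unsafe))
print(repr(safe))
```

The list form doesn't escape anything. It works because the argument never passes through a parser that could mistake it for a command.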