Hacker News new | ask | show | jobs
by praptak 4688 days ago
SQL is hard to get rid of injections and here's why I think it is so.

How would an ideal injection-free application look? I'd imagine a set of unmutable precompiled SQL statements (the code) each controlled by a set of parameters (the data.) No gluing of statements from strings at runtime and the parameters are obviously passed out-of band. You can't forget to escape user-provided data because in this setup the SQL code and the user data never mix, so escaping is not necessary.

Unfortunately is not feasible to use the above design with SQL. A simple filter with a parameter that can be unspecified (as in "don't care") would require 2 different precompiled statements ("select ... from foo" and "select ... from foo where param=<placeholder>".) The number grows exponentially with the complexity of the filter, so the runtime construction of SQL statements is inevitable.

You can go about it in a smarter or a dumber way but the danger remains.

4 comments

This paper: http://swerl.tudelft.nl/twiki/pub/Main/TechnicalReports/TUD-... describes a general way to deal with injection attacks (not just SQL injections), namely adding the syntax of the "guest" language (such as SQL) to the "host" language (such as Java, e.g.

  SQL q = <| SELECT id FROM users WHERE name = ${userName} AND password = ${password} |>;
The <|...|> construct switches from Java to SQL, while ${...} switches from SQL to Java. The desugaring process that translates the Java+SQL to plain Java ensures that escaping is done at the right places. This approach is based on SDF2, a language for defining grammars that allows context-free grammars to be combined in almost arbitrary ways.
It's not hard. It's a solved problem. People just choose to remain ignorant of these best practices.

Binding data to a query after the fact is a reliable way of escaping. Composing a prepared statement is not hard, and creating these at runtime is not a big deal.

Honestly, for most of your daily work you should be using an ORM of some kind where this is already done for you. The only reason you should be writing direct SQL is when you're doing things the ORM doesn't natively support.

Ruby on Rails is currently 100% injection free as far as anyone can tell, and if it isn't there'd be an emergency patch issued right away. There have been situations in the past where this was not the case, but these bugs were quickly addressed after being discovered. Most other frameworks (Django, etc.) are in the same category.

How about we stop calling everything "strings" (and "ints" too, while we're at it)?

  UserInput readFromPost(HttpParam param) {
    // Read param from POST here
  }
  Collection<SqlRow> query(SqlQuery q) {
    // Send query to DB here
  }

  UserInput name = readFromPost('name');
  Collection<SqlRow> result = query(
    sqlConcat('INSERT INTO users (name) VALUES (',
              stringToSql(userInputToString(name)),
              ')')
  SqlInt<10> userId = sqlRowLookup(result, 'id')
  sendToBrowser(htmlConcat('You are user number ',
                           sqlIntToString(userId)))
Clearly we can get better APIs than this, but it's not as difficult as you make out to program in a safe and sane way.
> The number grows exponentially with the complexity of the filter, so the runtime construction of SQL statements is inevitable.

You can still construct at runtime. Just let your typesystem handle the distinction between SQL code and data.

Good point, but this would require composable SQL in the typesystem, right? Maybe libraries like SQLAlchemy come close to that (or maybe are already there?) but look how long it took.
Type systems tell you when your code is wrong. You don't have to do any SQL manipulation in your type system, you just make sure that SQL and non-SQL data have different types, and carry on as normal. If your code is joining functions together in an unsafe way, the type system will reject it and you have to go back and change it. If it's safe, they type system accepts it and then the code can be compiled into some untyped* language (assembly, Javascript, JVM bytecode, whatever)

* By 'untyped' I mean that the type-system that you were using is no longer enforcing anything. There may be another type system in the compiler target.

It would actually be possible to use a type system to compose SQL; we could use this to guarantee there are no syntax errors. For example we could have a types "SqlTable", "SqlQueryType", "SqlWhereCondition", "SqlComparisonOperator", etc.

However, we don't need to do anything nearly so elaborate to stop SQL injection. We just need a type "SqlQuery", since that will be instantly unusable by all string concatenation functions. We then make a concatenation function for SqlQuery values and a "stringToSqlQuery" function (or ".toSqlQuery" method, if you prefer) which converts strings to SqlQuery values by escaping them.

This way, we've turned SQL escaping into a type coercion, so we can only pass the type checker by escaping every string we put in our queries. Note also that it solves the double-escaping problem: since escaped strings have a different type to unescaped strings, we can't send them back through the escape function; ie. "stringToSqlQuery(stringToSqlQuery(foo))" is a type error.

In a typical loosely typed language, such as php et al., just keep the query and the data separate. E.g.:

    $params = array();
    $sql = "select * from foobars where 1";
    if (isset($_GET['name'])) {
      $sql .= " and name = :name"
      $params[':name'] = $_GET['name'];
    }
    $stmt = $db->prepare($sql);
    $stmt->execute($params);