Hacker News new | ask | show | jobs
by mikeash 4687 days ago
I'll move my reply here, then.

You make good points, and I'm trying to figure out exactly how your points and mine hook together.

I think what it ultimately comes down to is distinguishing between data and code. More specifically, data can contain code, but code should never contain (foreign) data. So escaping in general is fine, but escaping in this context, where you're building a string of code, is no good.

That, I think, is why I'm comfortable with e.g. the escaping done to a string when emitting JSON, but not with escaping a string to put it into an SQL query.

With parameterized queries, you're still ultimately passing the code and data over to the database over some sort of stream (assuming an out-of-process database server), but hopefully that stream is designed as data containing both the parameterized SQL code and the parameters, rather than just SQL code with escaped parameters.

1 comments

Doesn't the database server just escape the data in the parametrized query? So if you used the database's escape functions directly wouldn't it be the same thing? I guess if you are arguing that developers are more likely to mess it up than the database server then I see your point.
Not usually. What usually happens is something like this:

    SELECT * FROM bar_table WHERE x = 123;
Is converted into a datastructure like this in the database engine (really simplified):

    {
      "querytype": "select",
      "fields": ["bar_table.x", "bar_table.y", "bar_table.z"],
      "tables": ["bar_table"],
      "where_filters": ["x", "=", 123]
    }
Which is then converted into a list of things to do to run that query.

Now, if you give it a prepared statement:

    SELECT * from bar_table WHERE x = ?;
The database engine can convert that into a closure, something like this:

    function(arg1){
      return {
        "querytype": "select",
        "fields": ["bar_table.x", "bar_table.y", "bar_table.z"],
        "tables": ["bar_table"],
        "where_filters": ["x", "=", arg1]
      };
    }
Then when you run the prepared query, it can simply call the closure, putting the right value in the datastructure without escaping anything, and continuing on from there.

Of course, the realities of it are more complicated than that for optimization reasons, but that's the basis of it.

Makes sense. Thanks for the explanation.
I sure hope not. If it does, then I'm going to be very sad, and my opinion of humanity will be ever so slightly lowered.