Hacker News new | ask | show | jobs
by Illniyar 3647 days ago
As always just use parameterized queries. This is the best and only defense you need against sql injection.
2 comments

This is absolutely good advice, but people giving it rarely acknowledge that it isn't always possible.

A particular irritant is that most (all?) db APIs make it impossible to use parametrized queries with IN clauses. I find myself having to implement string escaping functions sooner or later in every project, because of stuff like that.

That's when you combine parameterized queries and concatenated SQL:

    sql = '... col_name IN ('
    foreach(var arg in args)
      if (!first)
        sql.append(', ')
      sql.append('?')
      params.add(arg)
    sql.append(')')
> A particular irritant is that most (all?) db APIs make it impossible to use parametrized queries with IN clauses

Psycopg2 has a generic adaptation of iterables to a form suitable for IN, but it's hand-rolled: https://github.com/psycopg/psycopg2/blob/732ea90a4ff85f6d0cc...

Real convenient though.

This is such an irritant. Everyone supports variable size IN clause in the form of a sub-select AFAIK. You should be able to abstract the parsing and have an api to address that. Yet for years its missing and you end up all sorts of garbage under-performing code.

Our current solution is to use of temporary table because other solutions (including dynamically adding parameter in query) makes them unique queries and crashes performance.

At least in Postgres, you can use ANY and pass an array of values. From what I've read, the generated query plan is equivalent.
> From what I've read, the generated query plan is equivalent.

I believe at one point IN (values) was just sugar for = ANY(ARRAY(values)), don't know if that's still the case.

Still requires that your DB adapter supports array parameters though.

Alternatively, you can use a temp table and a join. Not ideal, but still safe from injection and should have roughly the same overhead in the database.
This isn't the case with oracle, MySQL, or hsqldb, at least when using jdbc. I believe it's also not true for SQLite and postgres.
select * from Person where email = any(?)
This.

I really don't get why anyone in this day and age doesn't use parameterised queries.

It's such a big security win and comes with such little programming overhead that it boggles my mind to think people still use manual string escaping.

In Postgres until 9.2, the query plan for prepared statements is fixed at prepare time rather than time of execution, so you might end up with worse query plans. It's worth noting that 9.1 is still the version in Ubuntu 12.04, it's only 14.04 that has 9.3.
It should be noted that Postgres offers a repository with the latests versions even for Ubuntu 12.04, so you don't need to upgrade the distro or compile PG yourself to use them.
This have been an annoyance in many DBMS-engines, I have previously solved it with recompiles, either triggered or nightly/hourly.
> It's worth noting that 9.1 is still the version in Ubuntu 12.04, it's only 14.04 that has 9.3.

And the most recent LTS, 16.04, has 9.5.

In which case different values would end up have a different query plan? NULL values?
That's one of the main points of gathering statistics, so the query plan can depend on the values being queried. At least in Oracle, not sure about Postgres. For long running queries it can be a big win.
Here is an example [1] that just is a simple equality comparison on an indexed column falling back to a sequential table scan.

[1] http://blog.endpoint.com/2014/04/custom-plans-prepared-state...

Skewed distribution.
People may use frameworks which sometimes default to concatenation, even when the ORM makes it look like they're using parameterized queries.

People care more about convenience and fast iteration than security - concatenation is faster and easier, and insecure sites can still make money, so the incentives are on the side of fast, dirty code that works now rather than secure code that works later.

People (this is especially true in the PHP world) may not even know parameterized queries exist.

I too wish more people in the PHP world would use parameters, but alas, your second paragraph sums up the common mentality. If its still going to make money, why go to the effort? Personally I love sending a query with an object and just let the engine do the work.

I see this sort of thing on a daily basis, but it's not from PHP developers but our "lead" Delphi developer. I've tried to convince him of the benefits of parameter based queries, but believes that it would be a performance hit (performance for him maybe, he'd have to struggle to learn something new).

Not just security, but readability (in most languages) and performance gain also (in popular RDBMSes and drivers)