Hacker News new | ask | show | jobs
by swid 408 days ago
The comment about = is mine and I thought about rewriting that part of the comment after the fact. It is soooo hard to explain these things without writing a book.

The reason = mean it runs once is because the outer query will only run once, and in this case that query, when using =, can only delete based on a single id. But if that outer query was subquery in a context where it could be run more than once, you are back to where you started. Hence me saying their fix was sort of incidental.

1 comments

I think your logic is flawed (even if PostgreSQL may behave this way in practice).

First, let's make sure we're talking about the same two examples.

A:

    DELETE FROM task_queue
    WHERE id = (
      SELECT id FROM task_queue
      WHERE queue_group_id = 15
      LIMIT 1
      FOR UPDATE SKIP LOCKED
    )
B:

    DELETE FROM task_queue
    WHERE id IN (
      SELECT id FROM task_queue
      WHERE queue_group_id = 15
      LIMIT 1
      FOR UPDATE SKIP LOCKED
    )
You seem to be saying that B may exhibit the problem whilst A does not. (ie. there is a different between using `=` vs `IN`). I would like to see some documentation justifying that.

Here's my logic:

- In both cases, the outer query is run once.

- In both cases the outer WHERE clause is evaluated for each row in `task_queue`.

- In both cases, it is up to the optimizer whether the result of the subquery is materialized or not.

- In both cases, if the subquery is not materialized, multiple rows in the outer query may match the condition.

In practice, it may be that the optimizer always materializes uncorrelated subqueries on the RHS of an `=` expression. My contention is whether that is a formal guarantee.

Most of what you say I agree with. But if this outer query is run only once with version A there is a caveat with where it says “where id = X”.

This cannot match more than one X at a time. So that forces the inner query to be run once, as we can only have one id, and running it twice may produce 2.

I am not sure though to be honest.

We are too deep for me to reply now, but to your next comment I didn't mean only one row, but only one id. It is easy for a small difference in word choice to get things wrong.

I think if it did return two rows; ie limit 2- the query with = will fail. Hmmm maybe that will happen even with limit 1 under certain plans. I wouldn’t trust it.

Why would it only be run once? The WHERE condition of the outer query is run multiple times: once for each row, so of course it can return TRUE multiple times.

For example:

    DELETE FROM example WHERE id = (SELECT RANDOM(0, 100) FROM other_table LIMIT 1)
This could delete multiple rows in principle, since there may be multiple rows where the `=` expression is true.
After thinking about it more, yes I agree with your take. Maybe it can even delete different ids using an =.
FWIW, I tested this locally (using `DELETE FROM example WHERE id = RANDOM(1,100)`) and it did delete multiple rows in some cases, even without a subquery.