Lots of application developers don’t really understand how sql works. So instead of learning to do a join (or do a sub query or anything like that) they just issue multiple queries from their application backend.
The posters up thread are talking about novice Python / JavaScript / whatever code which issues queries in sequence: first a query to get the IDs. Then once that query comes back to the application, the IDs are passed back to the database for a single query of a different table.
The query optimizer can’t help here because it doesn’t know what the IDs are used for. It just sees a standalone query for a bunch of IDs. Then later, an unrelated query which happens to use those same IDs to query a different table.
SELECT * FROM table2
WHERE table1_id in (SELECT ids FROM table1 WHERE ...)
will be optimized to a join by the query planner (which may or may not be true, depending on how confident you are in the stability of the implementation details of your RDBMS's query planner). But in most circumstances, there is no subquery, it's more like:
SELECT * FROM table2
WHERE table1_id in (452345, 529872, 120395, ...)
Where the list of IDs are fetched from an API call to some microservice, or from a poorly used/designed ORM library.
What’s really bad is where they (or the ORM) generate a seperate select statement for each ID in a loop. Selecting thousands of rows one row at a time from an unindexed table can be hilariously slow.
And this doesn’t even get into bizarre edge cases, like how Postgres treats foo = ANY(ARRAY[‘bar’, ‘baz’]) differently than foo IN (VALUES((‘bar’), (‘baz’))). PROTIP: the latter is often MUCH faster. I’ve learned this twice this week.