| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by sixbrx 2406 days ago

I might misunderstand, but I think the join just isn't the right approach for that sort of child record counting, ie. counting records from two or more independent child tables associated with your rows (if that is what you're wanting).

You're grouping and counting after the three way join. That join will involve all combinations of child records between the two child tables associated with any given parent row (almost never what is wanted). So any given non-null thing you're counting from one child record will appear multiple times, = the number of child records in the other table associated with the parent row.

I think you just want to use correlated subqueries to count the child records: select a.id, (select count(whatever) from child1 c1 where c1.a_id = a.id), (select count(whatever) from child2 c2 where c2.a_id = a.id) ...

TLDR: You almost never want to join independent children to a common parent, use independent correlated subquery expressions instead.

1 comments

Svip 2406 days ago

Except, that's potentially super slow if the optimiser does not realise what to do. In its default state, it will make two table look ups for each row in table a. So that's 1+N*2 look ups compared to 3 look ups in my example.

For little data, that's probably fine, but for a big database, it will be slow. However, the optimiser may be able to handle that? I know Sybase's and MSSQL's had trouble with it, but I've heard Postgres' might be able to.

link

sixbrx 2406 days ago

First, correctness first. Second, correlated subqueries in selects really aren't noticably slow at all in my experience (Postgres, Oracle) when they're really needed, meaning alternatives are as slow or slower. Especially compared to the alternative of actually doing a cartesian product across independent children of any size. I don't think the optimizer could help much with avoiding that cartesian product to be actually realized, either - the grouping after that join is going to be a big sort in general with results that depend on the distribution of child values on a row-by-row basis. But in any case it would give the wrong answers for both children regardless of speed and memory used!

link