Hacker News new | ask | show | jobs
by ray_v 636 days ago
You have to be careful here that a one-to-many relationship doesn't exist and returns more than 1 row -- it'll cause an error and halt your query
1 comments

Yes, as I noted.

Frequently this is trivial, sometimes it's not.

If there will be multiple hits but it doesn't matter that much, there's the obvious TOP 1 or MIN(col) and such.

It's a tradeoff between accidentally breaking the query and returning unexpected data.

Note that if you used join you could have bigger issues as the join would succeed but now you got multiple rows where you didn't expect.

Are there any tools or tips to help speed up the "which JOIN is duplicating data" hunt?

Usually my biggest problem is getting all the query parameters lined up to reproduce the issue! (Being able to flip on extended logging or a profiler can make this easy.)

Cutting out the result columns when disabling JOINs to narrow it down is straightforward but tracking columns down in WHERE clauses quickly tends not to be.

Good question. Obviously a profiler or similar that can capture the details when it happens helps, as you note.

If you can reproduce the issue then what I tend to do is to include the unique id column from each joined table (we try to avoid natural keys).

If it doesn't have a unique id column I replace the join with a subquery that includes row_number(), so I can se which one that doesn't repeat.

But without being able to replicate, I don't know of any better way than just studying the ON conditions carefully.