Hacker News new | ask | show | jobs
by zzzeek 4797 days ago
"other_id" and "other" here refer to two different ways of referring to a many-to-one reference between "somemodel" and "othermodel". Asking for rows of "somemodel" where "object_id" is None is the exact same thing as asking for rows from "somemodel" where its reference to "object" is None. Both should produce the same query, the simple one without the JOIN. The query with the JOIN is completely wasteful and not at all correct - it does a LEFT OUTER JOIN to the remote table, only to filter on those rows where the remote table has no match; but this is already obvious from whether or not "somemodel.other_id" is NULL.
2 comments

You're right in that they are largely the same action, but consider this case:

An Owner is deleted without setting the pets.owner_id to NULL. So now there's a row in Pets with an owner_id referring to an Owner that doesn't exist.

These two queries will (rightfully) return different things in this case.

One will return The Pets who have null for an owner_id. The other will return the Pets whose owner_id is NULL AND those who do have an owner_id, but it doesn't reference an existing row in Owner.

OK, you're right in that owner_id can refer to a nonexistent entry, if the schema both doesn't make correct use of constraints and is also referentially corrupt. Designing the ORM to jump through huge hoops to suit the case that the user is using the relational database incorrectly, in such a way that enormous performance overhead is added to the use case as used correctly in the vast majority of cases, is IMO a poor design decision.
Great point, I didn't even think of that! The join there is for a reason, and that you're getting the entire 'model' not a field. If someone thinks what ORM does in this case is stupid, they probably think table constraints are stupid.
So the ORM is too stupid to know that in the case the other_id is NULL, it doesn't need to create a join subquery. But would you rather the PostgreSQL query planner figure this out or do it yourself in python? I would like to see the time difference both queries take, and I don't think I would want Django ORM to do this.
I think you should run EXPLAIN ANALYZE on both queries and see what you find.
I now understand that it won't be optimized away because it's doing something, and I think I still want it to do that something. I've run into plenty of ORM limitations and use custom SQL for procedures and custom joints, but I still wouldn't call Django ORM stupid for doing what it's doing, in this case.