by tremon 5 days ago
But in that case, you need to compare like-for-like with the situation where you need to insert all the prerequisite rows too. You can't just compare a delete cascade with a single insert where all the foreign keys are already satisfied.

The whole problem with the delete cascade is that you can't tell how big it will be until you have entered the transaction to do it. With an insert, you either know the size up front, or it fails and you can retry.
That's true, but now you have moved the goalposts. The original claim upthread was "it takes just as much work to delete a row as it takes to insert a row", not "it's hard to predict the performance of a delete with cascade effects". And the obvious rebuttal to that is that it's equally hard to provide an upper bound for the runtime of a single insert: an application cannot control the other processes running on the database, some of which may delay, interfere with or even invalidate your query and you must account for that. A delete operation is just as much "it might fail and you can retry" as an insert, or the database you're working with isn't ACID-compliant.
> And the obvious rebuttal to that is that it's equally hard to provide an upper bound for the runtime of a single insert

This is precisely where you're going wrong. The insert is upper boundable in advance (you know the set of everything you might potentially have to insert), the delete isn't because you don't know what's in the db until you look.
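
For anyone who wants to see the asymmetry rather than argue about it, here is a minimal sketch using Python's built-in sqlite3 module. The schema (`users`, `posts`) and the helper `delete_user` are made up for illustration, and SQLite stands in for whatever database you actually run; the point is only that the DELETE statement names one row while the amount of work depends on data the caller never mentions.

```python
import sqlite3


def delete_user(num_posts):
    """Delete one parent row and report (direct_rows, children_before, children_after).

    Hypothetical schema: one user owning `num_posts` posts via ON DELETE CASCADE.
    """
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys (and cascades) when this is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE posts ("
        " id INTEGER PRIMARY KEY,"
        " user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE)"
    )
    conn.execute("INSERT INTO users (id) VALUES (1)")
    conn.executemany(
        "INSERT INTO posts (user_id) VALUES (?)", [(1,)] * num_posts
    )

    before = conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]

    # The statement itself names exactly one row...
    cur = conn.execute("DELETE FROM users WHERE id = 1")
    # ...and in SQLite, rowcount covers only the directly deleted rows,
    # not the rows removed by the foreign key action.
    direct = cur.rowcount

    after = conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]
    conn.close()
    return direct, before, after


if __name__ == "__main__":
    print(delete_user(1000))
```

The same one-line DELETE removes zero child rows or a thousand depending on what is already in the table, which is the "you don't know until you look" part.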

I strongly recommend poking around with Foundation for this, because it becomes clear that this problem is the defining flaw with the way they tried to architect with layers, to the point they have a queuing system for processing large jobs of this type.

> The insert is upper boundable in advance

What if a concurrent DML happens and suddenly your MERGE INTO ... WHEN NOT MATCHED ... INSERT or INSERT INTO ... SELECT is way larger than you thought? I thought "some workloads can suddenly be way larger than I expected" was supposed to be a thing in all non-trivial DML.

You don't even need a complex query; even the simplest of insert statements can cause cascade side effects if you have temporal tables or materialized views (or, Codd forbid, ON INSERT triggers).
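
A rough sketch of the trigger case, again with Python's sqlite3 module. The tables (`orders`, `subscribers`, `notifications`), the trigger name and the helper `insert_with_fan_out` are all hypothetical; this is just one way an ON INSERT trigger can turn a single-row insert into a write whose size depends on existing data.

```python
import sqlite3


def insert_with_fan_out(num_subscribers):
    """Insert one row into `orders` and report (direct_rows, side_effect_rows).

    Hypothetical schema: an AFTER INSERT trigger writes one notification
    per existing subscriber.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER)")
    conn.execute("CREATE TABLE subscribers (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE notifications (order_id INTEGER, subscriber_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO subscribers (id) VALUES (?)",
        [(i,) for i in range(num_subscribers)],
    )
    # The trigger body runs once per inserted order and fans out
    # to every subscriber row present at that moment.
    conn.execute(
        """
        CREATE TRIGGER orders_fan_out AFTER INSERT ON orders
        BEGIN
            INSERT INTO notifications (order_id, subscriber_id)
            SELECT NEW.id, id FROM subscribers;
        END
        """
    )

    # A single-row insert, as simple as it gets.
    cur = conn.execute("INSERT INTO orders (amount) VALUES (42)")
    # In SQLite, rowcount does not include rows written by the trigger.
    direct = cur.rowcount

    side_effects = conn.execute(
        "SELECT COUNT(*) FROM notifications"
    ).fetchone()[0]
    conn.close()
    return direct, side_effects


if __name__ == "__main__":
    print(insert_with_fan_out(500))
```

The caller wrote one row; how many rows the database wrote is decided by the contents of `subscribers`, which the insert statement never references.
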
I will die on the hill that triggers are a perfectly fine tool, when used reasonably. ON INSERT isn’t usually the one I point at causing problems; that’d be ON DELETE CASCADE. 1:M relationships with large values of M are already iffy for deletions or updates; couple that with unnecessarily wide columns (or just storing large text / json / blob), and worst-case, non-clustering indices, and “delete this user” turns into “fetch thousands of pages.”