Hacker News new | ask | show | jobs
by fidotron 19 hours ago
> I'm not aware of instances where a delete is "far more work" than an equivalent insert though. That's not the general case, and I'm having a hard time thinking of any situations where that would be true.

Transactionally across related items with constraints it can explode fast.

If you've ever used FoundationDB this rapidly becomes the defining PITA due to the transaction size limits. Adding/inserting/updates are all far more predictably bounded.

1 comments

But in that case, you need to compare like-for-like with the situation where you need to insert all the prerequisite rows too. You can't just compare a delete cascade with a single insert where all the foreign keys are already satisfied.
The whole problem with the delete cascade is you can't tell how big it will be until you have entered the transaction to do it. An insert you either know or it will fail and you can retry.
That's true, but now you have moved the goalposts. The original claim upthread was "it takes just as much work to delete a row as it takes to insert a row", not "it's hard to predict the performance of a delete with cascade effects". And the obvious rebuttal to that is that it's equally hard to provide an upper bound for the runtime of a single insert: an application cannot control the other processes running on the database, some of which may delay, interfere with or even invalidate your query and you must account for that. A delete operation is just as much "it might fail and you can retry" as an insert, or the database you're working with isn't ACID-compliant.
> And the obvious rebuttal to that is that it's equally hard to provide an upper bound for the runtime of a single insert

This is precisely where you're going wrong. The insert is upper boundable in advance (you know the set of everything you might potentially have to insert), the delete isn't because you don't know what's in the db until you look.

I strongly recommend poking around with Foundation for this, because it becomes clear that this problem is the defining flaw with the way they tried to architect with layers, to the point they have a queuing system for processing large jobs of this type.

> The insert is upper boundable in advance

A concurrent DML happening then suddenly your MERGE INTO WHEN NOT MATCHED INSERT/INSERT INTO SELECT is way larger that you thought? I thought "some workloads can suddenly be way larger that I expected" was supposed to be a thing in all non-trivial DML.