|
|
|
|
|
by hinkley
1959 days ago
|
|
The problem we always ran into with deletes is them triggering full table scans because our indexes weren't set up correctly to test foreign key constraints properly. Constant game of whack-a-mole that everyone quickly grew tired of. Also more indexes increases the slope of the line for insert operations as data size grows. Another solution is tombstoning data so you never actually do a DELETE, and partial indexes go a long way to making that scale. It removes the logn cost of all of the dead data on every subsequent insert. |
|
This is a classic case where partitioning shines. Lets say those are logs. You partition it monthly and want to retain 3 months of data.
- M1 - M2 - M3
When M4 arrives you drop partition M1. This is a very fast operation and the space is returned to the OS. You also don't need to vacuum after dropping it. When you arrive at M5 you repeat the process by dropping M2.
> Another solution is tombstoning data so you never actually do a DELETE, and partial indexes go a long way to making that scale. It removes the logn cost of all of the dead data on every subsequent insert.
If you are referring to PostgreSQL then this would actually be worse than outright doing a DELETE. PostgreSQL is copy on write so an UPDATE to a is_deleted column will create a new copy of the record and a new entry in all its indexes. The old one would still need to be vacuumed. You will accumulate bloat faster and vacuums will have more work to do. Additionally, since is_deleted would be part of partial indexes like you said, a deleted record would also incur a copy in all indexes present on the table.
Compare that to just doing the DELETE which would just store the transaction ID of the query that deleted the row in cmax and a subsequent vacuum would be able to mark it as reusable by further inserts.