Hacker News new | ask | show | jobs
by matsemann 1966 days ago
> Clear bloat in tables

Ohh, we've had issues with this. We have this table that's mostly ephemeral data, so rows are constantly inserted and then deleted after a certain amount of time. Due to a bug the deletion didn't work for a while and the db grew very large. Fixed the deletion, but no amount of vacuuming actually allows us to fully reclaim that space so we don't have to pay for it.

At the same time the extra cost is probably negligible compared to spending more energy fixing it..

2 comments

The problem we always ran into with deletes is them triggering full table scans because our indexes weren't set up correctly to test foreign key constraints properly. Constant game of whack-a-mole that everyone quickly grew tired of. Also more indexes increases the slope of the line for insert operations as data size grows.

Another solution is tombstoning data so you never actually do a DELETE, and partial indexes go a long way to making that scale. It removes the logn cost of all of the dead data on every subsequent insert.

> The problem we always ran into with deletes is them triggering full table scans because our indexes weren't set up correctly to test foreign key constraints properly.

This is a classic case where partitioning shines. Lets say those are logs. You partition it monthly and want to retain 3 months of data.

- M1 - M2 - M3

When M4 arrives you drop partition M1. This is a very fast operation and the space is returned to the OS. You also don't need to vacuum after dropping it. When you arrive at M5 you repeat the process by dropping M2.

> Another solution is tombstoning data so you never actually do a DELETE, and partial indexes go a long way to making that scale. It removes the logn cost of all of the dead data on every subsequent insert.

If you are referring to PostgreSQL then this would actually be worse than outright doing a DELETE. PostgreSQL is copy on write so an UPDATE to a is_deleted column will create a new copy of the record and a new entry in all its indexes. The old one would still need to be vacuumed. You will accumulate bloat faster and vacuums will have more work to do. Additionally, since is_deleted would be part of partial indexes like you said, a deleted record would also incur a copy in all indexes present on the table.

Compare that to just doing the DELETE which would just store the transaction ID of the query that deleted the row in cmax and a subsequent vacuum would be able to mark it as reusable by further inserts.

have a look at pg_repack. That'll solve it for you.

> but no amount of vacuuming actually allows us to fully reclaim that space

a full vacuum would. but it would also lock the table for the duration (which is something pg_repack won't do)

Yup, locking the table is off the table so to speak, heh. Will take a look, thanks.