Hacker News new | ask | show | jobs
by andreypopp 2073 days ago
Such workflows with scheduled DELETE/INSERT often mean that the data is "derived", there's unlogged tables feature for that in PostgreSQL. Table configured with unlogged are not being written to WAL and thus generate much much less I/O. The downside is that after a crash occurs the table might be empty (before it is repopulated by the DELETE/INSERT workflow again).
3 comments

It should be an atomic replace - create a new table, do your inserts, swap the new table in. This way you aren’t left without the data during the rebuild, and you are guaranteed a consistent state since DDL is transactional in PostgreSQL.

Make the new table unlogged while it’s being populated as well for that performance improvement if you want, since the atomic replace guarantees your consistency anyway.

Alternative if it’s not a full rebuild, move from a delete/insert to using INSERT .. ON CONFLICT DO UPDATE where possible. I do this for cache tables that store materialized summary views that are updated by pgAgent jobs.

If you are deleting “old” data out then maybe use partitioned tables so you can just DROP the table containing the data you want gone.

Yeah, transactional DDL is a good thing, the potential problem with this solution though — if you have views pointing to the old table they won’t be updated with the new table and you will have to deal with them as well.
UNLOGGED tables is new to me, thanks for sharing.
It's tremendously useful for ETL.
That's super interesting. Can you share some more information on why it's so useful for ETL?
Think of it as a temporary table that doesn't automatically disappear at the end of the session. If nothing but your ETL process cares about the data until it's done being processed, and you still have the original data until the processing is done, in the worst case scenario (the server crashing) you just have to start your ETL process from the start.
Why not use partitioned tables? Simply drop/archive the child tables that you don't need, and avoid the vacuum overhead.
That’s a different thing. In some cases the whole table needs to be replaced. Then inserts into UNLOGGED tables are still much faster than into regular tables (even if there wasn’t a DELETE before).