Hacker News new | ask | show | jobs
by hildaman 2758 days ago
I'll tell you why - try to update one column in 100GB worth of row data.

Postgress makes a copy of _every_ row and you need 100GB of extra space on the hard-drive until you commit the transaction. Now extrapolate to a 1TB table that needs updating.

Oracle has a way of doing this w/o copying the entire row.

3 comments

> Postgress makes a copy of _every_ row and you need 100GB of extra space on the hard-drive until you commit the transaction.

This only happens if the column is indexed, heap-only-tuples will allow in-place updates otherwise. This doesn't dismiss it as a potential problem entirely, but depending on your needs you may never run into this.

I would argue thats an issue with your architecture at that point - you may want to use table partitioning at sizes that big, or have some other mechanism in place to be able to lock access while updating such large data sets in one go.
I would generally agree, though there are unfortunately limitations imposed on you the moment you start using partitioning with PostgreSQL (foreign keys remain to be a big one).
Not as of PG 11
You still can’t make FK references TO partitioned tables, unfortunately.
So what's the workaround? Drop the index, update, then re-establish the index?
> Postgress makes a copy of _every_ row

That's what the zHeap storage engine for Postgres fixes - in-place updates for fixed width data.

https://github.com/EnterpriseDB/zheap

"for eventual integration into PostgreSQL"..
And is being actively developed through the mailing list. Feel free to follow the progress. It's great to watch it happen in real time.
> Oracle has a way of doing this w/o copying the entire row.

And I have seen Oracle 12c servers completely lock because of that way.