Hacker News new | ask | show | jobs
by ssmoot 4067 days ago
re: Materialized Views: I stand corrected! Thanks. (CREATE MATERIALIZED VIEW was introduced in v9.3)

The backup/restore procedure in PostgreSQL (last I looked, latest I've used is 9.2) is just statement generation. It's not a binary backup. So it's bound by INSERT performance.

Which is rage inducing when you have even just tens of millions of rows.

MSSQL'97 could do a backup/restore in a small fraction of the time. And if you just wanted to move data from Server A to Server B? You could link servers and just SELECT INTO.

3 comments

No clustered index? What about CLUSTER?

http://www.postgresql.org/docs/9.4/static/sql-cluster.html

CLUSTER instructs PostgreSQL to cluster the table specified by table_name based on the index specified by index_name. The index must already have been defined on table_name.

When a table is clustered, it is physically reordered based on the index information. Clustering is a one-time operation: when the table is subsequently updated, the changes are not clustered. That is, no attempt is made to store new or updated rows according to their index order. (If one wishes, one can periodically recluster by issuing the command again. Also, setting the table's FILLFACTOR storage parameter to less than 100% can aid in preserving cluster ordering during updates, since updated rows are kept on the same page if enough space is available there.)

When a table is clustered, PostgreSQL remembers which index it was clustered by. The form CLUSTER table_name reclusters the table using the same index as before. You can also use the CLUSTER or SET WITHOUT CLUSTER forms of ALTER TABLE to set the index to be used for future cluster operations, or to clear any previous setting.

Yes you can manually re-order a table, with locking, but that's really not the same thing as enforcing in on INSERT/UPDATE.
Clustered index is something it would be nice for Postgres to have, but it would be a lot of work to implement and it's certainly not always a performance win. Looking up by that index is very fast, but looking up by a secondary index may have to traverse a second btree (it cannot point directly at data on disk because its location is tied to the clustered index). And of course there's quite a bit of overhead for transactions and modification operations.

In practice, you can recover many of the advantages of clustered indexes in PostgreSQL (along with the disadvantages) with a covering index eligible for https://wiki.postgresql.org/wiki/Index-only_scans.

That's really just for setting up replication IME (and even then, that's pretty complex compared to MSSQL).

What if I fat finger a table drop for Client A? With pgbasebackup the best I could do is set up an entirely new server and jump through some hoops. With MSSQL I could just restore the single client database to a new database, very quickly, and SELECT INTO the missing data from clientdb_a_backup to clientdb_a.

It's just a lot simpler, a lot quicker, and a lot more flexible.

can you be more specific with the "just statement generation" allegation? it looks like you're talking about `pg_dump --format=plain`, which is the default but far from the only option.
I'm using that for short-hand yea. But it was my experience pg_restore is no faster outside of basic parallelization.