Hacker News new | ask | show | jobs
by alexggordon 4067 days ago
This website is great. However, this doesn't actually touch on the real issue.

I work at a MSSQL shop, and all of us know and are convinced that PG is better. Most of us use PG for our side projects and some of the dev's don't even use windows that much, with some custom MSSQL plugins we've built for linux. However, the problem still exists, of how do you port a ton of Databases over to Postgres? We're a multi-tenancy shop, so close to zero downtime is very important, and it would get really complicated if we ran multiple production versions of our app, one with a PG adapter and one with a MSSQL adapter.

A cursory Google search will show that you aren't going to get a ton of help converting them[0][1], not to mention the overhead of switching 20 developers from MSSQL to PG overnight.

This website is however excellent at convincing people to use PG over MSSQL. Perhaps, given the direction that Microsoft is going, they'll open source MSSQL overnight and it will become something competitively similar to PG in the long run.

[0] http://www.convert-in.com/mss2pgs.htm

[1] https://wiki.postgresql.org/wiki/Microsoft_SQL_Server_to_Pos...

2 comments

PostgreSQL doesn't have clustered indexes, materialized views, partitioned views, or a sane backup/restore procedure.

If you're already a licensed MSSQL customer, I'm not sure what advantages PostgreSQL could really have compared to it's slower performance and much higher operational costs.

Postgres absolutely has materialized views (I don't know what makes a backup / restore procedure "sane", but having had to do it for both SQL Server and Postgres I would definitely call Postgres's "saner").
It has a feature named materialized views, but it isn't close to being the real feature that MSSQL calls it.
> It has a feature named materialized views, but it isn't close to being the real feature that MSSQL calls it.

MSSQL doesn't have a featured called materialized views, it has a feature called indexed views which happens to implement a fairly robust version of the general DB pattern called materialized view.

PG has less mature materialized views (which are called materialized views) starting from 9.3, with additional features in 9.4, and which can be expected to mature further over at least the 9.5 and 9.6 cycles, from what I've seen various places. Certainly, SQL server is ahead on materialized views, but its not an "SQL server has them, Postgres doesn't" kind of thing.

SQL server can't do materialized views that refresh on demand, while postgres can't do materialized views that refresh automatically.

Personally, I saw materialized views that refresh on demand used much more than materialized views that refresh automatically, even though Oracle can be both.

Well I can't think of a scenario where I want a single-updated materialized (or indexed) views. Because I can already do that by creating a table from a SELECT statement. So I don't see the point or use of what PG has implemented. I asked in IRC and got the same answer. It's like saying a read-only table supports updates cause you can drop it and rebuild it with the new data.

When would anyone really need PG's version of this feature over creating a table from a query? Out of all the usages of "I need an indexed/materialized view", PG covers essentially none of them, at least that I can think of.

Anything you can do with materialized/indexed views (including most conceivable update models) you can do with the right combination of creating tables with SELECT statements and the right triggers to rebuild and/or update the tables.

What specialized syntax for materialized views and related configuration provides is some combination of clarity of intent, developer convenience, performance improvements over the most naïve update implementations, and additional information for the planner to use to optimize queries.

Postgres current level of support for named-as-such MVs provides basically the first two benefits for certain situations (and a little bit of the third in certain situations via REFRESH MATERIALIZED VIEW CONCURRENTLY.)

re: Materialized Views: I stand corrected! Thanks. (CREATE MATERIALIZED VIEW was introduced in v9.3)

The backup/restore procedure in PostgreSQL (last I looked, latest I've used is 9.2) is just statement generation. It's not a binary backup. So it's bound by INSERT performance.

Which is rage inducing when you have even just tens of millions of rows.

MSSQL'97 could do a backup/restore in a small fraction of the time. And if you just wanted to move data from Server A to Server B? You could link servers and just SELECT INTO.

No clustered index? What about CLUSTER?

http://www.postgresql.org/docs/9.4/static/sql-cluster.html

CLUSTER instructs PostgreSQL to cluster the table specified by table_name based on the index specified by index_name. The index must already have been defined on table_name.

When a table is clustered, it is physically reordered based on the index information. Clustering is a one-time operation: when the table is subsequently updated, the changes are not clustered. That is, no attempt is made to store new or updated rows according to their index order. (If one wishes, one can periodically recluster by issuing the command again. Also, setting the table's FILLFACTOR storage parameter to less than 100% can aid in preserving cluster ordering during updates, since updated rows are kept on the same page if enough space is available there.)

When a table is clustered, PostgreSQL remembers which index it was clustered by. The form CLUSTER table_name reclusters the table using the same index as before. You can also use the CLUSTER or SET WITHOUT CLUSTER forms of ALTER TABLE to set the index to be used for future cluster operations, or to clear any previous setting.

Yes you can manually re-order a table, with locking, but that's really not the same thing as enforcing in on INSERT/UPDATE.
Clustered index is something it would be nice for Postgres to have, but it would be a lot of work to implement and it's certainly not always a performance win. Looking up by that index is very fast, but looking up by a secondary index may have to traverse a second btree (it cannot point directly at data on disk because its location is tied to the clustered index). And of course there's quite a bit of overhead for transactions and modification operations.

In practice, you can recover many of the advantages of clustered indexes in PostgreSQL (along with the disadvantages) with a covering index eligible for https://wiki.postgresql.org/wiki/Index-only_scans.

That's really just for setting up replication IME (and even then, that's pretty complex compared to MSSQL).

What if I fat finger a table drop for Client A? With pgbasebackup the best I could do is set up an entirely new server and jump through some hoops. With MSSQL I could just restore the single client database to a new database, very quickly, and SELECT INTO the missing data from clientdb_a_backup to clientdb_a.

It's just a lot simpler, a lot quicker, and a lot more flexible.

can you be more specific with the "just statement generation" allegation? it looks like you're talking about `pg_dump --format=plain`, which is the default but far from the only option.
I'm using that for short-hand yea. But it was my experience pg_restore is no faster outside of basic parallelization.
Both databases have a bunch advantages over each other. There is no obvious winner.

PostgreSQL has better performance for some loads (as far as I know there is no clear winner in performance), better GIS support, writable CTEs, excellent built-in JSON and array support, an amazing CLI, and personally I think PostgreSQL has nicer built-in types and functions.

One thing that PG does that SQL Server doesn't do by default is it is built with MVCC in mind. If you want MVCC on SQL Server, you need to use snapshot isolation, and for snapshot isolation SQL Server needs to use TempDB.

TempDB, at least last I looked, doesn't really scale too well.

Just an observation.