Hacker News new | ask | show | jobs
by ssmoot 4067 days ago
The write performance comes down to lack of clustered indexes on PostgreSQL.

So yeah, it'll have better INSERT performance, but the queries will be slower. There's really no way around that. A large data set on disk that's out of order will always be slower than the one that's in-order.

IMO MSSQL makes the right call for the vast majority of use-cases.

PostgreSQL has -no- materialized views (I stand corrected! Introduced in v9.3). No view update support. No partitioned view support. No sane backup/restore process. It's a great database if your primary concern is licensing cost. But if your primary concern is operational cost and even just multi-gigabyte data sets it's really frustratingly rudimentary compared to what MSSQL delivered over a decade ago.

But that's just me. It's free. And I'm thankful for that. I just find it really frustrating that PostgreSQL supports querying on JSON, but doesn't support backing up and restoring the database in binary format.

6 comments

> No view update support

9.3

> No partitioned view support

Inheritance (ish)

> No sane backup/restore process

wat

> doesn't support backing up and restoring the database in binary format

http://www.postgresql.org/docs/9.4/static/app-pgbasebackup.h... is half of what you want, but I expect you're wanting something I don't expect.

I want to be able to say: BACKUP clientdb_a, get a clientdb_a.bak file, and be able to restore it without statement generation.

Which seems like a reasonable baseline expectation if you were to ask a lay-person what they'd imagine a database backup to be. pgbasebackup can't do that without some serious constraints (like, you only have a single database in your PGDATA).

You are correct about clustered indexes (1) if and only if the table clustering is selected well. Which is not always the case - especially with Azure SQL which doesn't allow for non-clustered tables or tables clustered by anything other than primary key.

Table clustering is mostly betting on "I think most access is going to happen using this columns" which might be good choice or it might not, but it's not something that should happen implicitly or by the only choice at all.

I think Markus (2) explains this much better than I do, so I'll just link his text.

(1): which is pretty stupid name, btw - Oracle call it "index organized tables" which is much clearer way to describe the concept.

(2): http://use-the-index-luke.com/sql/clustering/index-organized...

> PostgreSQL has -no- materialized views (I stand corrected! Introduced in v9.3). No view update support.

Like materialized views, automatic update support for simple views was introduced in 9.3.

PG now has materialized views, but they appear mostly useless. Their the equivalent of creating a table from a query. They don't update on every change to the source data. 9.4 I think added a way to force an update, but it just re-reads the entire source. To do it right you still have to use triggers.
Materialized views that refresh on demand are are far from useless. In my experience they are often used on Oracle for both "analytics cache" where you don't need the newest data and the select is very complex and slow, or as parts of ETL processes to decouple data transformation from loading, merging or exporting.

(this makes me realize that postgres can do some pretty cool ETL by itself - and you can express a lot of it in simple SQL terms with foreign data wrappers and materalized views. One on-demand refreshed materialized view can be able to fetch data from many different sources, transform it and provide local access)

MS SQL server, on the other hand, can't do materialized views that update on demand, which makes them hard to use on complex and large datasets, and it makes the database less predictable (inserting one row may be simple operation or it may change 10 tables, you'll never know).

Materialized views in MS SQL have rather bad support for aggregation functions and other computations - you can't compute average in them, for example, as AVG is not supported and doing SUM/COUNT is also not supported because of the division, etc.

I'd say that materialized views in MS SQL are more nice for caching or something like that, but you can't very well use them for analytics or ETL, like you usually see in Oracle, though Oracle can do both on-demand and automatic refresh, of course.

EDIT: the materialized views thing was one of my most amusing experiences with SQL Server. I tried to use materialized view instead of trigger for updating data dependent table in geographic application and I needed to compute average and sums for zoomed out map layers.

The way you add materialized view is by itself confusing - you add a view, then make index on that view, then select from it, but you have to use special keyword so the system uses the materialized view and not just the view.

I tried to use AVG but was told by the server "AVG is not supported, use SUM/COUNT". I tried SUM/COUNT but was told "COUNT is not supported, use COUNT_BIG". I tried SUM/COUNT_BIG and was finally told "using arithmetic in materialized view is not supported".

To this day, I have no idea why one error message suggested doing something that's impossible.

What's the difference between refresh on demand and just creating a table based on a SELECT? It seems like some minor syntactic sugar (which is great, I'm all for that), but not much of a feature.
You can read the materialized view while it is refreshing, you just get the old version of the data. You could do that by doing delete and insert as select in a transaction, but that's not practical for tens or hundreds of thousands of rows.

On Oracle, this is also useful because it works kind of like creating and dropping partition, instead of doing delete then insert, and that's better for the DB because of how Oracle deals with tablespace and blocks (and Oracle's truncate is non-transactional, I think), but that doesn't do much on SQL Server vs. postgres discussion :)

Or just insert into a new table then change the name in a transaction?
>The write performance comes down to lack of clustered indexes on PostgreSQL.

Well, you could just leave everything as a heap in MsSQL, someone might throw things at you, but you could do it.

Or you can pad the index to allow for inserts (or do some partitioning), and schedule some rebuilds to augment the fragmentation.

I use covering indexes for most queries. It works Great for reads, and still write performance is better in postgres (on Linux).

https://wiki.postgresql.org/wiki/Index-only_scans

That's not really anything to do with sorting. At least not covering indexes as I'm familiar with them.