Hacker News new | ask | show | jobs
by MichaelGG 4067 days ago
PG now has materialized views, but they appear mostly useless. Their the equivalent of creating a table from a query. They don't update on every change to the source data. 9.4 I think added a way to force an update, but it just re-reads the entire source. To do it right you still have to use triggers.
1 comments

Materialized views that refresh on demand are are far from useless. In my experience they are often used on Oracle for both "analytics cache" where you don't need the newest data and the select is very complex and slow, or as parts of ETL processes to decouple data transformation from loading, merging or exporting.

(this makes me realize that postgres can do some pretty cool ETL by itself - and you can express a lot of it in simple SQL terms with foreign data wrappers and materalized views. One on-demand refreshed materialized view can be able to fetch data from many different sources, transform it and provide local access)

MS SQL server, on the other hand, can't do materialized views that update on demand, which makes them hard to use on complex and large datasets, and it makes the database less predictable (inserting one row may be simple operation or it may change 10 tables, you'll never know).

Materialized views in MS SQL have rather bad support for aggregation functions and other computations - you can't compute average in them, for example, as AVG is not supported and doing SUM/COUNT is also not supported because of the division, etc.

I'd say that materialized views in MS SQL are more nice for caching or something like that, but you can't very well use them for analytics or ETL, like you usually see in Oracle, though Oracle can do both on-demand and automatic refresh, of course.

EDIT: the materialized views thing was one of my most amusing experiences with SQL Server. I tried to use materialized view instead of trigger for updating data dependent table in geographic application and I needed to compute average and sums for zoomed out map layers.

The way you add materialized view is by itself confusing - you add a view, then make index on that view, then select from it, but you have to use special keyword so the system uses the materialized view and not just the view.

I tried to use AVG but was told by the server "AVG is not supported, use SUM/COUNT". I tried SUM/COUNT but was told "COUNT is not supported, use COUNT_BIG". I tried SUM/COUNT_BIG and was finally told "using arithmetic in materialized view is not supported".

To this day, I have no idea why one error message suggested doing something that's impossible.

What's the difference between refresh on demand and just creating a table based on a SELECT? It seems like some minor syntactic sugar (which is great, I'm all for that), but not much of a feature.
You can read the materialized view while it is refreshing, you just get the old version of the data. You could do that by doing delete and insert as select in a transaction, but that's not practical for tens or hundreds of thousands of rows.

On Oracle, this is also useful because it works kind of like creating and dropping partition, instead of doing delete then insert, and that's better for the DB because of how Oracle deals with tablespace and blocks (and Oracle's truncate is non-transactional, I think), but that doesn't do much on SQL Server vs. postgres discussion :)

Or just insert into a new table then change the name in a transaction?