Hacker News new | ask | show | jobs
by supercanuck 2429 days ago
The reason a Star Schema exists is because it reduces the amount of data stored and it reduces the load times (faster updates) because only the facts and measures get updated. There is also little redundancy in Master Data (dimensions).

in the world of fixed assets (on-premise data warehouses) this was a concern, I'm curious to see how this plays out in these Cloud Providers because to me it seems like they will be more than happy to rent you as much space as you want and everyone is happy until the bill comes due.

Doing materialized, flat tables everywhere is great for reporting performance but the tables will not be updated as quickly, there will be redundancy in storage and it will be difficult to sync time dependent dimensions.

1 comments

This is what I mean by updates due to C-Store warehouse engines (Redshift, BigQuery, Snowflake, etc). It's not just that the cloud providers are happy to rent you more space - it's also that they're C-Store and because of that redundancy in columns is well compressed automatically.

We try to explain some of that here - https://dataschool.com/data-modeling-101/row-vs-column-orien...

And Fivetran did a great benchmarking of it here - https://fivetran.com/blog/obt-star-schema

The architecture of the C-Store warehouses often removes the benefits of materialized views. This is why for a very long time Redshift didn't even support them - they insisted they weren't needed as they didn't improve performance over regular redshift significantly.