Hacker News new | ask | show | jobs
by thingsilearned 2429 days ago
Yeah exactly. Much of the "overkill" was done because of performance and cost reasons, that frankly just don't apply anymore. Now the largest expense by far is time.

There are a number of people starting to talk about Star Schemas having little gain on modern stacks. The performance and costs gains are automatically done now with C-Store warehouses. Fivetran has a great post on this https://fivetran.com/blog/obt-star-schema

1 comments

The reason a Star Schema exists is because it reduces the amount of data stored and it reduces the load times (faster updates) because only the facts and measures get updated. There is also little redundancy in Master Data (dimensions).

in the world of fixed assets (on-premise data warehouses) this was a concern, I'm curious to see how this plays out in these Cloud Providers because to me it seems like they will be more than happy to rent you as much space as you want and everyone is happy until the bill comes due.

Doing materialized, flat tables everywhere is great for reporting performance but the tables will not be updated as quickly, there will be redundancy in storage and it will be difficult to sync time dependent dimensions.

This is what I mean by updates due to C-Store warehouse engines (Redshift, BigQuery, Snowflake, etc). It's not just that the cloud providers are happy to rent you more space - it's also that they're C-Store and because of that redundancy in columns is well compressed automatically.

We try to explain some of that here - https://dataschool.com/data-modeling-101/row-vs-column-orien...

And Fivetran did a great benchmarking of it here - https://fivetran.com/blog/obt-star-schema

The architecture of the C-Store warehouses often removes the benefits of materialized views. This is why for a very long time Redshift didn't even support them - they insisted they weren't needed as they didn't improve performance over regular redshift significantly.