Hacker News
by BeefWellington 1261 days ago
Were you working off proper data warehouses, or just the transactional db?

I ask because something a lot of people miss here is how much performance you can get from the T part of ETL. Denormalizing everything into big, simple, inflated tables can make things orders of magnitude faster, so it matters quite a bit what you're comparing against.
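To make the denormalization point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, columns and sample rows are all made up for illustration, and SQLite only stands in for whatever engine is actually in play. The transform step pays for the joins once at load time, so the reporting query only has to scan one wide table.

```python
import sqlite3

# Hypothetical normalized schema, as it might look in a transactional db.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE products  (id INTEGER PRIMARY KEY, category TEXT, price REAL);
    CREATE TABLE orders    (id INTEGER PRIMARY KEY, customer_id INTEGER,
                            product_id INTEGER, qty INTEGER);
    INSERT INTO customers VALUES (1, 'EU'), (2, 'US');
    INSERT INTO products  VALUES (1, 'books', 10.0), (2, 'games', 50.0);
    INSERT INTO orders    VALUES (1, 1, 1, 3), (2, 1, 2, 1), (3, 2, 2, 2);
""")

# The "T" step: do the joins once and store the wide, inflated result.
conn.execute("""
    CREATE TABLE orders_wide AS
    SELECT o.id AS order_id, c.region, p.category, o.qty,
           o.qty * p.price AS revenue
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    JOIN products  p ON p.id = o.product_id
""")

# Reporting query against the normalized tables: joins on every run.
normalized = conn.execute("""
    SELECT c.region, SUM(o.qty * p.price)
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    JOIN products  p ON p.id = o.product_id
    GROUP BY c.region ORDER BY c.region
""").fetchall()

# Same report against the denormalized table: a single-table scan.
denormalized = conn.execute("""
    SELECT region, SUM(revenue)
    FROM orders_wide
    GROUP BY region ORDER BY region
""").fetchall()

print(normalized)
print(denormalized)
```

Both queries return the same rows. With three orders the difference is invisible, and how much it matters at scale depends on the engine and the data, but the shape of the work is what changes: no joins at query time.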

1 comment

We saw major improvements when we simply wrote full tables from a transactional database to Parquet and, as you say, modelling the data appropriately produced significant improvements as well.
A column-oriented database is probably the bigger performance increase. Parquet and a good data warehouse (something like ClickHouse, Druid or Snowflake) both use metadata and efficient scans to power through aggregation queries.
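A toy sketch of what "metadata and efficient scans" means, in plain Python. This is not how Parquet or any of those warehouses is actually implemented. The `Chunk` class, `make_chunks` and `sum_amount` are hypothetical names, and the data is invented. Each chunk stores its rows column by column and keeps min/max stats for one column, so a filtered aggregation can skip whole chunks without reading them.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """A block of rows stored column by column, with min/max stats for `day`."""
    day: list[int]
    amount: list[float]
    day_min: int
    day_max: int


def make_chunks(rows, size):
    """Split (day, amount) rows into column-oriented chunks of `size` rows."""
    chunks = []
    for i in range(0, len(rows), size):
        part = rows[i:i + size]
        days = [r[0] for r in part]
        amounts = [r[1] for r in part]
        chunks.append(Chunk(days, amounts, min(days), max(days)))
    return chunks


def sum_amount(chunks, lo, hi):
    """Sum `amount` where lo <= day <= hi; return (total, chunks scanned)."""
    total = 0.0
    scanned = 0
    for c in chunks:
        # Metadata check: skip chunks whose day range cannot match.
        if c.day_max < lo or c.day_min > hi:
            continue
        scanned += 1
        # Only the two columns the query needs are touched.
        total += sum(a for d, a in zip(c.day, c.amount) if lo <= d <= hi)
    return total, scanned


# Invented data: 100 rows sorted by day, 10 rows per chunk.
rows = [(day, float(day)) for day in range(1, 101)]
chunks = make_chunks(rows, 10)
total, scanned = sum_amount(chunks, 25, 34)
print(total, scanned, len(chunks))
```

Here the query touches 2 of the 10 chunks. The skipping only pays off when the data is sorted or clustered on the filtered column, which is one more reason the modelling step matters.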