Hacker News new | ask | show | jobs
by meritt 2351 days ago
Likely too boring and old school data warehousing, but this is what the notion of rollup/aggregate tables achieve. You define a table that's the result of some sort of an aggregation query (time is almost always involved: minutely, hourly, daily, monthly plus other business-specific dimensions) and at the conclusion of each corresponding interval you insert the next respective batch.

    e.g. insert into rollup_clicks_hourly select hour, count(*) clicks from clicks where created_at >= ? and created_at < ? group by 1
Then from a reporting perspective, users/dashboards/etc read from the highest relevant table that meets their criteria.
2 comments

I worked with a company who were doing (basic)analytics on their live ERP db, sales and margin reporting etc. There was a view written to join in all of the common tables so the in house people could query it easily, things like product names, customer names and addresses etc. When performance fell off they materialised the view as a stop gap... until they got around to a data warehouse. 10 years later it is still there, building each night! It was actually the perfect solution for them, simple analytics with no more infrastructure!
That's a very clever concept.

Could you recommend any classic old-school books about data warehousing that you'vre read that teach more such techniques?

Thank you!

Yup, exactly. Kimball's stuff is the best. You can achieve on a modest machine what modern techniques would require an incredibly expensive horizontally scaled MPP database. It does require a lot more planning and forethought to be certain.

https://www.amazon.com/dp/product/1118530802 and possibly https://www.amazon.com/dp/0764567578/