Hacker News new | ask | show | jobs
by blakeburch 1465 days ago
If I'm understanding your question correctly, it sounds like you're just generating cache tables for each variant of the root table with a different set of group bys (although I'm curious why you wouldn't want live views instead). I would recommend a python script that accepts a list of group by parameters. When executed, it loops through the list of parameters, generates and executes the DML with those parameters, then runs a generated query to store the results in the new table.

Feel free to message me separately if you want to dive into specifics.

1 comments

Exactly. The reason I dont want to create a view is the view will incurr a lot of time being spent to just querying the same thing, making the user wait for it to finish (all of our tables are in Hive). If I use a materialized view or another table, the user just need to do 'select * from xxx' and get the report immediately.

I've experimented with OLAP engine such as Clickhouse and its view, so far looks good but need a lot of investment upfront to maintain it

I guess the way you mentioned is the only way we can do and we'll try to optimize from that

Cheers!