> I wonder if you could get most of the way there by exposing the workload (across different pipeline stages) to a materialized view recommender

Yes! That's something we're trying to do. We have a way to create signatures for SQL statements, and since subqueries are just SQL statements, we can collect all the signatures a query uses and check whether other queries use the same ones. Then sort those shared subqueries by how many times they're used, add other perf metrics like the IO/CPU needed to compute them, and you get a good starting point.
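To make the idea concrete, here's a rough Python sketch of the signature-counting step. It is not our actual implementation: `normalize`, `signature`, `subqueries`, and `shared_signatures` are hypothetical names, the normalization is deliberately crude (lowercase, literals replaced with `?`), and the subquery extraction just matches parentheses, so it will trip over parentheses inside string literals. A real version would work on a parsed query tree.

```python
import hashlib
import re
from collections import Counter


def normalize(sql: str) -> str:
    """Crude normalization (an assumption): lowercase, mask literals, collapse spaces."""
    sql = sql.lower()
    sql = re.sub(r"'[^']*'", "?", sql)          # string literals -> ?
    sql = re.sub(r"\b\d+(\.\d+)?\b", "?", sql)  # numeric literals -> ?
    return re.sub(r"\s+", " ", sql).strip()


def signature(sql: str) -> str:
    """Short hash of the normalized statement."""
    return hashlib.sha256(normalize(sql).encode()).hexdigest()[:12]


def subqueries(sql: str) -> list[str]:
    """Return the statement itself plus every parenthesized SELECT inside it."""
    found = [sql]
    stack = []
    for i, ch in enumerate(sql):
        if ch == "(":
            stack.append(i)
        elif ch == ")" and stack:
            start = stack.pop()
            inner = sql[start + 1 : i].strip()
            if inner.lower().startswith("select"):
                found.append(inner)
    return found


def shared_signatures(workload: list[str]) -> list[tuple[str, int]]:
    """Count how many queries use each signature; keep shared ones, most used first."""
    counts = Counter()
    for query in workload:
        # A set, so a signature repeated inside one query counts once.
        counts.update({signature(s) for s in subqueries(query)})
    return [(sig, n) for sig, n in counts.most_common() if n > 1]


# Made-up workload: two pipeline stages share the same aggregation subquery.
workload = [
    "SELECT * FROM (SELECT id, SUM(amt) AS total FROM orders "
    "WHERE year = 2023 GROUP BY id) t WHERE total > 10",
    "SELECT COUNT(*) FROM (select id, sum(amt) as total from orders "
    "where year = 2024 group by id) x",
    "SELECT 1",
]
ranked = shared_signatures(workload)
```

Here `ranked` ends up with a single entry for the shared aggregation, used by two queries. Masking the literals means `year = 2023` and `year = 2024` collapse into one signature, which may or may not be what you want for a materialized view.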

Microsoft did something similar with Azure, using bipartite graphs. Their solution was more advanced, as they also baked in constraints like "the materialized view can't be more than X GB in size", but the end result is the same. (https://www.microsoft.com/en-us/research/uploads/prod/2018/0...)
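For a sense of how a size constraint like that could plug into the simple ranking, here's a minimal Python sketch. To be clear, this is not Microsoft's bipartite-graph algorithm, just a filter-then-sort illustration. The `Candidate` fields, the `recommend` function, and the "saved work" score are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    signature: str      # signature of the shared subquery
    uses: int           # number of queries that use it
    cpu_seconds: float  # cost to compute it once (assumed metric)
    size_gb: float      # estimated size if materialized


def recommend(candidates: list[Candidate], max_size_gb: float) -> list[Candidate]:
    """Drop candidates over the size cap, then rank by estimated work saved."""
    eligible = [c for c in candidates if c.size_gb <= max_size_gb]
    # Assumed score: every use after the first can read the stored result.
    return sorted(eligible, key=lambda c: (c.uses - 1) * c.cpu_seconds, reverse=True)


# Made-up numbers for illustration only.
candidates = [
    Candidate("a", uses=5, cpu_seconds=10.0, size_gb=2.0),
    Candidate("b", uses=3, cpu_seconds=100.0, size_gb=50.0),
    Candidate("c", uses=2, cpu_seconds=30.0, size_gb=1.0),
]
picked = recommend(candidates, max_size_gb=10.0)
```

With a 10 GB cap, candidate `b` is dropped even though it would save the most CPU, and `a` ranks ahead of `c`.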