by lmwnshn
1526 days ago
Thanks for the detailed response, looking forward to the VLDB paper when it happens! Your vision sounds cool. I wonder if you could get most of the way there by exposing the workload (across different pipeline stages) to a materialized view recommender. In the class project mentioned elsewhere, I found normalizing queries to be pretty slow in practice (naive standardized formatting + query templatization; I tried various Python libraries and settled on pglast). I didn't think about trying "skip if fingerprint matches", which may help considerably. Fast normalization is nice! :)
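For anyone curious what "skip if fingerprint matches" could look like, here is a minimal sketch, not the actual implementation discussed in this thread. It assumes a cheap fingerprint (mask literals, collapse whitespace, hash) that is used only as a cache key, so the expensive normalizer runs once per distinct query shape. `cheap_fingerprint`, `NormalizationCache`, and `slow_normalize` are hypothetical names; `slow_normalize` is a stand-in for a real parser-based normalizer (pglast, for example), and the sketch assumes that normalizer returns a literal-free template.

```python
import hashlib
import re

# Assumption: masking quoted strings and bare numbers is enough to make
# queries that differ only in literals collide on the same fingerprint.
_LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")
_SPACE = re.compile(r"\s+")


def cheap_fingerprint(sql: str) -> str:
    """Hash of the query text with literals masked and whitespace collapsed."""
    masked = _LITERAL.sub("?", sql)
    collapsed = _SPACE.sub(" ", masked).strip()
    return hashlib.sha1(collapsed.encode("utf-8")).hexdigest()


def slow_normalize(sql: str) -> str:
    """Stand-in for an expensive parser-based normalizer (hypothetical)."""
    return _SPACE.sub(" ", _LITERAL.sub("?", sql)).strip()


class NormalizationCache:
    """Runs the slow normalizer only for fingerprints not seen before."""

    def __init__(self, normalizer):
        self._normalizer = normalizer
        self._templates = {}
        self.slow_calls = 0  # how many times the slow path actually ran

    def normalize(self, sql: str) -> str:
        key = cheap_fingerprint(sql)
        template = self._templates.get(key)
        if template is None:
            self.slow_calls += 1
            template = self._normalizer(sql)
            self._templates[key] = template
        return template


cache = NormalizationCache(slow_normalize)
first = cache.normalize("SELECT * FROM t WHERE id = 1")
second = cache.normalize("SELECT *  FROM t\nWHERE id = 42")  # cache hit
```

The regex fingerprint is deliberately crude: it does not understand comments, dollar-quoted strings, or other dialect-specific syntax, so a real setup would probably want a lexer-level fingerprint instead.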
|
Yes! That's something we are trying to do. Since we have a way to create signatures for SQL statements, and subqueries are just SQL statements, we can collect all the signatures a query uses and check whether other queries use the same ones. Then just sort those shared signatures by the number of times they are used, add some other perf metrics such as the IO/CPU needed to compute them, and you get a good starting point.
Microsoft did something similar with Azure, using bipartite graphs. Their solution was more advanced, since they also baked in constraints like "the materialized view can't be more than X GB in size", but the end result is the same. (https://www.microsoft.com/en-us/research/uploads/prod/2018/0...)
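The signature-counting idea above can be sketched roughly as follows. This is only an illustration under assumptions: the workload is taken to be a mapping from a query id to the set of subquery signatures it uses, and the cost is a single number per signature. `rank_candidates`, the `sigA`-style signatures, and the cost values are all made up for the example, and the size constraints from the Microsoft work are not modeled.

```python
from collections import Counter


def rank_candidates(workload, cost_by_signature, min_uses=2):
    """Rank shared subquery signatures as materialized view candidates.

    workload: dict mapping query id -> set of signatures used by that query.
    cost_by_signature: dict mapping signature -> estimated compute cost
        (for example IO or CPU; the unit is an assumption of this sketch).
    Returns (signature, uses, cost) tuples, most used first, with the
    costlier signature first when the use counts tie.
    """
    uses = Counter()
    for signatures in workload.values():
        uses.update(set(signatures))  # count each query once per signature

    shared = [
        (sig, count, cost_by_signature.get(sig, 0.0))
        for sig, count in uses.items()
        if count >= min_uses
    ]
    return sorted(shared, key=lambda row: (-row[1], -row[2], row[0]))


workload = {
    "q1": {"sigA", "sigB"},
    "q2": {"sigA"},
    "q3": {"sigA", "sigB", "sigC"},
}
costs = {"sigA": 5.0, "sigB": 20.0, "sigC": 100.0}
ranked = rank_candidates(workload, costs)
```

Here `sigC` is dropped because only one query uses it, so materializing it would not be shared by anything else in the workload.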