(Disclaimer: I'm one of the engineers at Materialize)

> for example, max and min aggregates aren't supported in SQL Server because updating the current max or min record requires a query to find the new max or min record

This isn't a requirement in Materialize, because Materialize stores values in a reduction tree (which is basically like a min/max heap). When we add or remove a record, we can compute the new min/max in O(log(total_number_of_records)) time in the worst case (when a record is the new min/max). Realistically, that log term is bounded by 16: it's a 16-ary heap, we don't support more than 2^64 records, and log_16(2^64) = 64/4 = 16. Computing the min/max this way is substantially better than recomputing it with a linear scan. This post [1] goes into a lot more detail on how we compute reductions in Materialize.

> there are obviously limits to what can be efficiently maintained

I think we fundamentally disagree here. In our view, we should be able to maintain every view either in time linear in the number of updates or in time sublinear in the size of the overall dataset, and every case that doesn't do so is a bug. The underlying computational frameworks [2] we're using are designed for that, so this isn't just a random fantasy.

> if Materialize has a list of constraints shorter than SQL Server's then you're sitting on technology worth billions

Thank you! I certainly hope so!

[1]: https://materialize.com/robust-reductions-in-materialize/
[2]: https://github.com/timelydataflow/differential-dataflow/blob...
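For readers wondering how a 16-ary reduction tree keeps min current under both inserts and deletes, here is a minimal Python sketch under stated assumptions. It is not Materialize's implementation: `MinReductionTree`, `hash64`, and the choice to place each value in the tree by a 64-bit hash (so the depth comes out to the 16 levels mentioned above) are all hypothetical. Each update changes one leaf, then recomputes each ancestor from at most 16 children, stopping as soon as an ancestor's min is unchanged; in the worst case it walks all 16 levels.

```python
import hashlib
from collections import Counter

FANOUT_BITS = 4               # 16-ary tree: each level consumes 4 bits of the hash
LEVELS = 64 // FANOUT_BITS    # 64-bit hash space -> log_16(2^64) = 16 levels


def hash64(value) -> int:
    """Hypothetical placement hash: deterministic 64-bit hash of repr(value).
    Assumes all values share one type (3 and 3.0 would land in different leaves)."""
    digest = hashlib.blake2b(repr(value).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class MinReductionTree:
    """Hypothetical sketch: maintain MIN over a multiset under inserts and deletes."""

    def __init__(self):
        # Leaves: full 64-bit hash -> Counter of values with that hash (never empty).
        self.leaves = {}
        # levels[k] for k = 1..LEVELS: prefix (hash >> 4k) -> min of that subtree.
        # levels[0] is unused; the leaves dict plays that role.
        self.levels = [{} for _ in range(LEVELS + 1)]

    def _leaf_min(self, h):
        bucket = self.leaves.get(h)
        return min(bucket) if bucket else None

    def update(self, value, diff: int) -> None:
        """Apply diff copies of value (positive = insert, negative = delete)."""
        h = hash64(value)
        bucket = self.leaves.setdefault(h, Counter())
        bucket[value] += diff
        if bucket[value] <= 0:
            del bucket[value]
        if not bucket:
            del self.leaves[h]
        # Repair the single root path, recomputing each node from <= 16 children.
        for k in range(1, LEVELS + 1):
            prefix = h >> (FANOUT_BITS * k)
            below = self.levels[k - 1]
            children = []
            for nibble in range(1 << FANOUT_BITS):
                child = (prefix << FANOUT_BITS) | nibble
                m = self._leaf_min(child) if k == 1 else below.get(child)
                if m is not None:
                    children.append(m)
            new = min(children) if children else None
            old = self.levels[k].get(prefix)
            if new == old:
                break  # this subtree's min didn't change, so no ancestor can change
            if new is None:
                del self.levels[k][prefix]
            else:
                self.levels[k][prefix] = new

    def current_min(self):
        # The root sits at level LEVELS under prefix 0 (hash >> 64 == 0).
        return self.levels[LEVELS].get(0)


if __name__ == "__main__":
    tree = MinReductionTree()
    for v in (5, 3, 9):
        tree.update(v, +1)
    print(tree.current_min())  # 3
    tree.update(3, -1)         # retract the current min
    print(tree.current_min())  # 5
```

Deletes are just updates with a negative diff, so retracting the current min costs the same bounded path repair as an insert, with no rescan of the remaining records. Values must be hashable and mutually comparable.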
This is awesome, and I believe it should be technically possible for any query given the right data structure. The reduction tree works for min/max, but is it a general solution, or are there other data structures for other purposes? "N per X" queries and TOP subqueries come to mind. Is that all handled already, or are there some limitations and a roadmap?