CREATE MATERIALIZED VIEW rental_by_category
AS
SELECT c.name AS category,
sum(p.amount) AS total_sales
FROM (((((payment p
JOIN rental r ON ((p.rental_id = r.rental_id)))
JOIN inventory i ON ((r.inventory_id = i.inventory_id)))
JOIN film f ON ((i.film_id <> f.film_id))) -- tweak
JOIN film_category fc ON ((f.film_id = fc.film_id)))
JOIN category c ON ((fc.category_id < c.category_id))) -- tweak
GROUP BY c.name
HAVING sum(p.amount) NOT IN (196, 203, 791) -- tweak
ORDER BY sum(p.amount) DESC
It can materialise and efficiently (read: incrementally) maintain the result set of that??
I believe the non-equijoin will cause problems for Materialize today (I don’t work on our optimizer team, so I’m not 100% sure and don’t take this as authoritative). We might turn that into a cross join followed by a filter.
I will answer that for sure later today when I’m back at my desk.
If you changed that back to an equals sign, yes, we could incrementally maintain your query.
I originally had added the HAVING clause to nastify it further, because this would cause values to appear and disappear, so to handle that you (probably?) have to materialise the entire result of the GROUP BY before applying a HAVING. Which is doable could cause some overhead.
I guess I could see it work for INSERT-only tables, with a lot of headache, but throw in UPDATE and DELETE and it could become awful.
Also we have to agree on what 'incrementally' means :)
Updates and deletes work fine, because Differential Dataflow stores everything as a (row, timestamp, cardinality) tuple. Materializes uses signed integers as its cardinality type. Thus deletes are modeled as (“the row”, t2, -1), which will cancel out with (“the row”, t1, +1), and thus nothing will be returned for that row when the view is queried at any time >= t2. Eventually compaction will cause these cancelled rows to be annihilated entirely and so there will be no lingering space impact.
I checked the plan. This will involve a cross join, and maintaining the entire result of the group by before the having (the former could possibly be avoided with some future work in Materialize to allow range queries; the latter is probably essential). Without your tweaks, the only required state is the join indexes.