Hacker News new | ask | show | jobs
by zasdffaa 1469 days ago
We may be talking very different things. From the postgres docs, a sample materialised view <https://www.postgresqltutorial.com/postgresql-views/postgres...> (I did a few tweaks as marked)

   CREATE MATERIALIZED VIEW rental_by_category
   AS
   SELECT c.name AS category,
     sum(p.amount) AS total_sales
    FROM (((((payment p
      JOIN rental r ON ((p.rental_id = r.rental_id)))
      JOIN inventory i ON ((r.inventory_id = i.inventory_id)))
      JOIN film f ON ((i.film_id <> f.film_id)))  -- tweak
      JOIN film_category fc ON ((f.film_id = fc.film_id)))
      JOIN category c ON ((fc.category_id < c.category_id)))  -- tweak
   GROUP BY c.name
   HAVING sum(p.amount) NOT IN (196, 203, 791)  -- tweak
   ORDER BY sum(p.amount) DESC
It can materialise and efficiently (read: incrementally) maintain the result set of that??
1 comments

I believe the non-equijoin will cause problems for Materialize today (I don’t work on our optimizer team, so I’m not 100% sure and don’t take this as authoritative). We might turn that into a cross join followed by a filter.

I will answer that for sure later today when I’m back at my desk.

If you changed that back to an equals sign, yes, we could incrementally maintain your query.

Thanks. I'd be very interested.

I originally had added the HAVING clause to nastify it further, because this would cause values to appear and disappear, so to handle that you (probably?) have to materialise the entire result of the GROUP BY before applying a HAVING. Which is doable could cause some overhead.

I guess I could see it work for INSERT-only tables, with a lot of headache, but throw in UPDATE and DELETE and it could become awful.

Also we have to agree on what 'incrementally' means :)

Updates and deletes work fine, because Differential Dataflow stores everything as a (row, timestamp, cardinality) tuple. Materializes uses signed integers as its cardinality type. Thus deletes are modeled as (“the row”, t2, -1), which will cancel out with (“the row”, t1, +1), and thus nothing will be returned for that row when the view is queried at any time >= t2. Eventually compaction will cause these cancelled rows to be annihilated entirely and so there will be no lingering space impact.
I checked the plan. This will involve a cross join, and maintaining the entire result of the group by before the having (the former could possibly be avoided with some future work in Materialize to allow range queries; the latter is probably essential). Without your tweaks, the only required state is the join indexes.