| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by zasdffaa 1469 days ago

There are too many questions here. What does it not do? What's the overhead of monitoring the main DB and how's it done - triggers? Does it need schema changes? What about race conditions - can you guarantee none? What's the memory overhead you need for the cache? Can you control what gets cached?

> It can serve millions of reads per second on a single node ...

I'm not a network guy but that seems just astonishing - what is a 'node' here?

> ReadySet incrementally maintains result sets of SQL queries based on writes to the primary database.

So basically you've solved the general materialised view incremental update problem? That's an unsolved problem in general, surely?

Edit: not dissing but trying to see where the limits are.

2 comments

umanwizard 1469 days ago

> That's an unsolved problem in general, surely?

It's not. Materialize (my employer) incrementally maintains views too, using tech (Differential Dataflow) that has existed for almost 10 years: https://cs.stanford.edu/~matei/courses/2015/6.S897/readings/... .

ReadySet is based on Noria (Jon Gjengset's Ph.D thesis, explained for non-experts here: https://jon.thesquareplanet.com/noria-in-simpler-terms.pdf).

Taking a research project and making it into a production-ready product is hard work -- congrats to the ReadySet team on the launch, and best of luck!

link

zasdffaa 1469 days ago

We may be talking very different things. From the postgres docs, a sample materialised view <https://www.postgresqltutorial.com/postgresql-views/postgres...> (I did a few tweaks as marked)

   CREATE MATERIALIZED VIEW rental_by_category
   AS
   SELECT c.name AS category,
     sum(p.amount) AS total_sales
    FROM (((((payment p
      JOIN rental r ON ((p.rental_id = r.rental_id)))
      JOIN inventory i ON ((r.inventory_id = i.inventory_id)))
      JOIN film f ON ((i.film_id <> f.film_id)))  -- tweak
      JOIN film_category fc ON ((f.film_id = fc.film_id)))
      JOIN category c ON ((fc.category_id < c.category_id)))  -- tweak
   GROUP BY c.name
   HAVING sum(p.amount) NOT IN (196, 203, 791)  -- tweak
   ORDER BY sum(p.amount) DESC

It can materialise and efficiently (read: incrementally) maintain the result set of that??

link

umanwizard 1469 days ago

I believe the non-equijoin will cause problems for Materialize today (I don’t work on our optimizer team, so I’m not 100% sure and don’t take this as authoritative). We might turn that into a cross join followed by a filter.

I will answer that for sure later today when I’m back at my desk.

If you changed that back to an equals sign, yes, we could incrementally maintain your query.

link

zasdffaa 1469 days ago

Thanks. I'd be very interested.

I originally had added the HAVING clause to nastify it further, because this would cause values to appear and disappear, so to handle that you (probably?) have to materialise the entire result of the GROUP BY before applying a HAVING. Which is doable could cause some overhead.

I guess I could see it work for INSERT-only tables, with a lot of headache, but throw in UPDATE and DELETE and it could become awful.

Also we have to agree on what 'incrementally' means :)

link

umanwizard 1469 days ago

Updates and deletes work fine, because Differential Dataflow stores everything as a (row, timestamp, cardinality) tuple. Materializes uses signed integers as its cardinality type. Thus deletes are modeled as (“the row”, t2, -1), which will cancel out with (“the row”, t1, +1), and thus nothing will be returned for that row when the view is queried at any time >= t2. Eventually compaction will cause these cancelled rows to be annihilated entirely and so there will be no lingering space impact.

link

umanwizard 1468 days ago

I checked the plan. This will involve a cross join, and maintaining the entire result of the group by before the having (the former could possibly be avoided with some future work in Materialize to allow range queries; the latter is probably essential). Without your tweaks, the only required state is the join indexes.

link

trollied 1469 days ago

Oracle has had MVs that can refresh on update for decades.

link

zasdffaa 1469 days ago

So does Pgres & mssql, but general views that are incrementally updated - that's another matter. I'd be very surprised (and pleased).

link

marzoeva 1469 days ago

This is indeed our goal– we're most of the way there with SQL 92 and plan to continue to expand our query support over time!

link