Hacker News new | ask | show | jobs
by zasdffaa 1469 days ago
There are too many questions here. What does it not do? What's the overhead of monitoring the main DB and how's it done - triggers? Does it need schema changes? What about race conditions - can you guarantee none? What's the memory overhead you need for the cache? Can you control what gets cached?

> It can serve millions of reads per second on a single node ...

I'm not a network guy but that seems just astonishing - what is a 'node' here?

> ReadySet incrementally maintains result sets of SQL queries based on writes to the primary database.

So basically you've solved the general materialised view incremental update problem? That's an unsolved problem in general, surely?

Edit: not dissing but trying to see where the limits are.

2 comments

> That's an unsolved problem in general, surely?

It's not. Materialize (my employer) incrementally maintains views too, using tech (Differential Dataflow) that has existed for almost 10 years: https://cs.stanford.edu/~matei/courses/2015/6.S897/readings/... .

ReadySet is based on Noria (Jon Gjengset's Ph.D thesis, explained for non-experts here: https://jon.thesquareplanet.com/noria-in-simpler-terms.pdf).

Taking a research project and making it into a production-ready product is hard work -- congrats to the ReadySet team on the launch, and best of luck!

We may be talking very different things. From the postgres docs, a sample materialised view <https://www.postgresqltutorial.com/postgresql-views/postgres...> (I did a few tweaks as marked)

   CREATE MATERIALIZED VIEW rental_by_category
   AS
   SELECT c.name AS category,
     sum(p.amount) AS total_sales
    FROM (((((payment p
      JOIN rental r ON ((p.rental_id = r.rental_id)))
      JOIN inventory i ON ((r.inventory_id = i.inventory_id)))
      JOIN film f ON ((i.film_id <> f.film_id)))  -- tweak
      JOIN film_category fc ON ((f.film_id = fc.film_id)))
      JOIN category c ON ((fc.category_id < c.category_id)))  -- tweak
   GROUP BY c.name
   HAVING sum(p.amount) NOT IN (196, 203, 791)  -- tweak
   ORDER BY sum(p.amount) DESC
It can materialise and efficiently (read: incrementally) maintain the result set of that??
I believe the non-equijoin will cause problems for Materialize today (I don’t work on our optimizer team, so I’m not 100% sure and don’t take this as authoritative). We might turn that into a cross join followed by a filter.

I will answer that for sure later today when I’m back at my desk.

If you changed that back to an equals sign, yes, we could incrementally maintain your query.

Thanks. I'd be very interested.

I originally had added the HAVING clause to nastify it further, because this would cause values to appear and disappear, so to handle that you (probably?) have to materialise the entire result of the GROUP BY before applying a HAVING. Which is doable could cause some overhead.

I guess I could see it work for INSERT-only tables, with a lot of headache, but throw in UPDATE and DELETE and it could become awful.

Also we have to agree on what 'incrementally' means :)

Updates and deletes work fine, because Differential Dataflow stores everything as a (row, timestamp, cardinality) tuple. Materializes uses signed integers as its cardinality type. Thus deletes are modeled as (“the row”, t2, -1), which will cancel out with (“the row”, t1, +1), and thus nothing will be returned for that row when the view is queried at any time >= t2. Eventually compaction will cause these cancelled rows to be annihilated entirely and so there will be no lingering space impact.
I checked the plan. This will involve a cross join, and maintaining the entire result of the group by before the having (the former could possibly be avoided with some future work in Materialize to allow range queries; the latter is probably essential). Without your tweaks, the only required state is the join indexes.
Oracle has had MVs that can refresh on update for decades.
So does Pgres & mssql, but general views that are incrementally updated - that's another matter. I'd be very surprised (and pleased).
This is indeed our goal– we're most of the way there with SQL 92 and plan to continue to expand our query support over time!