Hacker News new | ask | show | jobs
by vlovich123 970 days ago
Is there a talk you can point me to?

There’s several challenges with this approach that come up for me (which unless I’m mistaken is the naive approach of checking each write against a notification set).

The first is that maintaining the read set seems very expensive since it scales with the number of live queries installed. In a multi tenant DB, that seems tricky to manage in terms of memory used.

The second is that the computation of intersection with a read range seems potentially very expensive. Imagine you have a string column. You now have to do a string comparison for every insertion that might hypothetically match the query.

Finally, computing that read set correctly seems challenging (as you mention) and it’s not immediately clear to me it’s always tractable (eg could you do this with arbitrary complicated joins?).

Additionally, in your description, each write has an implicit table scan to do the select to find the intersection. That will tank write throughput even for small total intersection sets (eg there’s a reason LSM databases do deletes by inserting a tombstone instead of checking whether the data exists first, same with the merge operator in RocksDB - a db read in the write path significantly kills performance)

1 comments

See https://news.ycombinator.com/item?id=31836545

To the specifics of my solution, you’re going to visit many less than O(subscriptions) rows using a btree index on min & max as I suggested, and I’m sure there’s use case specific data structures that could do an even better job.

You also don’t need to stall the write transaction until subscriptions are notified; you can batch up that work into big chunks and process it on a background queue after the transaction commits.

Finally, you don’t need to have subscribe to read sets that exactly match the query. You can simplify them a lot for those kinds of complex cases because you can tolerate over-subscription, re-running the query even if it hasn’t changed, checking the new result, and then deciding to deliver it to the external subscriber or not.