Hacker News
by cntlzw 2251 days ago
"EventReduce can be used with relational databases but not on relational queries that run over multiple tables/collections."

Forgive my ignorance, but that is the whole point of working with a relational database. If you cannot use JOINs, then this solves only a very limited use case.

The biggest use case for EventReduce is realtime applications. Most technologies for these, like Firebase or AWS AppSync, work on non-relational data. If you want to use EventReduce with relational queries, you first have to make them non-relational, for example by using materialized views. If you do not want to do that, you should not use this algorithm with its current feature set.
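A minimal Python sketch of what "making queries non-relational" with a materialized view could look like. The `users` and `orders` tables and the `build_order_view` helper are hypothetical: the JOIN is computed once into a denormalized collection, and queries then run over that single collection.

```python
# Hypothetical normalized tables: users and their orders.
users = {1: {"id": 1, "name": "Ada"}, 2: {"id": 2, "name": "Linus"}}
orders = [
    {"id": 10, "user_id": 1, "total": 30},
    {"id": 11, "user_id": 2, "total": 12},
]


def build_order_view(users: dict, orders: list) -> list:
    """Run the JOIN once, ahead of time, so every view row is self-contained."""
    return [
        {"id": o["id"], "total": o["total"], "user_name": users[o["user_id"]]["name"]}
        for o in orders
    ]


order_view = build_order_view(users, orders)

# Queries now touch a single collection, with no JOIN at query time.
big_orders = [row for row in order_view if row["total"] > 20]
print(big_orders)  # [{'id': 10, 'total': 30, 'user_name': 'Ada'}]
```

The trade-off is that a write to `users` now has to update every view row that embeds that user, and each of those row updates becomes an event of its own.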
My guess would be that if you're at a scale where you're thinking about these sorts of things, you are also at a scale where you're running on multiple machines. How does EventReduce share writes across the cluster?
EventReduce is an algorithm, not a database wrapper. It does not care about your writes or whether your database layer is a cluster, so it does not affect them either.
Sorry I wasn't clear in my original post.

I'm thinking about the application layer. If you have an application that writes data to a table, it's typical to run multiple instances of that application to support scale and reliability requirements.

If I send a write to one instance, how does it communicate and synchronise that write with the other application instances?

I ask because this can be tricky, especially when consensus is required: consensus algorithms such as Raft and Paxos need several network round trips, which add latency, and in some cases those round trips account for much of the latency in the database examples given.

EventReduce is a simple algorithm. It neither cares about nor affects how you propagate writes or how you handle your events, transactions or conflicts.

Think of it as a simple function that computes oldResults + event = newResults, as shown in the big image at the top of the README.
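To make the oldResults + event = newResults idea concrete, here is a simplified Python sketch for a hypothetical single-table query (filter, sort, limit), assuming every row has a unique `id` and sort keys are unique. It is not EventReduce's implementation; `Query`, `Event` and `reduce_results` are illustrative names. When the new result cannot be derived safely from the old one (a row left a full window, so an unseen row might move in), it returns `None` to signal that the query has to be re-run against the database.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional

Doc = dict  # a row/document; assumed to carry a unique "id" field


@dataclass
class Query:
    # Hypothetical query shape: WHERE predicate, ORDER BY key, LIMIT n (n >= 1).
    where: Callable[[Doc], bool]
    sort_key: Callable[[Doc], object]
    limit: int


@dataclass
class Event:
    op: str   # "insert", "update" or "delete"
    doc: Doc  # the row after the write (the removed row for deletes)


def reduce_results(query: Query, old: list[Doc], event: Event) -> Optional[list[Doc]]:
    """oldResults + event = newResults, or None when the new results
    cannot be derived from the old ones and the query must be re-run."""
    doc = event.doc
    was_in = any(d["id"] == doc["id"] for d in old)
    matches = event.op != "delete" and query.where(doc)

    # Drop the old version of the row, then add the new one if it still matches.
    results = [d for d in old if d["id"] != doc["id"]]
    if matches:
        results.append(doc)
        results.sort(key=query.sort_key)

    if len(old) < query.limit:
        # The old result already held every matching row, so nothing
        # outside it can move into the window.
        return results[: query.limit]

    # The window was full: rows we never saw sort after its last element.
    boundary = query.sort_key(old[-1])
    if was_in and (not matches or query.sort_key(doc) > boundary):
        # A row left the window or moved past its old last element; an
        # unseen row might now belong in it, so fall back to the database.
        return None
    return results[: query.limit]


adults = Query(where=lambda d: d["age"] >= 18, sort_key=lambda d: d["age"], limit=2)
old = [{"id": 1, "age": 20}, {"id": 2, "age": 30}]
print(reduce_results(adults, old, Event("insert", {"id": 3, "age": 25})))
# [{'id': 1, 'age': 20}, {'id': 3, 'age': 25}]
print(reduce_results(adults, old, Event("delete", {"id": 1, "age": 20})))
# None
```

Most events are answered from the old results alone; only the ambiguous ones fall back to a real query, which is where the savings would come from.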

This means that if you run multiple application servers, which most people do, you’ll need to implement a data distribution mechanism of some sort.

I must admit, with limitations like this I’m struggling to figure out the use cases for this.

Edit: so I guess this is easier using the change subscriptions you mention in other comments. That does mean many subscribers, but hopefully that’s minimal load. This has the trade-off that it’s now eventually consistent, but I suppose that’s not a problem for many high read applications.
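A sketch of that change-subscription setup, under stated assumptions: `ChangeFeed` is a hypothetical in-process stand-in for a real change stream or broker (Redis pub/sub, for example), and each application instance keeps its own cached result, here a set of active user ids, that it updates as events arrive. Because each instance applies events on its own schedule, the caches can briefly disagree, which is the eventual consistency trade-off described above.

```python
from collections import deque


class ChangeFeed:
    """Hypothetical in-process stand-in for a shared change stream."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, instance):
        self.subscribers.append(instance)

    def publish(self, event):
        # Delivery is asynchronous: each subscriber only queues the event.
        for sub in self.subscribers:
            sub.inbox.append(event)


class AppInstance:
    def __init__(self, name, feed):
        self.name = name
        self.inbox = deque()
        self.active_users = set()  # cached query result: ids of active users
        feed.subscribe(self)

    def drain(self):
        # Apply queued events to the cached result, in arrival order.
        while self.inbox:
            op, user_id = self.inbox.popleft()
            if op == "activate":
                self.active_users.add(user_id)
            elif op == "deactivate":
                self.active_users.discard(user_id)


feed = ChangeFeed()
a, b = AppInstance("a", feed), AppInstance("b", feed)
feed.publish(("activate", 1))  # a write accepted by any instance goes to the feed
a.drain()
print(a.active_users, b.active_users)  # {1} set()  (b has not caught up yet)
b.drain()
print(a.active_users == b.active_users)  # True
```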

I’m still feeling like this could be solved more simply with basic data structures and a pub/sub mechanism. Now that I think of it, we do a similar thing with Redis for one service and a custom Python server/pipeline in another, but we’ve never felt the need for this sort of thing.

Do you have more details about specific applications/use cases, and why this is better than alternatives?

Thank you for the clarification.
Noria [1] is a research database that solves the same problem while still supporting all relational database queries.

[1] https://github.com/mit-pdos/noria

I think it is dangerous to propose a database product as a solution to the limitation of a simple algorithm.