by findjashua 2435 days ago
Kafka streams can handle this use case fairly well, though setting up and managing the infra may be more than you'd want to deal with for a hobby project.

+1, or use any other log-based replication mechanism (e.g. Logstash). The point is that instead of having two independent systems that can easily go out of sync (if not using distributed transactions) and become permanently inconsistent with each other, you'll now have the database as the primary source of data (commonly referred to as the system of record) and Elasticsearch as a secondary, eventually consistent search index. This approach sacrifices read-your-writes consistency for searches, but for a search index that can be tolerated.
If the database is the primary source of data, how do you get the data from there into the log-based replication method? I assumed the OP meant you'd write to Kafka, and the messages would be processed twice: once to write to the DB, and once to ElasticSearch.

Not wanting to do that for a small project, but wanting a better architecture than I've got, I'm curious about your proposed approach.

Two possibilities: either the app writes to both the database and Kafka (ideally using an atomic commit), or CDC (change data capture) is set up in Kafka to read the database's transaction log (this is faster).
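To make the first option concrete, here is a minimal sketch, assuming SQLite as the database, a hypothetical `changelog` table standing in for the log that would be shipped to Kafka, and a plain dict standing in for the search index. The table names, `save_doc`, and `run_indexer` are all made up for illustration; this is not how a real Kafka or CDC setup is wired, only the shape of the idea: the row and its log entry commit atomically, and the index catches up later by replaying the log in order.

```python
import json
import sqlite3

# Hypothetical schema: `docs` is the system of record, `changelog` is an
# append-only log written in the same transaction (a stand-in for what
# would be published to Kafka or read by a CDC tool).
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT NOT NULL);
    CREATE TABLE changelog (
        seq INTEGER PRIMARY KEY AUTOINCREMENT,
        doc_id INTEGER NOT NULL,
        payload TEXT NOT NULL
    );
""")

search_index = {}   # stand-in for the search index
last_applied = 0    # the indexer's position in the log


def save_doc(doc_id, body):
    """Write the document and its log entry in one atomic commit."""
    with db:  # commits on success, rolls back if either insert fails
        db.execute(
            "INSERT OR REPLACE INTO docs (id, body) VALUES (?, ?)",
            (doc_id, body),
        )
        db.execute(
            "INSERT INTO changelog (doc_id, payload) VALUES (?, ?)",
            (doc_id, json.dumps({"body": body})),
        )


def run_indexer():
    """Apply unseen log entries to the index, in log order."""
    global last_applied
    rows = db.execute(
        "SELECT seq, doc_id, payload FROM changelog WHERE seq > ? ORDER BY seq",
        (last_applied,),
    ).fetchall()
    for seq, doc_id, payload in rows:
        search_index[doc_id] = json.loads(payload)["body"]
        last_applied = seq
    return len(rows)
```

Calling `save_doc` twice and then `run_indexer` once leaves the index with the latest body; until the indexer runs, the index simply lags behind the database, which is the eventual consistency trade-off mentioned upthread.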

> you'd write to Kafka, and the messages would be processed twice: once to write to the DB, and once to ElasticSearch

This would be equivalent to using a message queue, which (in contrast to log-ordered replication) does not provide the same consistency guarantees: in this case, (1) read-your-writes (RYW) for database writes and (2) the database always being at least as up-to-date as the search index.
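A toy in-memory model may help show the difference. Both classes below are hypothetical and use no real Kafka or Elasticsearch API; they only track which store has applied which write. In `QueueFanOut` the database and the index consume the same queue independently, so the index can get ahead of the database. In `LogOrdered` the database is written first and the index only replays the database's own log.

```python
class QueueFanOut:
    """Writes go to a queue; database and index consume it independently."""

    def __init__(self):
        self.queue = []
        self.db, self.index = {}, {}
        self.db_pos = 0
        self.index_pos = 0

    def write(self, key, value):
        # Returns as soon as the message is queued; no store has applied it yet.
        self.queue.append((key, value))

    def step_db(self):
        if self.db_pos < len(self.queue):
            key, value = self.queue[self.db_pos]
            self.db[key] = value
            self.db_pos += 1

    def step_index(self):
        if self.index_pos < len(self.queue):
            key, value = self.queue[self.index_pos]
            self.index[key] = value
            self.index_pos += 1


class LogOrdered:
    """The database is written first; the index replays the database's log."""

    def __init__(self):
        self.db, self.log, self.index = {}, [], {}
        self.index_pos = 0

    def write(self, key, value):
        # The write is visible in the database before it ever reaches the log.
        self.db[key] = value
        self.log.append((key, value))

    def step_index(self):
        if self.index_pos < len(self.log):
            key, value = self.log[self.index_pos]
            self.index[key] = value
            self.index_pos += 1


fan = QueueFanOut()
fan.write("doc1", "v1")
fan_read_after_write = fan.db.get("doc1")   # the database has not applied it yet
fan.step_index()                            # the index consumer happens to run first
fan_index_ahead = "doc1" in fan.index and "doc1" not in fan.db

ordered = LogOrdered()
ordered.write("doc1", "v1")
ordered_read_after_write = ordered.db.get("doc1")
ordered.step_index()
ordered_index_ahead = any(k not in ordered.db for k in ordered.index)
```

In this model `fan_index_ahead` ends up true and `fan_read_after_write` is `None`, while the log-ordered variant reads back `"v1"` immediately and its index never contains a key the database lacks.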