by ComNik
4059 days ago
Thank you for your detailed thoughts.
You obviously have much more practical experience with this kind of system. Many of the problems you mentioned I am aware of, and I also have no workable solution for them yet (detecting lost messages being the biggest; Merkle trees sound like a very interesting approach, maybe even applied at the log level?). As mentioned in another reply, Kafka does support the kind of "pointer-to-log" setup you mention. Kafka is also designed for lots of consumers, each with different characteristics. In principle, I should be able to sync something like memcache with the same information I need to sync Elasticsearch. The same holds for a websocket server that reads from this stream and forwards new events to web-app clients. So I don't see the need for more than one "queue" yet; maybe that will show up in practice. Also, your setup would require a lot more coordination to handle updates from multiple Postgres instances, if I understood correctly. That being said, I'm still in the experimental phase with all of this; I will publish a writeup once I gain a bit more experience.
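To make the Merkle-tree idea a bit more concrete, here is a minimal sketch of how a producer and a consumer could compare one log segment. Everything in it is an assumption for illustration: the message contents, the `merkle_root` helper, and the idea of hashing a whole segment at once are hypothetical and not something Kafka provides out of the box.

```python
import hashlib


def merkle_root(messages):
    """Return the hex Merkle root of a list of byte strings.

    Leaves and inner nodes get different prefixes (0x00 / 0x01) so that a
    leaf can never be confused with an inner node. An unpaired node at the
    end of a level is carried up unchanged.
    """
    level = [hashlib.sha256(b"\x00" + m).digest() for m in messages]
    if not level:
        return hashlib.sha256(b"").hexdigest()
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            nxt.append(hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest())
        if len(level) % 2:
            nxt.append(level[-1])  # odd node out: promote as-is
        level = nxt
    return level[0].hex()


# Hypothetical log segment as written by the producer...
produced = [b"insert:1", b"update:1", b"insert:2", b"delete:1"]
# ...and as seen by a consumer that silently lost one message.
consumed = [b"insert:1", b"update:1", b"delete:1"]

segment_intact = merkle_root(produced) == merkle_root(consumed)
```

If the roots differ, the two sides could go on to compare subtree hashes to narrow down which part of the segment diverged, which is the main advantage over a single flat checksum.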
|
For example, if you commit a transaction but you're unable to reach the Kafka queue (because you crash, you're SIGTERMed, there's heavy load causing a network blip, or any number of other reasons), you'll lose updates. You can't very well write to Kafka before you commit either, because the data isn't visible outside the transaction yet.
The only way is to use a transaction log in the same database, in a way that lets the log be read after the commit is done. Logical streaming would let you do this (Bottled Water [1], as someone else here mentioned, does this with Kafka) in a safe way. It's conceptually identical to storing a transaction log table, but wouldn't require as much custom code, and you'd get incremental updates for free.
[1] http://blog.confluent.io/2015/04/23/bottled-water-real-time-...
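As a rough sketch of the "transaction log table in the same database" idea: the data change and its log row are written in one transaction, and a separate relay reads the log only after commit. The assumptions here are loud ones: `sqlite3` stands in for Postgres, a plain Python list stands in for Kafka, and the `users`/`txlog` tables and the `create_user`/`relay` functions are made up for illustration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a Postgres connection
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE txlog (seq INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT);
""")


def create_user(conn, name):
    """Insert a user and its log entry atomically."""
    with conn:  # commits both inserts together, or rolls both back
        cur = conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
        payload = json.dumps(
            {"op": "insert", "table": "users", "id": cur.lastrowid, "name": name}
        )
        conn.execute("INSERT INTO txlog (payload) VALUES (?)", (payload,))


def relay(conn, publish, last_seq):
    """Publish committed log rows newer than last_seq; return the new offset."""
    rows = conn.execute(
        "SELECT seq, payload FROM txlog WHERE seq > ? ORDER BY seq", (last_seq,)
    ).fetchall()
    for seq, payload in rows:
        publish(payload)  # a real relay would send to Kafka here
        last_seq = seq
    return last_seq


published = []  # stand-in for a Kafka topic
create_user(conn, "alice")
create_user(conn, "bob")

offset = relay(conn, published.append, 0)
# Running the relay again from the saved offset publishes nothing new.
offset_after_retry = relay(conn, published.append, offset)
```

If the relay crashes between publishing and saving its offset, it will resend some rows on restart, so consumers see at-least-once delivery and need to tolerate duplicates. In Postgres, rows from concurrent transactions can also become visible out of sequence order, so a polling relay like this needs more care than the sketch shows; logical streaming avoids that class of problem.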