| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by vvern 2020 days ago

> 1. Log "intents to write" rather than writes themselves in Topic A 2. Have a separate denormalization computed and kept in a separate Topic B, which can be read from. This denormalization needs to be read until the intent propagates from Topic A. 3. Convert those intents into commits. 4. Deal with all the failure cases in a distributed system, e.g. cleaning up abandoned intents, etc.

People do do this. I have done this. I wish I had been more principled with the error paths. It got there _eventually_.

It was a lot of code and complexity to ship a feature which in retrospect could have been nearly trivial with a transactional database. I'd say months rather than days. I won't get those years of my life back.

The products were build on top of Kafka, Cassandra, and Elasticsearch where, over time, there was a desire to maintain some amount of referential integrity. The only reason we bought into this architecture at the time was horizontal scalability (not even multi-region). Kafka, sagas, 2PC at the "application layer" can work, but you're going to spend a heck of a lot on engineering.

It was this experience that drove me to Cockroach and I've been spreading the good word ever since.

> If you use an OLTP database, and generate events into Kafka via CDC, you get the best of both worlds.

This is the next chapter in the gospel of the distributed transaction.

1 comments

gunnarmorling 2020 days ago

>> If you use an OLTP database, and generate events into Kafka via CDC, you get the best of both worlds.

> This is the next chapter in the gospel of the distributed transaction.

Actually, it's the opposite. CDC helps to avoid distributed transaction; apps write to a single database only, and other resources (Kafka, other databases, etc.) based on that are updated asychronously, eventually consistent.

link

vvern 2020 days ago

I mean, I hear you that people do that and it does allow you to avoid needing distributed transactions. When you're stuck with an underlying NoSQL database without transactions, this is the best thing you can do. This is the whole "invert the database" idea that ends up with [1].

I'm arguing that that usage pattern was painful and that if you can have horizontally scalable transaction processing and a stream of events due to committed transactions downstream, you can use that to power correct and easy to reason about materializations as well as asynchronous task processing.

Transact to change state. React to state changes. Prosper.

[1] https://youtu.be/5ZjhNTM8XU8?t=2075

link