Hacker News
by hashimotonomora 1581 days ago
Couldn’t that bring inconsistencies eventually? If one commits and the other doesn’t.
3 comments

This is where you might want to use the outbox pattern. If you depend on dual writes you will definitely have consistency issues at some point.
The question at hand will always reduce down to the "Two Generals problem" [0]. The outbox pattern is a nice separation of concerns and gives at-least-once delivery as long as the forwarding happens before the event is marked as processed in the outbox. Having both side effects happen atomically through all failure cases is impossible.
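
To make the outbox idea concrete, here is a minimal sketch using SQLite from the Python standard library. The table names (`orders`, `outbox`), the event id format and the `publish` callback are all made up for illustration and don't come from any particular library. The business row and the outbox row commit in one local transaction, and the relay forwards each event before marking it processed, so a crash between those two steps produces a duplicate rather than a lost event.

```python
import sqlite3


def make_db():
    # Hypothetical schema: one business table plus an outbox table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.execute(
        "CREATE TABLE outbox ("
        "event_id TEXT PRIMARY KEY, payload TEXT, "
        "processed INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def place_order(conn, order_id, payload):
    # Both inserts commit together or roll back together.
    with conn:
        conn.execute(
            "INSERT INTO orders (id, payload) VALUES (?, ?)",
            (order_id, payload),
        )
        conn.execute(
            "INSERT INTO outbox (event_id, payload) VALUES (?, ?)",
            (f"order-created-{order_id}", payload),
        )


def relay_once(conn, publish):
    # Forward every unprocessed event, oldest first.
    rows = conn.execute(
        "SELECT event_id, payload FROM outbox "
        "WHERE processed = 0 ORDER BY rowid"
    ).fetchall()
    for event_id, payload in rows:
        # Forward first. If we crash after this call but before the
        # UPDATE commits, the next run forwards the event again.
        publish(event_id, payload)
        with conn:
            conn.execute(
                "UPDATE outbox SET processed = 1 WHERE event_id = ?",
                (event_id,),
            )
    return len(rows)
```

Marking the row as processed before forwarding would flip the failure mode to at-most-once, which is usually the worse trade.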

To solve that issue you need a cooperating destination for your writes which handles de-duplication. Then you get into the weeds of causality, ordering and idempotence. Ugh.

For some real world examples see the AWS Kinesis documentation which says that any application using Kinesis must be able to handle duplicate records [1].

> There are two primary reasons why records may be delivered more than one time to your Amazon Kinesis Data Streams application: producer retries and consumer retries. Your application must anticipate and appropriately handle processing individual records multiple times.

[0]: https://en.wikipedia.org/wiki/Two_Generals%27_Problem

[1]: https://docs.aws.amazon.com/streams/latest/dev/kinesis-recor...

For anyone who's interested in the outbox pattern, I also blogged about that: https://blog.frankdejonge.nl/reliable-event-dispatching-usin...
It is possible that you could get inconsistencies between systems, but you mitigate that with durable event streams (e.g. Kafka) and by ensuring that each data entity in your enterprise has a system of record that can be used to resolve conflicts.

This is really not that different than using ETL jobs to move data about, but taking a streaming approach vs a batch approach.

> It is possible that you could get inconsistencies between systems, but you mitigate that with durable event streams (e.g. Kafka) and by ensuring that each data entity in your enterprise has a system of record that can be used to resolve conflicts.

The event stream layer isn't where the sync problems have arisen in the systems I've worked on. It's the "commit transactionally both to your database and the event stream" part. Not a lot of systems are built to be ready to roll back the database change if the event publish fails, or to handle duplicate events if you err the other way.

So always write to a transactional queue first and then write to the database by reading sequentially from the queue?
That would work. Delivery from the queue will be at least once, so you handle de-duplication in the database. In other words, you need to uniquely identify each message to ensure idempotency, since the same message will sometimes be written to the database more than once.
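
A rough sketch of that de-duplication step, again with SQLite. The `processed_messages` table, the `accounts` table and the `apply_deposit` function are hypothetical names picked for the example. The message id is recorded in the same transaction as the write it guards, so a redelivered message hits the primary key and the whole transaction rolls back.

```python
import sqlite3


def make_db():
    # Hypothetical schema: the data being updated plus a table of
    # message ids that have already been applied.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "id TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE processed_messages (message_id TEXT PRIMARY KEY)"
    )
    return conn


def apply_deposit(conn, message_id, account_id, amount):
    """Apply a deposit once. Returns False if the message was seen before."""
    try:
        with conn:
            # A duplicate message_id violates the primary key, which
            # rolls back the transaction before the balance changes.
            conn.execute(
                "INSERT INTO processed_messages (message_id) VALUES (?)",
                (message_id,),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, account_id),
            )
    except sqlite3.IntegrityError:
        return False
    return True
```

This only works if the producer assigns the id once and reuses it on every retry. An id generated per delivery attempt defeats the check.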

Just need to make sure not to mess up causal ordering between events because of out-of-order retries, if such things are important for your application.
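
One common way to guard against that, sketched below, is a per-entity version number: the consumer applies an event only if it is exactly the next version for that entity. This is an illustration under my own assumptions (the event shape and the `apply_in_order` name are invented), not something the queue gives you for free.

```python
def apply_in_order(state, event):
    """Apply an event only if it is the next version for its entity.

    state maps entity_id -> (version, value); a missing entity is
    treated as version 0. event is a dict with the keys entity_id,
    version and value (a hypothetical shape for this sketch).
    """
    current_version, _ = state.get(event["entity_id"], (0, None))
    if event["version"] <= current_version:
        # A retry of something already applied: safe to drop.
        return "duplicate"
    if event["version"] > current_version + 1:
        # An earlier event has not arrived yet: hold this one back.
        return "deferred"
    state[event["entity_id"]] = (event["version"], event["value"])
    return "applied"
```

The caller still has to decide what to do with deferred events, for example re-queue them or buffer them until the gap is filled.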

Reading sequentially can be hard here, especially depending on throughput and how well you can or can't shard (e.g. how wide is the radius of possible side effects).

E.g. what if one event is something like "credit here, debit there" - you need to process it sequentially for both sides!

I guess if the concern is around atomically committing to the DB and event stream at the same time, you could hang the event stream off the DB using change-data-capture to populate the events.
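
As a toy illustration of hanging the event stream off the DB: real change-data-capture tools usually tail the database's own log (binlog, WAL), which SQLite doesn't expose in the same way, so this sketch substitutes triggers that write to a `change_log` table. The schema and the `poll_changes` function are assumptions for the example. The property that matters is the same: a change row exists only if the data change itself committed.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    # Triggers stand in for log-based capture: each change to users
    # appends a row to change_log inside the same transaction.
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
        CREATE TABLE change_log (
            seq INTEGER PRIMARY KEY AUTOINCREMENT,
            op TEXT NOT NULL,
            user_id INTEGER NOT NULL,
            email TEXT
        );
        CREATE TRIGGER users_insert AFTER INSERT ON users BEGIN
            INSERT INTO change_log (op, user_id, email)
            VALUES ('insert', NEW.id, NEW.email);
        END;
        CREATE TRIGGER users_update AFTER UPDATE ON users BEGIN
            INSERT INTO change_log (op, user_id, email)
            VALUES ('update', NEW.id, NEW.email);
        END;
        """
    )
    return conn


def poll_changes(conn, after_seq):
    # The publisher remembers the last seq it forwarded and asks for
    # everything after it, in commit order for this single database.
    return conn.execute(
        "SELECT seq, op, user_id, email FROM change_log "
        "WHERE seq > ? ORDER BY seq",
        (after_seq,),
    ).fetchall()
```

The publisher that forwards these rows still delivers at least once, so downstream consumers need to be idempotent either way.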

Ultimately, whenever you are pushing data between systems you can end up with inconsistencies, which is why it’s important to clearly define systems of record.

That's basically what the database would do anyway, so generally, yes (e.g. even an RDBMS has some sort of write log internally, in most cases). That's gonna help a lot!

It doesn't give you any guarantees about "at a given time" consistency, though. Maybe your queue or your event processor got backed up. So how real-time do you need both sides?

These are, of course, problems that people have solved to varying levels of satisfaction for most use cases! But I've seen systems built by people who saw the blog post about "have both!" and then didn't think about all these cases and then... it gets messy.

Would it make sense to always write to a transactional queue first and then write to the database by reading sequentially from the queue?
If something is relying on the data in the queue to trigger a request to a service using the DB, the requestee may not have the full data yet.

The pattern I usually see is wiring this the other way - use an initial event as the trigger to write to the DB, while also streaming the DB's binlog or equivalent back as events. Now the risk is that another service gets a "too fresh" view but this is usually less harmful than a stale view. Things listening to the binlog events need to process them idempotently, but this is usually not a major complication since most queue designs will require you to be prepared for that anyway.

Typically you see these suggested as part of a "change data capture" type process where the event is only published after the action is committed to the data store. The downside (IMO) is that this requires integrating directly with the data store itself, which isn't always easy to do, or obvious from a git/CICD perspective.