| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by fafle 1710 days ago
	The issue of running a transaction that spans multiple heterogeneous systems is usually solved with a 2 phase commit. The "jobs" abstraction from the article looks similar to the "coordinator" in 2PC. The article does not talk about how they achieve fault tolerance in case the "job" crashes inbetween the two transactions. Postgres supports the XA standard, which might help with this. Kafka does not support it.

3 comments

jgraettinger1 1710 days ago

I can't speak to their solution, but when solving an equivalent problem within Gazette, where you desire a distributed transaction that includes both a) published downstream messages, and b) state mutations in a DB, the solution is to 1) write downstream messages marked as pending a future ACK, and 2) encode the ACK you _intend_ to write into the checkpoint itself.

Commit the checkpoint alongside state mutations in a single store transaction. Only then do you publish ACKs to all of the downstream streams.

Of course, you can fail immediately after commit but before you get around to publishing all of those ACKS. So, on recovery, the first thing a task assignment does is publish (or re-publish) the ACKs encoded in the recovered checkpoint. This will either 1) provide a first notification that a commit occurred, or 2) be an effective no-op because the ACK was already observed, or 3) roll-back pending messages of a partial, failed transaction.

More details: https://gazette.readthedocs.io/en/latest/architecture-exactl...

link

eternalban 1710 days ago

IIRC ~2 decades ago we were dequeueing from JMS, updating RDBMS, and then enqueuing all under the cover of JTA (Java Transaction API) for atomic ops.

https://docs.oracle.com/en/middleware/fusion-middleware/12.2...

Using a very broad definition of ‘noSQL’ approach that would include solutions like Kafka, the issue becomes clear: A 2PC or ‘distributed transaction manager’ approach ala JTA comes with a performance/scalability cost — arguably a non-issue for most companies who don’t operate at LinkedIn scale (where Kafka was created).

link

zeckalpha 1710 days ago

And MySQL didn’t yet support transactions!

link

redis_mlc 1709 days ago

Actually, Innodb, which supports transactions, has been bundled with MySQL since 2001, but existed before then. It became the default storage engine in 2010.

link

zeckalpha 1709 days ago

> IIRC ~2 decades ago

link

nitwit005 1710 days ago

The solution I've seen is to write the message you want to send to the DB along with the transaction, and have some separate thread that tries to send the messages to Kafka.

Although, from various code bases I've seen, a lot of people just don't seem to worry about the possibility of data loss.

link