by jlokier 2418 days ago
If I understood correctly, the extra round trip on the reader only occurs with left-over intents, which are the product of an earlier failure.

So:

- Writes are faster due to fewer round trips in normal running.

- Reads are the same speed in normal running.

- After coordinator failure events, the new coordinator starts to clean up left-over intents asynchronously (nothing specific is waiting for it to finish).

- Reads are slowed by extra round trips in the short time after a failure event, until the left-over intents are cleaned up. But only in that time period, and only for those ranges touched by transactions during the failure event.

- The cost of extra round trips done by the reader just during recovery is much less important than the round trips that happened on every cross-region write with the older algorithm.

- But if you care about read latency being consistent all the time, including during recovery from coordinator failure events, maybe you need a more sophisticated high-availability configuration for the logical coordinator.


> If I understood correctly, the extra round trip on the reader only occurs with left-over intents, which are the product of an earlier failure.

Left over intents are also visible for an ongoing txn, before those intents are resolved. But there's no added latency in the read path, I've commented elsewhere in the thread to explain how.

> After coordinator failure events, the new coordinator starts to clean up left-over intents asynchronously (nothing specific is waiting for it to finish).

The cleanup happens by readers on demand, there's no separate global coordinator scanning the keyspace and resolving old write intents.

> Left over intents are also visible for an ongoing txn, before those intents are resolved. But there's no added latency in the read path, I've commented elsewhere in the thread to explain how.

Thanks, that was a helpful comment.

> The cleanup happens by readers on demand, there's no separate global coordinator scanning the keyspace and resolving old write intents.

Oh, that's a little surprising. I assumed the coordinator(s) did so because asynchronous cleanup is mentioned numerous times in the article, but upon closer scrutiny I see now that it only applies in the after phase of transactions without a failure.

Would that scanning, analogous to RAID "resilvering" subject to write-intent ranges to limit the keyspace regions scanned, usefully improve read latencies later?

> I see now that it only applies in the after phase of transactions without a failure.

Yep.

> Would that scanning, analogous to RAID "resilvering" subject to write-intent ranges to limit the keyspace regions scanned, usefully improve read latencies later?

I think it's just a better design to have it done on demand. The keyspace is large and failures are rare, and when a reader does happen upon one of these zombie intents, that first reader resolves it for all subsequent readers. A global scan would improve read latencies later, but not by much and not for many readers.
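To make the on-demand idea concrete, here is a rough Python sketch of a reader that resolves a left-over intent the first time it trips over one. Everything in it is hypothetical and invented for illustration (`Store`, `Intent`, `TxnStatus`, `txn_lookups`); it is not the actual implementation discussed in the article, and it assumes the transaction record for a left-over intent already has a final status. Pending transactions are simply skipped here, whereas a real system would have to wait or otherwise deal with the conflict.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class TxnStatus(Enum):
    PENDING = "pending"
    COMMITTED = "committed"
    ABORTED = "aborted"


@dataclass
class Intent:
    txn_id: str   # transaction that wrote this provisional value
    value: str


@dataclass
class Store:
    # All names here are hypothetical, for illustration only.
    committed: Dict[str, str] = field(default_factory=dict)
    intents: Dict[str, Intent] = field(default_factory=dict)
    txn_records: Dict[str, TxnStatus] = field(default_factory=dict)
    txn_lookups: int = 0  # stands in for the extra round trips paid by readers

    def read(self, key: str) -> Optional[str]:
        intent = self.intents.get(key)
        if intent is not None:
            # Extra round trip: consult the transaction record for this intent.
            self.txn_lookups += 1
            status = self.txn_records[intent.txn_id]
            if status is TxnStatus.PENDING:
                # Simplification: ignore the pending intent and serve the
                # last committed value. A real system would do more here.
                return self.committed.get(key)
            if status is TxnStatus.COMMITTED:
                self.committed[key] = intent.value
            # Committed or aborted, the intent is now resolved and removed,
            # so later readers of this key never see it.
            del self.intents[key]
        return self.committed.get(key)


store = Store(
    committed={"k": "old"},
    intents={"k": Intent(txn_id="t1", value="new")},
    txn_records={"t1": TxnStatus.COMMITTED},
)
print(store.read("k"), store.txn_lookups)  # first reader pays the lookup
print(store.read("k"), store.txn_lookups)  # second reader does not
```

Only the first read of `k` touches the transaction record; the second one finds no intent and goes straight to the committed value, which is the reason a background scan buys so little.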