by jpollock 3 days ago
This matches up with sliced data, sliced either by recipient or by sender, so it makes sense to have primary state with cool backups, particularly if the state is race-tolerant.

As long as event orderings are unimportant, or self-resolvable, this works really well.

e.g. if events A and B arrive, and A+B => C and B+A => C, then as long as you durably record both A and B, the end state is the same.
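
A rough sketch of what I mean, in Python. This is not from the doc: `merge_event` and `replay` are names I made up, and I'm assuming the state can be modelled as a set of recorded events, so that merging is commutative and also tolerates duplicate delivery.

```python
from itertools import permutations


def merge_event(state: frozenset, event: str) -> frozenset:
    """Fold one event into the state (hypothetical helper).

    Set union is commutative and idempotent, so neither arrival
    order nor duplicate delivery changes the result.
    """
    return state | {event}


def replay(events) -> frozenset:
    """Rebuild state from a durably recorded event log, in the order given."""
    state = frozenset()
    for event in events:
        state = merge_event(state, event)
    return state


# Replay the same log (with a duplicate) in every possible order.
log = ["A", "B", "A"]
end_states = {replay(order) for order in permutations(log)}

# Every ordering converges on the same end state.
assert len(end_states) == 1
```

If the merge isn't commutative (say, last-writer-wins without a timestamp), this property goes away and you need an ordering or a resolution rule.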

I'm not sure why "reroute" is a message instead of a response. I would expect it to be a failure response that prunes the control flow, with the GTR being the "default retry".

There's a lot of learned experience in that doc. Reading between the lines, both the logging and configuration systems have caused global outages (or near misses). Nifty to read.

This style of architecture fits with a "no global changes" and "never lose it all" approach to fault tolerance, accepting that there will be visits from "Mr. Cock-up" [1].

Very nice writeup.

[1] https://www.youtube.com/watch?v=D5r8xwu0l8w