| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by tomjohnson3 4829 days ago

interesting FAQ...i like the idea of bringing this info together.

i've found there are lots of more-common things that cause partitions in practice than equipment-in-the-middle failures. human errors are probably the biggest: network configuration changes, fresh bugs in your own software - or in your dependencies, etc.

also, while a network might be asynchronous, there's usually a limit to how long a message can be delayed in practice. ...the limit might be how much memory you have to queue up messages...or perhaps how long your client-side software (or your end-user) is willing to wait for a message when a dialog is more complex than request/response.

when designing distributed software, i've found that it's helpful to ask: when (not if) X process/server/cluster/data-center fails or becomes unreachable - temporarily or forever - how should the rest of my system respond?

so, perhaps the most important take-away from the FAQ for designers is #13: that C and A are "spectrums" that you tune to meet your own requirements when the various failure scenarios happen.