|
|
|
|
|
by spotman
3835 days ago
|
|
> The trade-off is that in the case of failure (node failure or network partitions), operations may block until the failures have been resolved. Curious how this works at scale. For example if a node is down and N requests start blocking, is there an upper bound to N? What happens when this is reached? Does N map to a goroutine? A file descriptor? Is there timeouts? Seems like possibly a node going down essentially means the entire show going down if requests are filling up and not being resolved, you would think there may be a tipping point in which other requests may not be able to be served as all resources are blocked. Possibly there is just a sane timeout and these requests also just fail, leaving the client to know something is amiss? |
|
> Curious how this works at scale. For example if a node is down and N requests start blocking, is there an upper bound to N? What happens when this is reached? Does N map to a goroutine? A file descriptor? Is there timeouts?
The blocking actually occurs on a per-client basis: when a client submits a txn to the server to which it's connected, that server will calculate which other server nodes need to be contacted in order for the txn to be voted on. If it cannot find enough servers due to failures then it will outright reject the txn and tell the client that there are currently too many failures going on.
However, the server could find the right set of server nodes and send off the txn to be voted on and send of the txn for verification and validation. Right at that point there could be further failures and so it's not known how far those messages got. It is in this case that the txn could block as until some node recover it's impossible to safely abort the txn. But it's important to note that:
1. If a txn starts blocking, that'll block just that txn and that client (as the client is blocked waiting on the outcome of the txn). No other part of the server or any other server is affected.
2. Other clients can be submitting other txns at the same time that need different un-failed nodes to proceed, and they progress just fine.
3. There is no global determination (yet) of whether a node is failed or not. Thus this approach can cope with really weird network issues - eg random connections between different nodes failing. There is no need for the whole remaining cluster to agree that nodes X, Y and Z are "out" - none of the algorithms depend on that sort of thing.
I hope this answers your questions.