by nkozyra 4413 days ago
It seems like a timeout is less reliable/safe than a broadcast/ping mechanism that checks availability continuously, so that if a node disappears the validity of its lock changes.

Trying to remember which distributed system model it is that sort of does this. Ring? Mesh?

For such a model to work, I believe you need a distributed replicated state machine, and the clients need to be an active part of the distributed system (not just participants issuing requests), able to reply to pings. Yes, the model you describe has a safety advantage: if an operation takes longer than expected, the other clients may be willing to wait longer. But in practice:

1) What do you do if the client replies to pings but takes a seemingly never-ending time to perform the operation on the shared resource?

2) What if the client is operating correctly on the shared resource, and the only component that is failing is the system you use to check its availability?

1) I think a mix of the two approaches would work here: the lock has an actual timeout, but the timeout is not the only thing that decides whether the lock is kept.

2) I suppose that's always possible, but then the lock would simply be released. Not ideal behavior, but also not something that presents a data reliability issue.
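
To make the "mix of the two approaches" concrete, here is a rough sketch of what I have in mind. It is a toy, single-process model of the lock service's bookkeeping only, not a real distributed implementation: it ignores network delay and clock drift, and the names (`HybridLock`, `ping_timeout`, `max_hold`) are made up for illustration. The lock stays valid while the holder keeps pinging, is voided if the holder goes silent, and is capped by a hard timeout no matter how many pings arrive.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    owner: str
    acquired_at: float  # when the lock was granted
    last_ping: float    # last time the holder proved it was alive


class HybridLock:
    """Lock kept alive by pings, with a hard cap on total hold time (hypothetical)."""

    def __init__(self, ping_timeout: float, max_hold: float) -> None:
        self.ping_timeout = ping_timeout  # silence this long voids the lock
        self.max_hold = max_hold          # absolute cap, even with healthy pings
        self._lease: Optional[Lease] = None

    def _expired(self, now: float) -> bool:
        lease = self._lease
        if lease is None:
            return True
        silent_too_long = now - lease.last_ping >= self.ping_timeout
        held_too_long = now - lease.acquired_at >= self.max_hold
        return silent_too_long or held_too_long

    def acquire(self, owner: str, now: float) -> bool:
        # Grant the lock only if nobody holds a still-valid lease.
        if not self._expired(now):
            return False
        self._lease = Lease(owner=owner, acquired_at=now, last_ping=now)
        return True

    def ping(self, owner: str, now: float) -> bool:
        # A ping only counts if it comes from the current, unexpired holder.
        if self._expired(now) or self._lease.owner != owner:
            return False
        self._lease.last_ping = now
        return True

    def release(self, owner: str) -> bool:
        if self._lease is not None and self._lease.owner == owner:
            self._lease = None
            return True
        return False
```

With `ping_timeout=5` and `max_hold=30`, a holder that pings every 4 seconds keeps the lock until second 30 and then loses it regardless (your case 1). If its pings stop arriving, whether because it crashed or because the checker failed (your case 2), the lock is released after 5 seconds of silence.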

That could increase throughput if you have a lot of crashing nodes, but how does it improve safety or reliability?
Safety? It doesn't. Reliability? Because it would prevent another node from acquiring a held lock while the holding server is available, and release the lock if that server goes down.

There would never be a dirty read possible.