Hacker News
by Etheryte 1464 days ago
If I understood correctly, the problem you're describing is essentially the following. The locks have a timeout to avoid zombie locks etc. This means system A can obtain a lock (or a lease, if you prefer) and begin a long-running process on the given resource. However, during the long-running process, the lock times out and system B can acquire it, thinking it's the only one using the resource, leading to a conflict. Did I understand you correctly?

We once had a bug from this incorrect use of "distributed locks". A server we accessed under a lock suddenly started lagging past the lock's timeout. Another server using the lock assumed it had been released (i.e. timed out) and acquired it, while the original server assumed it still owned the lock. Data corruption occurred.

This implementation has "heartbeats" so I wonder whether it solves the problem.
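To make the failure above concrete, here is a minimal in-memory sketch of a lease that expires while its holder is still working. `LeaseLock` and the explicit `now` argument are hypothetical stand-ins for illustration, not the API of the implementation under discussion.

```python
class LeaseLock:
    """Hypothetical in-memory lease lock driven by an explicit clock value."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.owner = None
        self.expires_at = 0.0

    def acquire(self, client, now):
        # The lock is free if nobody holds it or the previous lease has timed out.
        if self.owner is None or now >= self.expires_at:
            self.owner = client
            self.expires_at = now + self.ttl
            return True
        return False


lock = LeaseLock(ttl=10.0)

a_holds = lock.acquire("A", now=0.0)    # A acquires and starts slow work
b_blocked = lock.acquire("B", now=5.0)  # lease still valid, B is refused
b_holds = lock.acquire("B", now=12.0)   # A's lease expired at t=10, B gets in

# A never re-checked the lock, so at t=12 both clients believe they own it.
both_believe = a_holds and b_holds
```

The lock service itself behaves correctly here. The conflict comes from A acting on a belief that stopped being true at t=10.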

Heartbeats do not solve it. You need fencing tokens so the resource can reject writes from a client whose lock has expired.

See this amazing article by Martin Kleppmann, author of Designing Data-Intensive Applications.

"How to do distributed locking"

https://martin.kleppmann.com/2016/02/08/how-to-do-distribute...
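A rough sketch of the fencing-token idea, under the assumption that the lock service hands out a strictly increasing number with every lease and that the resource can remember the highest number it has seen. `LockService` and `FencedStore` are hypothetical names for illustration only.

```python
import itertools


class LockService:
    """Hypothetical lock service that issues a fencing token with each lease."""

    def __init__(self):
        self._counter = itertools.count(1)

    def acquire(self):
        # Every successful acquisition gets a strictly larger token.
        return next(self._counter)


class FencedStore:
    """Hypothetical resource that remembers the highest token it has seen."""

    def __init__(self):
        self.highest_token = 0
        self.value = None

    def write(self, token, value):
        # A token lower than one already seen means the caller's lease is stale.
        if token < self.highest_token:
            return False
        self.highest_token = token
        self.value = value
        return True


locks = LockService()
store = FencedStore()

token_a = locks.acquire()  # client A gets the lock, then stalls past the timeout
token_b = locks.acquire()  # the lease expired, so client B is granted the lock

accepted_b = store.write(token_b, "written by B")
accepted_a = store.write(token_a, "late write by A")  # A wakes up and tries to write
```

The check lives in the resource, not in the client, so it still works when the stalled client has no idea its lease is gone.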

It's really up to the client implementation.

In order to deal with long-running processes, the client Python implementation uses a separate thread for sending periodic heartbeats to the lockable server, which serves to do 2 things:

  1. renew the lease so it doesn't expire which would release the lock
  2. notify the main worker thread in the event the lock has been lost

The GP's point was that the heartbeat thread can hang in pathological cases, which means the main worker thread would not be notified that it has lost the lock.
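For reference, the heartbeat pattern described above could look roughly like this. It is a sketch only: the `Heartbeat` class and the `renew` callable are assumptions made for illustration, not the actual client code.

```python
import threading


class Heartbeat:
    """Hypothetical heartbeat thread that renews a lease and flags its loss."""

    def __init__(self, renew, interval):
        self._renew = renew        # callable returning True while the lease is ours
        self._interval = interval  # seconds between renewals
        self.lost = threading.Event()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def _run(self):
        # Event.wait returns False on timeout and True once stop() is called.
        while not self._stop.wait(self._interval):
            if not self._renew():
                # Tell the worker the lock is gone, then stop renewing.
                self.lost.set()
                return


# Fake lock server: two successful renewals, then the lease is reported lost.
responses = iter([True, True, False])
hb = Heartbeat(renew=lambda: next(responses), interval=0.01)
hb.start()
lost = hb.lost.wait(timeout=2.0)  # the worker would check this between steps
hb.stop()
```

Note that `lost` is only set if `_run` actually gets scheduled. If that thread (or the whole process) stalls, the worker sees nothing, which is exactly the gap the GP pointed out.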

This can be addressed in a few ways - one way being by adding fencing tokens[0]. However, that requires modifying the underlying resource you are accessing.

[0]: https://ebrary.net/64834/computer_science/fencing_tokens

What's the advantage of heartbeats over a simpler implementation via SETNX in Redis if you still need fencing tokens?