| The timeout must be much larger than the time required to do the work. The point is that distributed locks without a release mechanism are, in practical terms, very problematic. Btw, things to note in random order: 1. Check my comment under this blog post. The author missed a fundamental point in how the algorithm works, then based his rejection of the algorithm on the remaining, weaker points. 2. It is not true that you can't wait an approximately correct amount of time with modern computers and APIs. GC pauses are bounded and monotonic clocks work. These are acceptable assumptions. 3. To critique the auto-release mechanism per se, because you don't want to expose yourself to a potential race, is one thing. To critique the algorithm against its goals and its system model is another. 4. Over the years Redlock has been used successfully in a huge number of use cases, because if you pick a timeout much larger than A) the time to complete the task and B) the random pauses you can have in normal operating systems, race conditions are very hard to trigger, and the other failures in the article have, AFAIK, never been observed. Of course, if you pick a very small auto-release timeout and the task may easily take that long, you have just committed a design error, but that's not about Redlock. |
The critical point that users must understand is that it is impossible to guarantee that a Redlock client never holds its lease longer than the timeout. Compounding the problem, the longer you make the timeout to reduce the chance of this happening accidentally, the less responsive your system becomes during genuine client misbehaviour, because a crashed or stuck client blocks everyone else until the timeout expires.
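To make the trade-off concrete, here is a minimal sketch of how a client might track its own lease with a monotonic clock and refuse to commit when too little time is left. It is not Redlock's API: `Lease`, `ttl_s`, `margin_s`, and `FakeClock` are hypothetical names invented for this illustration, and the check only narrows the race rather than closing it, since a pause can still land between the check and the write.

```python
import time


class Lease:
    """Client-side view of a lock lease (hypothetical helper, not a Redlock API)."""

    def __init__(self, ttl_s, clock=time.monotonic):
        # A monotonic clock is used so wall-clock adjustments cannot
        # stretch or shrink the client's view of the lease.
        self._clock = clock
        self.ttl_s = ttl_s
        self.acquired_at = clock()

    def remaining(self):
        """Seconds left before the lock auto-releases, as seen by this client."""
        return self.ttl_s - (self._clock() - self.acquired_at)

    def safe_to_commit(self, margin_s):
        """True if more than margin_s seconds of lease remain.

        This is best effort only: the process can still be paused
        after this returns True and before the write happens.
        """
        return self.remaining() > margin_s


class FakeClock:
    """Controllable clock so the behaviour can be shown without sleeping."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds
```

With a 30 second timeout, a client that has been working (or paused) for 28 seconds would see 2 seconds remaining and should skip the commit if its margin is 5 seconds. The same 30 seconds is also how long every other client waits if the holder crashes right after acquiring, which is the responsiveness cost described above.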