Hacker News new | ask | show | jobs
by mjpt777 4093 days ago
I understand the point that we need to make software robust. No single algorithm can cope with every possible failure mode. That is why we need multiple instances run on multiple nodes to be resilient. This is independent from if this is a wait-free algorithm.
1 comments

It is not wait-free if it can not handle a dying process, that is the whole point of being wait-free. The reason - again as far as I can tell as a non-expert - that wait-free algorithms are not widely used is that many of the best currently known implementations have such a large overhead that they perform worse than a good locking or lock-free implementation.
You keep saying you are not an expert but keep pushing the point :-)

Try this. A process could have a rogue thread scribbling on memory and thus corrupting it. Such a process then that has this thread, by your definition, cannot be wait-free because any data structure can be corrupted and cause algorithms to fail. This is not how the concurrency community look at this.

*added below

Even a simple LOCK XADD instruction could have its results corrupted by this rogue thread after it writes before a consumer reads it.

Agreed, corrupting the content of the address space was pushing the reasons for a failing thread a bit to far. But I only wanted to point out that any process can fail, i.e. make no further progress, at any time independent of the code it is executing. But lets just stick with killing or pausing the thread.

And I obviously don't have to be an expert in the field to know that wait-free means that the algorithm must be able to handle processes making no progress because that is, as mentioned before, the definition.