Hacker News new | ask | show | jobs
by danbruc 4097 days ago
Two threads want to send a single block message. Writer one advances the tail pointer by 100 bytes and then dies. Writer two then advances the tail pointer by another 100 bytes, copies its message into the buffer and writes its header. The reader will see the tail pointer 200 bytes past the last processed message but no valid header and this looks exactly like a writer currently in the process of placing a 200 byte message - including the header - into the buffer. Looks to me like thread one blocked thread two indefinitely because the message of thread two is never added to the buffer. Yes, the bytes are in the buffer, but only physically. The reader can not see them, i.e. the write operation did not logically complete.
1 comments

If you look at the code the reader does not read the tail. It skips forward reading the length fields. It would never see the writer two message and it never blocks. It just thinks there are no new messages.
Writer two is blocked, not the reader. Writer two wants to add a message to the buffer but never succeeds because writer one fails to make the message of writer two visible to the reader.
Writer two is not blocked, it is not waiting, it is not being obstructed. It is wait-free. It returns and can do whatever work it wishes.

However the data structure would be corrupt if writer one never completed. This is corruption and not the definition of wait-free.

No, it is not wait-free. The operation of writer two does not even succeed. Some bytes get written to the buffer but the message is never really added to the log because to call a write a success it should at least become visible to the reader. Writer two returns from the write operation without noticing that the operation failed. To make sure that the write succeeded writer two would have to wait and verify that its message actually becomes visible to the reader.

You can even go a step further and make writer two block. Just assume that the next thing it does is waiting for a response from the receiver of its message. Now because writer one failed the message never reaches the receiver, the receiver never sends a response and the writer waits indefinitely for the response, i.e. is blocked. Now the failed writer one is really and the only reason writer two is blocked, even though only indirectly.

And you can not really call it a corruption of the data structure because it is the exact same stated that occurs during normal operation. If writer one would resume execution after two weeks suddenly the corruption would be gone.