Hacker News new | ask | show | jobs
by aviatorspoon 1697 days ago
Nice article. Tricky bug. Good diagnosis.

I enjoy diagnosing concurrency issues and thinking about what principles I could use to avoid similar issues in future.

Two principles that are relevant to this example are: 1. Reduce the number of places a lock is acquired. 2. Reduce the scope of locks.

(1) would suggest two potential improvements:

a. broadcastToStream shouldn't take the lock at "First, we check if we have a particular stream, if not, we return.". Leave the stream existence check to the go pool. This would have avoided the issue but could cause more contention on the GoPool in the case of unrecognised stream names, which might not be acceptable.

b. Don't acquire the lock in the go pool function within broadcastToStream. Rather, find the list of clients within broadcastToStream, and refer to that within the Go pool function. This also might not be acceptable, depending on hoe the lifetime of client and hub related: clients could receive messages after an unsubscribe.

(2) would suggest the fix you made.

That said, hindsight is 20:20 and I expect I'd make similar bug.