by numair 4077 days ago
I would be really interested to know how various forms of this bug are resolved. On the surface this looks like an easy problem to fix, but it isn't, especially if you've designed your architecture for real-time behavior and global redundancy. Google's servers with atomic clocks come to mind...

Cynical answer: I've seen a lot of races get "fixed" by adding a sleep() or similar.

Less cynical answer: commonly you already have some kind of means to handle races (locking, transactions, some other variety of extra check), and the fix for a newly discovered race is "oh, I didn't realise that could happen, add a lock".
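To make the "add a lock" fix concrete, here is a minimal sketch in Python. The `Inventory` class and `run_reservations` helper are made up for illustration and are not from any real codebase; the point is only that the check and the update happen under the same lock.

```python
import threading


class Inventory:
    """Hypothetical shared resource with a check-then-act update."""

    def __init__(self, stock):
        self.stock = stock
        self._lock = threading.Lock()

    def reserve(self):
        # Without the lock, two threads could both pass the check
        # and both decrement, overselling the last item.
        with self._lock:
            if self.stock > 0:
                self.stock -= 1
                return True
            return False


def run_reservations(stock, attempts):
    """Fire `attempts` concurrent reservations at `stock` items."""
    inv = Inventory(stock)
    results = []

    def worker():
        results.append(inv.reserve())

    threads = [threading.Thread(target=worker) for _ in range(attempts)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Remaining stock and number of successful reservations.
    return inv.stock, sum(results)
```

With the lock in place, the number of successful reservations should never exceed the starting stock, however many threads race for it.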

If you get three requests in at the same time and sleep all three for N (say, 400) milliseconds, they'll all still run concurrently.
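A rough sketch of why a fixed sleep doesn't serialize anything, assuming three handlers doing an unprotected read-then-write on a shared counter. The `handle_requests` function and its timings are invented for the demo; the barrier just simulates the requests arriving together.

```python
import threading
import time


def handle_requests(delay_s, n=3):
    """Run n simulated requests that each sleep delay_s, then read-modify-write."""
    counter = {"value": 0}
    barrier = threading.Barrier(n)

    def handler():
        barrier.wait()               # all requests arrive at the same time
        time.sleep(delay_s)          # the "fix": the same delay for everyone
        seen = counter["value"]      # read
        time.sleep(0.2)              # simulated work between read and write
        counter["value"] = seen + 1  # write based on a possibly stale read

    threads = [threading.Thread(target=handler) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]
```

Every handler wakes up at roughly the same moment, so they all tend to read the same old value and overwrite each other's update. The final count typically ends up below 3 no matter how large `delay_s` is.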

Adding a random time to sleep might work, but some requests would run noticeably slower.

Unless the code is doing read-write-read. If you're using a system that doesn't reflect writes immediately (like Elasticsearch), waiting after the write gives the system time to flush and make the other writes visible; then you can read again and execute rollback logic if you find a conflict.

It'd be much better to make sure you're updating the same unique key and/or use the DB's conflict resolution system.
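As a sketch of the unique-key approach, here is an upsert using Python's built-in `sqlite3` module. The `counters` table, its columns and the `record_hit` helper are hypothetical; the `ON CONFLICT ... DO UPDATE` clause needs SQLite 3.24 or newer, and other databases spell it differently.

```python
import sqlite3


def record_hit(conn, name):
    # One atomic statement: insert the row, or bump the existing one
    # if the primary key is already present.
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO counters (name, n) VALUES (?, 1) "
            "ON CONFLICT(name) DO UPDATE SET n = n + 1",
            (name,),
        )


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE counters (name TEXT PRIMARY KEY, n INTEGER NOT NULL)"
    )
    for _ in range(3):
        record_hit(conn, "page:/home")
    rows = conn.execute("SELECT name, n FROM counters").fetchall()
    conn.close()
    return rows
```

Because every writer targets the same primary key, the database decides the outcome of a conflict, and there is no separate read-then-write window in application code for a race to slip into.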