by heisenbit 3620 days ago
Indeed, this is conceptually hard stuff. The reason, I believe, is that the problems one is solving are system-level problems and not local ones. Another way to look at it: it is the other guy's problem. A lot of naive retry strategies sort of work until one has a large number of clients to deal with. I still remember trying to get through to a base-station designer who refused to acknowledge the need for exponential back-off and other mitigation steps. Shortly afterwards we ran into interesting times in the field on the management system side. Personally, I would also put in a bit of randomness to spread out requests in case all clients were initially impacted at the same time and are thus synchronized.
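
To make the back-off plus randomness idea concrete, here is a minimal sketch in Python. It is not taken from any particular library; the function name `backoff_delays` and its parameters (`base`, `factor`, `cap`, `attempts`, `rng`) are made up for illustration. It uses the variant often called "full jitter", where each delay is drawn uniformly between zero and an exponentially growing ceiling.

```python
import random


def backoff_delays(base=1.0, factor=2.0, cap=60.0, attempts=6, rng=None):
    """Yield one randomized retry delay (in seconds) per attempt.

    Hypothetical helper for illustration only, not a real library API.
    """
    rng = rng or random.Random()
    for attempt in range(attempts):
        # Exponential ceiling: base, base*factor, base*factor**2, ... up to cap.
        ceiling = min(cap, base * factor ** attempt)
        # Draw uniformly in [0, ceiling] so that clients which failed at the
        # same moment do not all retry at the same moment.
        yield rng.uniform(0.0, ceiling)
```

A retry loop would sleep for each yielded delay before trying again. Without the random draw, every client knocked out by the same outage would come back on exactly the same schedule, which is the synchronization problem described above.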

Good example of where random retry delays would be valuable. I filed this as a feature to add for the next release:

https://github.com/jhalterman/failsafe/issues/39