by nitrogen 3616 days ago
Very cool. Consistent and clear retry, backoff, and failure behaviors are an important part of designing robust systems, so it's disappointing how uncommon they are. If I were starting a new Java project today I would almost certainly want to use this library instead of the various threads and timers I had to hack together years ago.

Indeed, this is conceptually hard stuff. The reason, I believe, is that the problems one is solving are system-level problems, not local ones. Another way to look at it: it is the other guy's problem. A lot of naive retry strategies sort of work until there is a larger number of clients to deal with. I still remember trying to get through to a base-station designer who refused to acknowledge the need for exponential back-off and other mitigation steps. Shortly afterward we ran into interesting times in the field on the management-system side. Personally, I would also add a bit of randomness to spread out requests when all clients are impacted at the same time and are thus synchronized.
Good example of where random retry delays would be valuable. I filed this as a feature to add for the next release:

https://github.com/jhalterman/failsafe/issues/39
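
For anyone wondering what "exponential back-off plus a bit of randomness" could look like, here is a rough plain-Java sketch. It is not Failsafe's API and not what the linked issue proposes; the class name `BackoffSketch`, the method `delayMillis`, and its parameters are all made up for illustration. It assumes "full jitter": the delay is drawn uniformly between zero and an exponentially growing, capped ceiling, so clients that failed at the same moment do not retry at the same moment.

```java
import java.util.Random;

// Hypothetical helper, not part of any library.
final class BackoffSketch {

    // Returns the delay to wait before retry number `attempt` (0-based).
    // Assumes baseMillis > 0 and capMillis small enough that doubling
    // cannot overflow a long (true for any realistic retry cap).
    static long delayMillis(int attempt, long baseMillis, long capMillis, Random rng) {
        // Exponential part: base * 2^attempt, stopping once the cap is reached.
        long ceiling = baseMillis;
        for (int i = 0; i < attempt && ceiling < capMillis; i++) {
            ceiling *= 2;
        }
        ceiling = Math.min(ceiling, capMillis);

        // Random part ("full jitter"): uniform in [0, ceiling], so clients
        // that were synchronized by a shared outage spread out again.
        return (long) (rng.nextDouble() * (ceiling + 1));
    }
}
```

A caller would sleep for `BackoffSketch.delayMillis(attempt, 100, 5000, new Random())` milliseconds between attempts, for example. Whether to randomize the whole delay or only a fraction of it is a judgment call; randomizing only a fraction keeps a minimum wait but spreads clients out less.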