
Indeed, this is conceptually hard stuff. I believe the reason is that the problems being solved are system-level problems, not local ones. Another way to look at it: it is the other guy's problem. Many naive retry strategies sort of work until there is a larger number of clients to deal with. I still remember trying to get through to a base-station designer who refused to acknowledge the need for exponential back-off and other mitigation steps. Shortly afterwards we ran into interesting times in the field, on the management-system side. Personally, I would also add a bit of randomness to spread out requests when all clients were initially impacted at the same time and were therefore synchronized.
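A minimal sketch of the idea, assuming a client that simply retries a failing call. The delay ceiling doubles with each attempt (exponential back-off) and is capped. The actual wait is drawn uniformly below that ceiling (often called "full jitter"), so clients that all failed at the same moment drift apart instead of retrying in lockstep. The function names and the `base`/`cap` values are hypothetical, chosen only for illustration.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                  rng: Optional[random.Random] = None) -> float:
    """Hypothetical helper: jittered delay (seconds) before retry number `attempt`.

    The ceiling grows as base * 2**attempt, capped at `cap`; the returned
    delay is uniform in [0, ceiling) so synchronized clients spread out.
    """
    ceiling = min(cap, base * (2 ** attempt))
    source = rng if rng is not None else random
    return source.random() * ceiling


def retry_with_backoff(op: Callable[[], T], max_attempts: int = 5,
                       base: float = 0.5, cap: float = 30.0,
                       sleep: Callable[[float], None] = time.sleep,
                       rng: Optional[random.Random] = None) -> T:
    """Hypothetical helper: call `op`, retrying with jittered exponential back-off."""
    for attempt in range(max_attempts):
        try:
            return op()
        except Exception:
            # Out of attempts: surface the last error to the caller.
            if attempt == max_attempts - 1:
                raise
            sleep(backoff_delay(attempt, base, cap, rng))
    raise ValueError("max_attempts must be at least 1")
```

Injecting `sleep` and `rng` is only there to make the behaviour easy to test; a real client would likely also distinguish retryable errors from permanent ones rather than retrying on any exception.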