by lclarkmichalek
944 days ago
This still isn't what I'd call "safe". Retries are great at helping clients ride out temporary issues, but terrible at helping them deal with consistently overloaded servers. While jitter and exponential backoff help with the timing, they don't reduce the total number of requests sent to the service. The next step is usually local circuit breakers. The two easiest to implement are: terminating the request if the error rate to the service over the last <window> is greater than x%, and terminating the request (or just disabling retries) if the share of requests that are retries over the last <window> is greater than y%. For example, don't bother sending a request if 70% of requests have errored in the last minute, and don't bother retrying if 50% of the requests we've sent in the last minute were already retries. The Google SRE book describes lots of other basic techniques for making retries safe.
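To make the two breakers concrete, here is a minimal Python sketch of how they might look on the client side. Everything in it is an assumption for illustration: the `RetryBudget` class, its method names, the 60-second window, the 70% and 50% thresholds (taken from the example above), and the `min_samples` floor, which is added here only so the breaker doesn't trip on a handful of requests.

```python
import time
from collections import deque


class RetryBudget:
    """Hypothetical client-side breaker over a sliding time window."""

    def __init__(self, window_s=60.0, max_error_rate=0.7, max_retry_ratio=0.5,
                 min_samples=10, clock=time.monotonic):
        self.window_s = window_s
        self.max_error_rate = max_error_rate    # trip when error rate exceeds this
        self.max_retry_ratio = max_retry_ratio  # stop retrying above this share
        self.min_samples = min_samples          # assumption: ignore tiny samples
        self.clock = clock                      # injectable so it can be tested
        self.events = deque()                   # (timestamp, was_error, was_retry)

    def _trim(self):
        # Drop events that have fallen out of the window.
        cutoff = self.clock() - self.window_s
        while self.events and self.events[0][0] < cutoff:
            self.events.popleft()

    def record(self, was_error, was_retry):
        # Call once per request actually sent to the service.
        self.events.append((self.clock(), was_error, was_retry))

    def _ratio(self, index):
        self._trim()
        if len(self.events) < self.min_samples:
            return 0.0
        return sum(1 for e in self.events if e[index]) / len(self.events)

    def allow_request(self):
        # Breaker 1: fail fast when most recent requests have errored.
        return self._ratio(1) <= self.max_error_rate

    def allow_retry(self):
        # Breaker 2: stop retrying when retries already dominate our traffic.
        return self.allow_request() and self._ratio(2) <= self.max_retry_ratio
```

The caller would check `allow_request()` before each first attempt and `allow_retry()` before each retry, then call `record(...)` with the outcome. Note that in this sketch a tripped breaker only reopens once the old events age out of the window; a real implementation would probably let a small trickle of probe requests through instead.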