by rewmie
944 days ago
> You would think by now that the various frameworks for remote calls would have standardized down to include the best practice retry patterns, with standard names, setting ranges, etc. There is a school of thought that argues that the best retry pattern is no retry at all, and just get the client to fail and handle that state. One of the driving arguments is that retries are a lazy way to try to move faults from the client onto the server, and in the process cause more harm (i.e., DDoS). Sometimes complex means wrong, and all these retry strategies are getting progressively more complex at the expense of hammering servers with traffic way beyond the volume it's designed to handle. How is that a decent tradeoff?
Some failures really are random. Let's say 0.1% of requests fail. For a sufficiently complex backend or operation, one user request can easily generate 100 internal requests, any of which can fail. If you don't retry, that adds up to a non-negligible chance (about 9.5%, since 1 - 0.999^100 ≈ 0.095) that the whole user-facing operation fails and all 100 requests have to be reissued. You have actually increased the number of requests that had to be made! As an extreme example, imagine that a single request failed while training ChatGPT, and the whole training run had to be started from scratch because we don't do retries.
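
To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It is not a model of any real system. It assumes failures are independent, that a failed operation is simply rerun until it succeeds, and that every attempt issues all 100 internal requests. The function names are made up for illustration.

```python
def p_operation_fails(p_fail: float, n: int) -> float:
    """Chance that at least one of n independent internal requests fails."""
    return 1.0 - (1.0 - p_fail) ** n


def expected_requests_without_retries(p_fail: float, n: int) -> float:
    """Expected internal requests when any failure reruns the whole operation.

    Assumption: each attempt issues all n requests, and attempts repeat until
    one succeeds, so the attempt count is geometric with mean 1 / P(success).
    """
    p_success = (1.0 - p_fail) ** n
    return n / p_success


def expected_requests_with_retries(p_fail: float, n: int) -> float:
    """Expected internal requests when only the failed request is retried.

    Assumption: each request is retried on its own until it succeeds, so it
    costs 1 / (1 - p_fail) tries on average.
    """
    return n / (1.0 - p_fail)


if __name__ == "__main__":
    p, n = 0.001, 100  # the 0.1% failure rate and 100 internal requests from above
    print(f"operation fails:       {p_operation_fails(p, n):.1%}")            # roughly 9.5%
    print(f"requests, no retries:  {expected_requests_without_retries(p, n):.1f}")  # about 110.5
    print(f"requests, per-request: {expected_requests_with_retries(p, n):.1f}")     # about 100.1
```

Under those assumptions, skipping retries costs about 10% more backend requests than retrying only the failed request, and the gap widens quickly as the fan-out grows. Correlated failures (an overloaded server, say) break the independence assumption, which is exactly the case the parent comment is worried about.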