|
|
|
|
|
by cgenuity
2922 days ago
|
|
It does not support recurring scheduling at the moment. Right now the retry logic is just one retry 5 seconds after the first failure. At which point the hook gets set to a failed status and failure notifications get sent out. Retries are tricky because depending on how the job is implemented they can cause more harm than good. So I plan to refine that more based on customer feedback. |
|
Ah, yes, there's nothing like bringing capacity back online only to have it crushed by all your customers retrying at the same time.
AWS got bit hard by this[3] but there's a blog post[1] about it, which is linked to by the docs for their client software[2].
[1] https://aws.amazon.com/blogs/architecture/exponential-backof...
[2] https://docs.aws.amazon.com/general/latest/gr/api-retries.ht...
[3] https://aws.amazon.com/message/5467D2/ ... basically DynamoDB is a fundamental service for AWS and had implemented some new streams features. This all appeared to be working, but they were running closer to capacity than intended, and when a cluster went out this caused a cascading failure.
And see this which linked to that RCA: https://blog.scalyr.com/2015/09/irreversible-failures-lesson...