


	by cgenuity 2922 days ago
	It does not support recurring scheduling at the moment. Right now the retry logic is just one retry 5 seconds after the first failure. If that retry also fails, the hook gets set to a failed status and failure notifications get sent out. Retries are tricky because, depending on how the job is implemented, they can cause more harm than good, so I plan to refine that based on customer feedback.

3 comments

ben509 2922 days ago

> Retries are tricky because depending on how the job is implemented they can cause more harm than good.

Ah, yes, there's nothing like bringing capacity back online only to have it crushed by all your customers retrying at the same time.

AWS got bit hard by this[3] but there's a blog post[1] about it, which is linked to by the docs for their client software[2].

[1] https://aws.amazon.com/blogs/architecture/exponential-backof...

[2] https://docs.aws.amazon.com/general/latest/gr/api-retries.ht...

[3] https://aws.amazon.com/message/5467D2/ ... basically DynamoDB is a fundamental service for AWS and had implemented some new streams features. This all appeared to be working, but they were running closer to capacity than intended, and when a cluster went out this caused a cascading failure.

And see this which linked to that RCA: https://blog.scalyr.com/2015/09/irreversible-failures-lesson...
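
For anyone who wants the gist without reading the post: a minimal Python sketch of capped exponential backoff with "full jitter", which is roughly the idea described in [1]. The function name, the `base` and `cap` defaults, and the injectable `rng` are my own assumptions for illustration, not anything from the AWS SDKs or from Posthook.

```python
import random


def backoff_delays(max_retries, base=1.0, cap=60.0, rng=random.random):
    """Yield one sleep duration (in seconds) per retry.

    Each delay is drawn uniformly from [0, min(cap, base * 2**attempt)),
    so clients that failed at the same moment spread their retries out
    instead of all coming back at once.
    """
    for attempt in range(max_retries):
        ceiling = min(cap, base * (2 ** attempt))  # exponential growth, capped
        yield rng() * ceiling                      # full jitter: anywhere below the ceiling


if __name__ == "__main__":
    for i, delay in enumerate(backoff_delays(5, base=1.0, cap=8.0), start=1):
        print(f"retry {i}: wait {delay:.2f}s")
```

The jitter is the part that matters for the "everyone retries at the same time" problem: plain exponential backoff without randomness still has all clients retrying in lockstep.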


nodesocket 2922 days ago

Seems like allowing the user to specify retry logic (if any) covers the "tricky" pieces. Let the user define the number of retries and the delay between them.
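
Something like the following is what I have in mind, as a rough Python sketch. `RetryPolicy`, `deliver`, and the `"completed"` status are hypothetical names, not Posthook's actual API; the defaults just mirror the one-retry-after-5-seconds behavior described above.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryPolicy:
    """Hypothetical per-hook settings supplied by the user."""
    max_retries: int = 1         # retries after the first attempt
    delay_seconds: float = 5.0   # fixed wait between attempts


def deliver(send, policy, sleep=time.sleep):
    """Call send() until it returns True or the retries run out.

    Returns "completed" on success and "failed" once every attempt
    allowed by the policy has been used.
    """
    for attempt in range(policy.max_retries + 1):
        if send():
            return "completed"
        if attempt < policy.max_retries:
            sleep(policy.delay_seconds)  # wait only if another attempt follows
    return "failed"
```

A user whose job is not safe to run twice would set `max_retries=0`, which sidesteps the "more harm than good" case entirely.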


cgenuity 2922 days ago

Agreed, thank you :). Added to the board.


chaosprophet 2922 days ago

What happens when you have a service outage? Do you directly mark the hooks as failed, or do you retry once after your service has been restored?


cgenuity 2922 days ago

If the service outage is on Posthook's side, they would be retried.

If the outage is on the customer's side, all hooks that were attempted during the outage would be marked as failed. I plan on adding a feature that will let the developer re-fire all failed hooks from a given time period.
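
Conceptually, the selection step for that feature could look like the Python sketch below. This is only an illustration of the idea, not the implementation: the `Hook` fields and the `failed_hooks_between` helper are made-up names, and the half-open time window is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Hook:
    """Hypothetical record of one scheduled hook."""
    id: str
    status: str              # e.g. "failed" or "completed"
    attempted_at: datetime   # when delivery was last attempted


def failed_hooks_between(hooks, start, end):
    """Return the failed hooks attempted in the window [start, end)."""
    return [
        h for h in hooks
        if h.status == "failed" and start <= h.attempted_at < end
    ]
```

The developer would pass in the start and end of their outage and get back the hooks to fire again.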
