by mmartinson
2280 days ago
The sort of crash looping that took down their nodes is one of the parts of OTP that I’ve found to be both quite unintuitive and dangerous, for exactly the reasons described. There are controls for max retry attempts within a time window, but as far as I can tell there is no obvious, easily auditable way to require a supervisor to halt rather than propagate a crash loop. This has caused me bad failures as well, and I’d be curious to hear what others are doing here. It’s not that it’s a hard problem to solve, but that it’s an easy thing to overlook.