by posix_compliant
583 days ago
What's neat is that this is a differential equation. If you kill 5% of instances each hour (each replaced by a healthy one), the reduction in bad instances is proportional to the current number of bad instances. I.e. if bad(t) is the fraction of bad instances at time t (in hours), bad(0) = 0, and healthy instances go bad at 1% per hour, then

d(bad(t))/dt = -0.05 * bad(t) + 0.01 * (1 - bad(t))

so

bad(t) = 0.166667 - 0.166667 * e^(-0.06 t)

where 0.166667 = 0.01 / 0.06 is the level at which kills exactly balance new failures. That looks a mighty lot like the graph of bad instances in the blog post.
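A quick numerical sanity check of that derivation, as a sketch: the Python below integrates the same equation with forward Euler and compares it to the closed form. The 5%/hour kill rate and 1%/hour failure rate are the example numbers from above; the names KILL_RATE, FAIL_RATE and the step size dt are just illustrative.

```python
import math

KILL_RATE = 0.05  # fraction of instances killed (and replaced healthy) per hour
FAIL_RATE = 0.01  # assumed fraction of healthy instances going bad per hour


def closed_form(t):
    """bad(t) = r/(k+r) * (1 - e^(-(k+r) t)), with bad(0) = 0."""
    total = KILL_RATE + FAIL_RATE
    return FAIL_RATE / total * (1 - math.exp(-total * t))


def euler(t_end, dt=0.001):
    """Integrate d(bad)/dt = -k*bad + r*(1 - bad) from bad(0) = 0 with forward Euler."""
    bad = 0.0
    for _ in range(round(t_end / dt)):
        bad += dt * (-KILL_RATE * bad + FAIL_RATE * (1 - bad))
    return bad


for t in (0, 10, 24, 48, 100):
    print(f"t={t:>3}h  closed={closed_form(t):.5f}  euler={euler(t):.5f}")
```

Both columns should agree to a few decimal places and level off near 1/6 as t grows.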
> We created a rule in our central monitoring and alerting system to randomly kill a few instances every 15 minutes. Every killed instance would be replaced with a healthy, fresh one.
It doesn't look like they worked out the numbers ahead of time.
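Working the numbers out ahead of time could look something like this minimal sketch. It assumes the hourly rates above are spread evenly over four 15-minute kill rounds and that, within each round, failures happen before kills; neither detail comes from the blog post. It steps that discrete model, computes its fixed point, and inverts the continuous steady state r/(k + r) to get the kill rate needed for a target fraction of bad instances. All function names are hypothetical.

```python
KILL_PER_HOUR = 0.05  # example kill rate from the comment
FAIL_PER_HOUR = 0.01  # assumed failure rate from the comment
TICKS_PER_HOUR = 4    # one kill round every 15 minutes


def simulate(hours, kill_per_hour=KILL_PER_HOUR, fail_per_hour=FAIL_PER_HOUR):
    """Fraction of bad instances after `hours`, stepping in 15-minute ticks."""
    p = kill_per_hour / TICKS_PER_HOUR  # fraction killed per tick (assumed even split)
    q = fail_per_hour / TICKS_PER_HOUR  # healthy -> bad probability per tick
    bad = 0.0
    for _ in range(hours * TICKS_PER_HOUR):
        bad += q * (1 - bad)  # some healthy instances go bad
        bad *= 1 - p          # random kills remove bad instances proportionally
    return bad


def steady_state(kill_per_hour=KILL_PER_HOUR, fail_per_hour=FAIL_PER_HOUR):
    """Fixed point of bad -> (1 - p) * ((1 - q) * bad + q)."""
    p = kill_per_hour / TICKS_PER_HOUR
    q = fail_per_hour / TICKS_PER_HOUR
    return (1 - p) * q / (1 - (1 - p) * (1 - q))


def required_kill_rate(target_bad, fail_per_hour=FAIL_PER_HOUR):
    """Hourly kill rate k such that the continuous steady state r/(k + r) equals target_bad."""
    return fail_per_hour * (1 - target_bad) / target_bad


print(f"after 48h: {simulate(48):.4f}, steady state: {steady_state():.4f}")
print(f"kill rate for 5% bad: {required_kill_rate(0.05):.2f} per hour")
```

The 15-minute version settles slightly below 1/6, and holding bad instances to 5% under these assumptions would take killing about 19% of instances per hour.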