Hacker News new | ask | show | jobs
by onetoo 880 days ago
edit: I did some more investigation, and I missed something crucial: The distribution of requests over time becomes very wonky if the lower bound isn't 0. It stabilizes given enough time, but that time seems very long. Whereas lower-bound 0 quickly becomes uniform.

See the following figure, where I plot the # of requests over type: https://i.imgur.com/PNUFhjc.png

And that is why you should use 0.

---

I am not an expert, but I got nerd-sniped.

I have made some simulations[1] and done some napkin math[2], and I would summarize as follows:

I don't think there is any globally optimal, I think it depends on your exact circumstances, and which of {excess waiting time, excess load} you want to optimize. Defaulting to 0 is a lot easier to implement, and is at most a factor 2 worse than the optimal lower bound w.r.t. waiting time vs. load. You may also consider [0.5 * maximum, maximum], which trades some excess waiting time for less load. Your suggestion is a similar heuristic one might use depending on their exact circumstances.

[1] https://gist.github.com/Dryvnt/1984d9389ae7386127f5e8998bf52...

[2] Consider a family of random bounded exponential back-off strategies with ultimate upper bound U as follows:

  Strategy X(z): Random of [z * U, U], where z is constant, 0 <= z <= 1
There are other families of back-off algorithms with other characteristics, and I am not considering or comparing those, just this family. Note that strategy OP suggests is X(0). Consider that X(1) is non-random, which is undesirable.

Simple probability tells us

  avg(X(z)) = (1 + z) * U / 2
Server load ~= frequency, frequency is inverse duration, so

  load(X(z)) ~= 1 / avg(X(z)) = 2 / (1 + z) * U
The difference in load between any two X strategies is

  load(X(z1)) / load(X(z2)) = (1 + z1) / (1 + z2)
Since z1 and z2 are constant, this relative load is constant. Due to the bounds on z, the largest possible relative load is

  load(X(1)) / load(X(0)) = 2 / 1 = 2
Now consider your suggested strategy

  Strategy Y: Random of [L, U], where L is the last choice
Note that

  Y = X(y), where y = L / U.
In fact, Y approaches X(1) exponentially fast, since the difference between L and T is halved each step, on average. So your suggestion still falls within this at-most-factor-2 difference. Exactly where just depends on the outage length.
1 comments

Yeah, all fair. What piqued my curiosity is that you could wait _less_ time than you did before (potentially not at all!) which feels like the opposite of what you want to do in such situations.
Indeed, it was a good question. That's why I couldn't let it go either. I personally find the result quite interesting. Counterintuitive, but the most uniformly distributed load is probably the most likely to recover.

As always, https://xkcd.com/356/