by zedshaw 5796 days ago
Yeah, 'cause there's no way I'll be able to test a real web server that I actually wrote based on this small test. This is a small test to test one specific thing, doing more would confound the test. Confounding. Look it up.

Incidentally, this is the same test everyone else uses, so if you thought it was bullshit, why did you support it when people were using it to test epoll? Oh, because they used it to confirm your bias rather than disagree with it.

2 comments

I think there's less disagreement about the well-controlled result you derived and more about whether

   a) your controls were right, and
   b) what the optimal decision is in light of this new information.

(a) is well known to be one of the most difficult parts of scientific reasoning and is almost always open to endless debate and improvement. In short, it's the question of whether ATR is a human-sensible metric. (b), however, has an interesting direct answer: figure out the distribution of "live" ATRs on an interesting population of real servers and then, to borrow Eliezer's phrase, shut up and multiply.

If a lot of servers that you're targeting with M2 fall across that 60% divide (under circumstances similar to your controlled microbenchmark) then of course Superpoll is a good compromise.
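
To make the "shut up and multiply" step concrete, here is a minimal Python sketch. It assumes ATR means active descriptors divided by total descriptors, and it uses a deliberately crude cost model: poll pays a constant per total descriptor, epoll pays a constant per active descriptor, and the constants are picked so the breakeven lands on the 60% figure. The function names and the sample data are hypothetical, not measurements from anyone's server.

```python
from statistics import mean

# Assumed per-descriptor costs (arbitrary units). Their ratio is chosen so
# that poll and epoll break even at ATR = 0.6, the "60% divide".
C_POLL_PER_TOTAL_FD = 0.6
C_EPOLL_PER_ACTIVE_FD = 1.0


def poll_cost(total_fds, active_fds):
    # Crude model: poll scans every descriptor, active or not.
    return C_POLL_PER_TOTAL_FD * total_fds


def epoll_cost(total_fds, active_fds):
    # Crude model: epoll pays only for descriptors that are active.
    return C_EPOLL_PER_ACTIVE_FD * active_fds


def expected_costs(samples):
    """samples: (total_fds, active_fds) pairs observed on live servers."""
    samples = list(samples)
    return {
        "poll": mean(poll_cost(t, a) for t, a in samples),
        "epoll": mean(epoll_cost(t, a) for t, a in samples),
        # Lower bound for a hybrid that always picks the cheaper method.
        "best_of_both": mean(
            min(poll_cost(t, a), epoll_cost(t, a)) for t, a in samples
        ),
    }


if __name__ == "__main__":
    # Hypothetical population: one mostly idle server, one mostly busy one.
    observed = [(1000, 100), (1000, 900)]
    print(expected_costs(observed))
```

If the "best_of_both" figure is barely below the cheaper of the two fixed choices, the population mostly sits on one side of the divide and a hybrid buys little. If it is far below both, the servers straddle the divide and a hybrid looks attractive, at least under this toy model.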

Jacques is arguing a combination of (a) and (b). Perhaps ATR is not a sufficient metric to understand all interesting server loads. Moreover, perhaps many interesting servers live at really low or high ATRs all the time and so Superpoll must gracefully degrade to either poll or epoll.

In any case, driving for empirical data is noble, but possessing data is never sufficient to silence all detractors. It's really nice to have strong empirical support for the breakeven point between the two (i.e., the ratio of their constant-time components) via your benchmark, but science isn't just statistics.

(edit: I'll also add that pushing the pipetest microbenchmark past where people are usually making hyperbolic claims is a pretty big deal and a good catch.)

I think he was suggesting that you find the ATR on real-world production webservers, which has nothing to do with your test and would not confound it. Maybe in practice nobody sees an ATR > 30%, in which case your test, if correct, is irrelevant. What sort of loads produce an ATR > 60%? How common are they?

One problem with getting such numbers is that in the real world the load is given, not the ATR, and the ATR for a given load may depend on implementation choices. Total connections is a function of server latency, and on a loaded system, latency can be a function of polling method. So switching methods may change your ATR.
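
A rough way to see the dependence is Little's law: the average number of open connections equals the arrival rate times the average time each connection stays open. The Python sketch below assumes ATR is active connections divided by total connections and holds the active count fixed purely for illustration; the numbers and function names are made up.

```python
def open_connections(arrival_rate_per_s, mean_latency_s):
    # Little's law: average concurrency = arrival rate * mean time in system.
    return arrival_rate_per_s * mean_latency_s


def atr(active, total):
    # Assumed definition: fraction of connections that are active.
    return active / total if total else 0.0


if __name__ == "__main__":
    rate = 1000.0   # hypothetical connections per second (the given load)
    active = 100.0  # hypothetical active count, held fixed for illustration
    for latency in (0.2, 0.5):  # hypothetical latencies under two methods
        total = open_connections(rate, latency)
        print(latency, total, atr(active, total))
```

Same load, different latency, different ATR. Whether the active count really stays fixed when latency changes is exactly the kind of thing that would need measuring.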

On the other hand, maybe ATR for a given load doesn't change significantly depending on implementation. Your test found what is best for a given ATR, but not necessarily for a given load, or how ATR depends on load for a given implementation. Depending on the results, you may want to add some hysteresis to Superpoll.
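
For what hysteresis could look like, here is a minimal Python sketch. It is not a description of how Superpoll actually works. It assumes a selector that switches wholesale between poll and epoll, that epoll is preferred below the breakeven ATR and poll above it, and that a margin of 0.05 around the 60% figure is reasonable. The class name, margin, and starting method are all hypothetical.

```python
class HysteresisSelector:
    """Pick "poll" or "epoll" from the observed ATR, with a dead band.

    The method only changes once the ATR moves clearly past the breakeven,
    so a load hovering near the threshold does not flip back and forth.
    """

    def __init__(self, breakeven=0.6, margin=0.05, start="epoll"):
        self.low = breakeven - margin    # switch back to epoll below this
        self.high = breakeven + margin   # switch to poll above this
        self.method = start

    def update(self, active_fds, total_fds):
        if total_fds <= 0:
            return self.method  # nothing to measure, keep current choice
        atr = active_fds / total_fds
        if self.method == "epoll" and atr > self.high:
            self.method = "poll"
        elif self.method == "poll" and atr < self.low:
            self.method = "epoll"
        return self.method


if __name__ == "__main__":
    selector = HysteresisSelector()
    for active in (50, 62, 70, 58, 50):  # hypothetical active counts
        print(active, selector.update(active, 100))
```

The dead band trades a little optimality near the breakeven for stability, which matters if switching methods itself shifts the ATR as described above.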