by zedshaw 5797 days ago
This is the point where talking about it does nothing. Go measure it like I have. In fact, I'll give you your hypothesis to test:

"There are no servers that have an ATR of > 80%."

That's easy to test, and I'm damn positive you could find some that disprove your assertion.

More importantly though, you have this assertion:

"Using both poll and epoll has no advantage in performance."

Again, who knows? That's why I'm testing and trying it out. That's the science part: I've got no idea, but I'll give it a shot. And now that I've done an analysis that tells me what really matters, I'll be able to do very good tests for the different kinds of loads.

Incidentally, when people run performance tests against web servers to see how fast they serve files they're testing the server with an ATR at around 100%. Food for thought.


You have measured nothing about real world workloads. You have applied completely artificial benchmarks and formed some possible conclusions from them that mean absolutely nothing until you demonstrate that this is a real problem in the real world. There's ample evidence to believe that FDs spend the majority of their existence idle--between HTTP keepalive, processing time for the queries themselves, network bandwidth and the generally sparse nature of HTTP traffic, it is extremely plausible that the ATR, as you call it, is well below 60% almost all the time. Your numbers demonstrate that there is a tradeoff between epoll and poll, which is of some interest, but unless you actually measure the ATR of a real site you're shooting off your mouth.

I cannot collect this data because I don't possess a sufficiently high load web server. Go forth and measure, but measure useful information.
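For anyone who does have the traffic, a minimal sketch of what "measuring the ATR" could look like: take one poll() snapshot over the watched sockets and divide the ready count by the total. The names `sample_atr` and `demo` are made up for illustration, and the socketpairs only stand in for real client connections. A real measurement would sample repeatedly inside the server's own event loop.

```python
import select
import socket


def sample_atr(socks, timeout_ms=0):
    """Return ready/total for one poll() snapshot over `socks`."""
    if not socks:
        raise ValueError("need at least one socket to compute a ratio")
    poller = select.poll()
    for s in socks:
        poller.register(s, select.POLLIN)
    # poll() returns only the descriptors that have events pending.
    ready = poller.poll(timeout_ms)
    active = sum(1 for _fd, events in ready if events & select.POLLIN)
    return active / len(socks)


def demo(total=10, active=3):
    """Hypothetical stand-in for live traffic: make `active` of `total` readable."""
    pairs = [socket.socketpair() for _ in range(total)]
    try:
        for writer, _reader in pairs[:active]:
            writer.send(b"x")  # the peer end now has data waiting
        return sample_atr([reader for _writer, reader in pairs])
    finally:
        for a, b in pairs:
            a.close()
            b.close()
```

A single snapshot says little by itself. The argument in this thread is about the distribution of this ratio over time under real load, so the samples would need to be logged and summarized.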

I used the same test everyone has used for the last 8 years to compare poll vs. epoll performance. I also ran it to test a hypothesis, assumed I was wrong, ran it more, and then presented the information openly so others could try it.

Of course I'm going to keep doing this, but if you say that my test is invalid, then all of the tests people did to justify epoll are invalid.

The tests are perfectly valid; it's the conclusion that is wrong. Of course there is a trade-off, and of course you'll find a crossover point because there is a trade-off. The fact that the crossover point is at roughly 60% for your kernel config and machine setup is essentially meaningless, other than that it at least falls within the expected range (50 to 80% or so).

Your hypothesis is that web servers have the majority of their fds active most of the time, and that's where the problem lies. I've put up the numbers elsewhere in this thread, feel free to do some measurement on your own high traffic websites to come up with more data points.

"Your hypothesis is that web servers have the majority of their fds active most of the time,"

Aha! Totally wrong. My hypothesis has not been that at all. You totally didn't even understand the hypothesis, and I stated it very clearly. My hypothesis has always been:

epoll is faster than poll when the active/total FD ratio is < 0.6, but poll is faster than epoll when the active/total ratio is > 0.6.

Nowhere in there do I say that all web servers have the majority of their FDs active. NOWHERE. I say some might, I say who knows, I say we need to go measure, but nowhere do I say anything like what you say.
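The hypothesis as stated in this comment can be written down as a small sketch. The function names are made up for illustration, and the 0.6 crossover is the figure quoted in the thread for one kernel config and machine, not a general constant.

```python
CROSSOVER = 0.6  # figure quoted in the thread; specific to one kernel and machine


def atr(active_fds, total_fds):
    """Active/total ratio: the fraction of watched FDs that are ready."""
    if total_fds <= 0:
        raise ValueError("total_fds must be positive")
    if not 0 <= active_fds <= total_fds:
        raise ValueError("active_fds must be between 0 and total_fds")
    return active_fds / total_fds


def predicted_faster(ratio, crossover=CROSSOVER):
    """Which call the hypothesis predicts is faster at a given ATR."""
    if ratio < crossover:
        return "epoll"
    if ratio > crossover:
        return "poll"
    return "tie"  # the hypothesis says nothing about the exact crossover
```

Written this way, the hypothesis makes no claim about how often real servers sit on either side of the crossover. That is the separate question the replies are arguing about.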

  My hypothesis has always been:

  epoll is faster than poll when the active/total FD ratio is < 0.6, but poll is
  faster than epoll when the active/total ratio is > 0.6.

That wasn't your hypothesis: that was your intermediate conclusion after the first tests. Your hypothesis was: the common knowledge that epoll yields better performance, and that I should obviously use epoll for Mongrel2, is wrong. [1]

It's obvious he didn't mean 'hypothesis', but 'implicit assumption'. If you implement superpoll, you implicitly assume it will be useful. It will only be useful if deployed Mongrel2 servers actually have an ATR > 0.6 at least some of the time.

[1] You can replace 'is' by 'may be', if you feel the strong version puts words in your mouth. It doesn't, because the hypothesis for an experiment may also be "The half life of protons is shorter than a trillion years", when I expect to reject that hypothesis. It's not an assertion of your opinion, but a statement of a fact you intend to accept or reject based on the outcome of the experiment.