| In real-life web serving situations, and not in benchmarks, the majority of the fds is not active. It's the slow guys that kill you. A client on a fast connection will come in and will pull the data as fast as the server can spit it out, keeping the process and the buffers occupied for the minimum amount of wall clock time and the number of times the 'poll' cycle is done is very small. But the slowpokes, the ones on dial up and on congested lines will get you every time. They keep the processes busy far longer than you'd want and you have to hit the 'poll' cycle far more frequently, first to see if they've finally completed sending you a request, then to see if they've finally received the last little bit of data that you sent them. The impact of this is very easy to underestimate, and if you're benchmarking web servers for real world conditions you could do a lot worse than to run a test across a line that is congested on purpose. |
> the ones on dial up and on congested lines will get you every time.
Do you have numbers on the dial-up users for your server? My understanding is that there are far fewer of them, so this is bogus. Show evidence of high dial-up penetration first.
> They keep the processes busy far longer than you'd want and you have to hit the 'poll' cycle far more frequently
Again, you have no numbers on the active/total ratio in your server, so unless you do, this statement doesn't refute what I found. I've presented evidence that shows the math of O(N=active) for epoll vs. O(N=total) for poll holds up. Simple math. The only way epoll wins for all load types is if it is as fast as poll all the time. My tests show it's not, which stands to reason since using epoll takes more syscalls than using poll.
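Here's a rough sketch of that math, as a toy cost model and not a measurement. The function names and the two per-item constants are made up for illustration. `EPOLL_PER_EVENT` is simply set so the break-even lands at 0.6, matching the ratio I reported; it is an assumption, not a measured value.

```python
# Toy cost model: poll() does work for every registered fd,
# epoll does work only for the fds that are actually active.
# Both constants are hypothetical, chosen only to illustrate the argument.
POLL_PER_FD = 1.0            # assumed cost per registered fd in poll()
EPOLL_PER_EVENT = 1.0 / 0.6  # assumed cost per active fd in epoll (higher per item)


def poll_cost(total, per_fd=POLL_PER_FD):
    """Modeled cost of one poll() cycle: O(N = total)."""
    return per_fd * total


def epoll_cost(active, per_event=EPOLL_PER_EVENT):
    """Modeled cost of one epoll cycle: O(N = active)."""
    return per_event * active


def break_even_ratio(per_fd=POLL_PER_FD, per_event=EPOLL_PER_EVENT):
    """Active/total ratio at which the two modeled costs are equal."""
    return per_fd / per_event


def faster(active, total):
    """Return which call the model predicts is cheaper for this load."""
    return "epoll" if epoll_cost(active) < poll_cost(total) else "poll"


if __name__ == "__main__":
    for active in (100, 500, 900):
        print(active, 1000, faster(active, 1000))
```

The point of the model is only that a higher per-item cost for epoll forces a crossover somewhere below 100% activity. Where it actually falls has to be measured.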
> The impact of this is very easy to underestimate, and if you're benchmarking web servers for real world conditions you could do a lot worse than to run a test across a line that is congested on purpose.
Again, you have no definition of "congestion". If you adopt a simple metric like the active/total ratio (ATR) then we can talk. As it is, you (and everyone else) just throw around latency numbers like those matter when really the performance break is in the ATR. In addition, my numbers show the performance break being at about 60% ATR, so if you're saying that no server ever goes above 60% activity levels then you're totally wrong. 60% is not completely unreasonable on a loaded server.
But, I think you're missing a key point: You need both in a server like Mongrel2. I never said epoll sucks and poll rocks (since you probably didn't read the article). I said something very exact and measurable:
> epoll is faster than poll when the active/total FD ratio is < 0.6, but poll is faster than epoll when the active/total ratio is > 0.6.
If you don't think that's the case in "the real world" then go measure it and report back. That's the science part. I totally don't believe it yet myself, which is why I'm measuring it and showing the methods to everyone so they can confirm it for me.
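If you want a starting point for measuring it, here's a minimal sketch of taking one active/total ratio sample with Python's `select.poll` on a Unix-like system. The helper names (`active_total_ratio`, `make_pipes`, `close_pipes`) are hypothetical, and the pipes stand in for real client sockets. This is not the test harness from the article.

```python
import os
import select


def active_total_ratio(read_fds, timeout_ms=0):
    """Return the fraction of the given fds that poll() reports as readable."""
    if not read_fds:
        return 0.0
    poller = select.poll()
    for fd in read_fds:
        poller.register(fd, select.POLLIN)
    ready = poller.poll(timeout_ms)  # list of (fd, event) pairs with pending events
    return len(ready) / len(read_fds)


def make_pipes(total, active):
    """Create `total` pipes and make the first `active` of them readable."""
    pipes = [os.pipe() for _ in range(total)]
    for _, write_fd in pipes[:active]:
        os.write(write_fd, b"x")  # one pending byte makes the read end active
    return pipes


def close_pipes(pipes):
    """Close both ends of every pipe."""
    for read_fd, write_fd in pipes:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    pipes = make_pipes(total=10, active=6)
    try:
        print(active_total_ratio([r for r, _ in pipes]))
    finally:
        close_pipes(pipes)
```

On a real server you would sample this on every pass through the event loop and look at the distribution, since a single snapshot says little about the load.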