by jfager 5797 days ago
It is worth pointing out that the original epoll benchmarks were focused on how performance scaled with the number of dead connections, not performance in general:

http://www.xmailserver.org/linux-patches/nio-improve.html

And as jacquesm points out, in a web-facing server, that's the case you should care about. A 15-20% performance hit in a situation a web-facing server is never going to see doesn't matter when you consider that the 'faster' method is 80% slower (or worse) in lots of real world scenarios.

I'll be interested to see how the superpoll approach ends up working, but my first impression is 'more complexity, not much more benefit'.


> And as jacquesm points out, in a web-facing server, that's the case you should care about.

Yes, but where's the evidence for what people see for active/total ratios in the real world? I'm showing that unless it's below about 60% (probably more like 50%) then poll is the way to go.

60% active isn't unrealistic at all. I can see quite a few servers hitting those thresholds, so in that case, poll vs. epoll doesn't matter.

I think what's more important in what I'm finding is that you really need both. It's entirely possible that you have servers that are at 80-90% ATR all the time. Others that are 10% ATR. The key is either you have to measure that, which nobody does, or you have to make a server that can adapt.
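To make the "measure it or adapt" idea concrete, here is a minimal Python sketch. It assumes ATR means ready descriptors divided by total watched descriptors for one wait call, and it uses the roughly 60% crossover quoted in this thread as a default. The function names are hypothetical, and this is not code from the article.

```python
def active_total_ratio(active_fds, total_fds):
    """Fraction of watched descriptors that were ready in one wait call."""
    if total_fds <= 0:
        raise ValueError("total_fds must be positive")
    return active_fds / total_fds


def pick_backend(atr, crossover=0.6):
    """Return 'poll' above the crossover, 'epoll' at or below it.

    The 0.6 default is the figure quoted in this thread for one benchmark
    setup; it is an assumption here, not a universal constant.
    """
    return "poll" if atr > crossover else "epoll"


if __name__ == "__main__":
    # Two made-up servers: one mostly busy, one mostly idle.
    for active, total in [(900, 1000), (100, 1000)]:
        atr = active_total_ratio(active, total)
        print(f"{active}/{total} active -> ATR {atr:.0%} -> {pick_backend(atr)}")
```

A real server would have to sample the ratio over many wait calls before switching, since a single call says little about the steady state.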

> but where's the evidence for what people see for active/total ratios in the real world?

Yes Zed, where the fuck is it? You're claiming SCIENCE! based on your worst-case synthetic localhost benchmarks, and then turning around and wildly guessing as to real-world performance characteristics with internet latencies.

Worse, your whole thesis hinges on ATR, but you made no effort to measure it anywhere; instead you're passive-aggressively berating us to do it.

Wow, here we are again, with you not reading my article. I ran the same test that everyone else runs for poll vs. epoll, then used R to craft graphs and tested hypotheses. It was not a localhost test.

So far all you've got is trolling HN comments. YOU WIN!

Pipes, localhost, who is counting? As far as I'm concerned that's the same thing. Making it seem as if that's a significant difference for the purpose of this test is simply conversational trickery.

If you have tested this on real live servers then there is no evidence of that in your posting, and to suggest that this:

http://dpaste.de/32o8/

is anything but a localhost test is simply bogus.

The only use case where you may be right that poll is advantageous, as far as I can see, is streaming media servers (video, audio, other large files). Image servers are the ones with the worst active-to-total ratios, especially if the images are small. I should know, I only serve up a few billion of them every day. A few years ago I was stupid enough to think that video was hard; man, was I ever wrong. Repeated connections to the same host are a much bigger killer than pumping bits.

Testing this on real live servers is confounding. Man you guys really don't get this. If you want to test epoll and poll over file descriptors you test that. You don't test a billion other things in a network server. That confounds your results.

But what's really amazing is this is the test the proponents of epoll have been using for 8 years. Where was your objection back when they were using it for that?

That you didn't say anything about the machines you ran it on was a huge red flag. It looks like jacquesm has you nailed on the localhost testing; I hadn't noticed the link to the code before.

I'll trust that you accurately measured the ATR boundary between poll and epoll in your specific synthetic benchmark. That's then completely undermined by your handwaving in this thread about what ATR looks like in the wild, and the lack of any way for us to relate your microbenchmark to the real world.

The original article had the link to the code so don't even try to imply that I was hiding jack squat. He hasn't nailed shit, it's the same test epoll proponents have used for years. It's not a "localhost" test, it's a test that makes a bunch of file descriptors and then compares the poll vs. epoll performance as active vs. total changes.
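For readers who haven't seen this kind of test, here is a rough Python sketch of its shape: open a batch of pipes, make some fraction of them readable, and ask poll() which ones are ready. It is not the benchmark linked in this thread, it does no timing, and the helper names are made up. It needs a Unix-like system, since `select.poll` is not available on Windows.

```python
import os
import select


def make_pipes(total, active):
    """Create `total` pipes and write one byte into the first `active`."""
    pipes = [os.pipe() for _ in range(total)]
    for _, write_fd in pipes[:active]:
        os.write(write_fd, b"x")
    return pipes


def count_ready_with_poll(pipes):
    """Register every read end with poll() and count those reported readable."""
    poller = select.poll()
    for read_fd, _ in pipes:
        poller.register(read_fd, select.POLLIN)
    events = poller.poll(0)  # timeout of 0 ms: return immediately
    return sum(1 for _, mask in events if mask & select.POLLIN)


def close_pipes(pipes):
    """Close both ends of every pipe."""
    for read_fd, write_fd in pipes:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    total, active = 100, 60  # hypothetical mix: 60% of descriptors active
    pipes = make_pipes(total, active)
    try:
        ready = count_ready_with_poll(pipes)
        print(f"ready={ready} total={total} ATR={ready / total:.0%}")
    finally:
        close_pipes(pipes)
```

A benchmark built on this would repeat the wait call many times for each active/total mix, time it, and do the same with epoll on the same descriptors.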

But hey, you can live in your own little fantasy world where you think you've won some kind of battle of the HN because you listened to some epoll fanatic weirdo and cheered him on.

> It's entirely possible that you have servers that are at 80-90% ATR all the time

I'd be curious if you have any evidence that this occurs in practice. Even for a busy server with clients of uniform, low latency, I'd intuitively expect fairly low ATRs.

> I think what's more important in what I'm finding is that you really need both.

I'm not sure you do: the performance advantage of poll seems marginal at best. When ATR is high, you're presumably doing enough real work that the slight overhead of epoll vs. poll is probably not super important.

This is the point where talking about it does nothing. Go measure it like I have. In fact, I'll give you your hypothesis to test:

"There are no servers that have an ATR of > 80%."

That's easy to test, and I'm damn positive you could find some that disprove your assertion.

More importantly though, you have this assertion:

"Using both poll and epoll has no advantage in performance."

Again, who knows, that's why I'm testing and trying it out. That's the science part, since I've got no idea, but I'll give it a shot. And now that I've done an analysis that tells me what really matters, I'll be able to do very good tests for the different kinds of loads.

Incidentally, when people run performance tests against web servers to see how fast they serve files they're testing the server with an ATR at around 100%. Food for thought.

You have measured nothing about real world workloads. You have applied completely artificial benchmarks and formed some possible conclusions from them that mean absolutely nothing until you demonstrate that this is a real problem in the real world. There's ample evidence to believe that FDs spend the majority of their existence idle--between HTTP keepalive, processing time for the queries themselves, network bandwidth and the generally sparse nature of HTTP traffic, it is extremely plausible that the ATR, as you call it, is well below 60% almost all the time. Your numbers demonstrate that there is a tradeoff between epoll and poll, which is of some interest, but unless you actually measure the ATR of a real site you're shooting off your mouth.

I cannot collect this data because I don't possess a sufficiently high load web server. Go forth and measure, but measure useful information.

I used the same test everyone has used for the last 8 years to compare poll vs. epoll performance. I also ran it in order to test a hypothesis, then assumed I was wrong, then ran it more, and presented the information openly so others could try it.

Of course I'm going to keep doing this, but if you say that my test is invalid, then all of the tests people did to justify epoll are invalid.

The tests are perfectly valid; it's the conclusion that is wrong. Of course there is a trade-off, and of course you'll find a crossover point because there is a trade-off. The fact that the crossover point is at roughly 60% for your kernel config and machine setup is essentially meaningless, other than that it at least falls within the expected range (50 to 80% or so).

Your hypothesis is that web servers have the majority of their fds active most of the time, and that's where the problem lies. I've put up the numbers elsewhere in this thread, feel free to do some measurement on your own high traffic websites to come up with more data points.

IE will keep a connection open for about 60 seconds, how much of that's going to be active? I don't have the exact number, and of course it will vary, but of course it's going to be far less than 60% in the vast majority of cases.
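A back-of-the-envelope version of that argument in Python. The 60-second keep-alive window is the figure above; the 2 seconds of actual transfer is a made-up number for illustration, and the function names are hypothetical.

```python
def connection_active_fraction(active_seconds, keepalive_seconds=60.0):
    """Share of a connection's lifetime during which it has data moving."""
    if keepalive_seconds <= 0:
        raise ValueError("keepalive_seconds must be positive")
    return min(active_seconds, keepalive_seconds) / keepalive_seconds


def seconds_needed_for_ratio(target_ratio, keepalive_seconds=60.0):
    """Active time a connection needs within the window to hit a target ratio."""
    return target_ratio * keepalive_seconds


if __name__ == "__main__":
    # Assumption: a page load keeps the socket busy for about 2 seconds.
    frac = connection_active_fraction(2.0)
    print(f"active fraction over a 60 s keep-alive: {frac:.1%}")
    need = seconds_needed_for_ratio(0.6)
    print(f"seconds of activity needed to reach 60%: {need:.0f}")
```

This treats per-connection active time as a stand-in for the server-wide ratio, which only holds if connections are similar and arrive steadily.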

If a site gets spiked with the typical 'read-and-leave' traffic a link from reddit or huffpo or wherever generates, how does superpoll compare to straight epoll? Based on your description so far, I can only see it hurting - you're not just wasting time on dead connections in your poll bin, you're now also incurring the overhead of managing the migration over to the epoll bin.
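Here is a toy Python model of that worry. It is only a guess at the two-bin design based on the description in this thread, not the actual superpoll code: connections start in a poll bin, move to an epoll bin after `idle_limit` idle ticks, and (an added assumption) move back when they become active again. All names and numbers are hypothetical.

```python
class TwoBinTracker:
    """Toy model of a poll bin plus an epoll bin with idle-based migration."""

    def __init__(self, idle_limit=3):
        self.idle_limit = idle_limit
        self.poll_bin = {}      # conn_id -> consecutive idle ticks
        self.epoll_bin = set()  # connections considered mostly idle
        self.migrations = 0     # moves between bins, in either direction
        self.idle_scans = 0     # idle connections the poll bin still scanned

    def add(self, conn_id):
        """New connections start in the poll bin."""
        self.poll_bin[conn_id] = 0

    def tick(self, active_ids):
        """Advance one event-loop iteration with the given active connections."""
        active_ids = set(active_ids)
        # Assumption: activity on an epoll-bin connection promotes it back.
        for conn_id in active_ids & self.epoll_bin:
            self.epoll_bin.remove(conn_id)
            self.poll_bin[conn_id] = 0
            self.migrations += 1
        # Scan the poll bin: reset active entries, age and demote idle ones.
        for conn_id in list(self.poll_bin):
            if conn_id in active_ids:
                self.poll_bin[conn_id] = 0
                continue
            self.idle_scans += 1
            self.poll_bin[conn_id] += 1
            if self.poll_bin[conn_id] >= self.idle_limit:
                del self.poll_bin[conn_id]
                self.epoll_bin.add(conn_id)
                self.migrations += 1


if __name__ == "__main__":
    tracker = TwoBinTracker(idle_limit=3)
    visitors = range(100)  # a read-and-leave spike of 100 connections
    for conn_id in visitors:
        tracker.add(conn_id)
    tracker.tick(visitors)  # everyone fetches the page once
    for _ in range(3):      # then nobody does anything else
        tracker.tick([])
    print("idle scans in poll bin:", tracker.idle_scans)
    print("migrations:", tracker.migrations)
    print("left in poll bin:", len(tracker.poll_bin))
```

In this model a plain epoll setup would pay neither cost for the spike, which is the comparison being asked about; whether the real overhead matters is something only measurement can answer.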