| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by pmjordan 5803 days ago

Thanks for the detailed explanation. Sounds like I was at least on the right track.

But if you ask me, the real solution is to have the kernel team fix their epoll implementation performance issues instead of forcing people to work around it with hybrid approaches.

That does indeed sound like a better conclusion.

Other than the stupid single-syscall-per-fd requirement, there's nothing in epoll's interface that would force it to perform worse than poll when the active/total ratio is high.

I don't see a reason why the syscall-per-fd couldn't easily be replaced/augmented with a single mass add/remove syscall which takes an array. The worse performance seems similarly baffling; it almost sounds as if they had some kind of inefficient data structure holding the file descriptor pool; considering poll() uses a flat array and epoll uses set operations I assume it's pretty tricky to make it perform well, even with a hash table. Maybe set operations aren't the best way to handle this data structure; but only some profiling in the kernel code can tell us that.

Obviously it'll take until 2.6.37 at least for any changes to enter the mainstream kernel, and until then a hybrid approach sounds sensible for those unwilling to patch. But still, fixing the root problem seems like a worthwhile cause.

1 comments

jacquesm 5803 days ago

I highly doubt that in production it will make any difference at all. In the end it is not the poll/epoll overhead that determines your overall throughput. If you call poll/epoll more frequently than you should then it starts to add up but in reality one call per several thousand file system operations doing real IO is not going to make a big difference.

Of course all the little bits help and I'm happy to see someone pay attention to detail like this but normally speaking you should get to the point where you're shifting data in real life situations and you can hook up a profiler to make the decision. You have less to blog about like that but the difference between poll and epoll is not large enough that you would spend more time going from the one to the other than was spent analysing this and writing the post.

Optimisations like this are best left to when you have things working, first make it work, then make it fast.

link

mfukar 5803 days ago

I keep seeing this...misconception? I don't know what to call it, over most comments. The article doesn't talk about optimizations. It talks about design decisions. Someone actually took the time to sit down, ponder what kind of workloads will be handled by his application, came across an interesting dilemma, measured and tested them both, and finally reached an educated conclusion.

That right there is how you correctly choose how to implement things, contrary to popular belief. Think first, write code later.

link

Confusion 5801 days ago

  Someone actually took the time to sit down, ponder what kind of
  workloads will be handled by his application

The most important point made in this thread is that Zed actually didn't do that, but did benchmarks for a range of workloads, suggesting he intends Mongrel2 to be able to handle all of them. The question is whether that is necessary. If ATR > 0.6 never happens in practice, it will only unnecessarily complicate Mongrel2.

link

prodigal_erik 5802 days ago

Writing the "which fds are ready" check twice would be crazy, if neither version were markedly more efficient than the other. Optimization is the only reason to make this particular design decision, and measurement is definitely the right way to go about it.

link

baq 5802 days ago

what good thinking gives you when your assumptions are wrong?

link

zedshaw 5803 days ago

You again? Man you got some beef.

Anyway, we do have something working. You can deploy Mongrel2 right now, and I do all the time. You can actually measure how fast it is, although that's going to be slow since we haven't done much tuning so kind of pointless.

So, again you're wrong. We are at a point in the design where this measurement and analysis matters, and we have something that actually works to put it in. You should probably maybe go do some actual reading instead of posting here like you know what you're talking about.

link

pmjordan 5803 days ago

Absolutely, although I can see how it makes sense to look into this stuff for Mongrel2 which seems to do little processing of its own, let alone disk I/O and mostly acts as a kind of demultiplexer.

link