by zedshaw 5797 days ago
Again with this idea that Mongrel2 isn't working. You sir have no freaking idea what you're talking about.

"That's just plain wrong. Premature optimisation does not refer to having to measure before you optimise, it refers to optimising things that in practice may have little or no effect on the actual performance of the program."

No, that's just plain wrong. Premature optimisation is actually implementing something convoluted thinking it's optimized without knowing whether it actually is or not. It's voodoo cargo cult science. It's going against Occam's razor.

There's nothing in Mongrel2 that's prematurely optimized. It's all very simple algorithms chosen for the right task, and later on I'll be testing them to see if they're still right. So your claim that this is premature optimization is just a buzzword and completely offensive. I took a long time to actually test my ideas before implementing them.

That's the total inverse of premature optimization.

What is total voodoo junk science is most of what you say. So far I haven't seen one set of data or any scientific experimental design or even a single testable hypothesis backing what you claim. It's all just rhetoric.

Until you've got hard numbers backing what you say, everything you're saying is inferior to what I'm doing: science.

2 comments

So, if it's working, why not throw a load of real world traffic at it instead of this 'science' that you're performing here?

After all, that is where the rubber meets the road and it would be a very easy way to determine if your hunch is right or not.

Epoll was specifically created with that sort of workload in mind. Your 'surprising' conclusion is not rooted in epoll somehow behaving contrary to expectation; in fact it behaves exactly as it was designed to.

Benchmarking it like this is nothing like the real world, and that's where epoll shines, not when you test it the way you just did.

As for the numbers, we're serving about 10Gbps continuously using a combination of varnish and java code to several million uniques daily: html, images, video. Poll versus epoll is a race that has already been run; as far as I'm concerned you're wasting your time with this.

But by all means, ignore all this and do what you have to, those are the lessons learned best anyway, and it's your time, not mine.

If you feel like getting another view on this I'd suggest to contact the author of Varnish, he really knows his stuff and he might be able to convince you where I can not.

Because "real world traffic" is a bullshit test. Whose real world traffic? Mine? Yours? Google's? Yahoo's? Science is repeatable, which means you can replicate the same inputs to the test every time. The lab experiments involved in science are often highly sanitized and not "real world" at all. Determining what the results mean in the real world comes later.

If you're going to complain about science, at least understand how it works.

There are basically three sorts of traffic that I have experience with and would expect to be the major portion of whatever goes over the web:

- underwater ajax requests

- regular website content (images, dynamic html, css and other relatively small (say < 250K) files)

- media servers (filedumps, video servers, streaming audio servers)

Each of those requires fairly specific tuning of the TCP stack to get the most out of it, so you're not likely going to find all of these on one and the same machine unless it is a small operation (and in that case this whole discussion is moot).

A benchmark done in isolation is meaningless because in the end, real world traffic is what it is all about. So, I personally don't care whose site(s) you test with, as long as there are enough of them to get a statistically valid result.

Google's or Yahoo's would be fine with me. I've given my results above; if I have the time I'll do the same thing on a couple of other high volume sites.

I've (unfortunately) studied this problem quite a bit because of the size of the websites that I'm involved with, and so far I've learned that you can play around on your testbench all day long, but it doesn't matter one little bit for production purposes unless you are very careful (such as in that other test linked from this page) to simulate users' clients.

You could do a lot worse than to play back a log file in order to make an experiment repeatable. I assume that real world performance is what Zed is after, not theoretical performance.
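
A rough sketch of what playing back a log could look like, assuming the log is in Common Log Format. The sample lines, `parse_log` and `replay` are all made up for illustration; a real replay would also have to care about timing and concurrency, which this ignores.

```python
import re

# Made-up sample lines in Common Log Format (an assumption about the log).
SAMPLE_LOG = [
    '10.0.0.1 - - [01/Jan/2010:00:00:01 +0000] "GET /index.html HTTP/1.0" 200 512',
    '10.0.0.2 - - [01/Jan/2010:00:00:02 +0000] "POST /api/items HTTP/1.1" 201 64',
    'this line is garbage and gets skipped',
]

# Pulls the method and path out of the quoted request field.
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*"')


def parse_log(lines):
    """Return the (method, path) pairs in the order they were logged."""
    requests = []
    for line in lines:
        match = REQUEST_RE.search(line)
        if match:
            requests.append((match.group("method"), match.group("path")))
    return requests


def replay(requests, send):
    """Feed the same requests, in the same order, to any send(method, path)."""
    return [send(method, path) for method, path in requests]


if __name__ == "__main__":
    # A stand-in for a real client; it only echoes what it was asked to send.
    print(replay(parse_log(SAMPLE_LOG), lambda m, p: f"{m} {p}"))
```

Because the input is a fixed file, every run sees exactly the same sequence of requests, which is what makes the experiment repeatable.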

You know what I find troubling about your behavior on this thread? It's just so weirdly manipulative. I gotta think you have like 40% stock invested in Epoll, Inc. or something. You make wild claims about what I'm saying that aren't true, you imply that I know nothing of real world performance when I've written some pretty bad ass real world software. You reply to every single thread with a constant stream of FUD and nitpicking everything you can then blowing it out of proportion.

I mean, are you sure you didn't use to work for Microsoft and then got hired by Linus to work your FUD spreading magic?

  Because "real world traffic" is a bullshit test [..]

The hell it is. A statistically significant sample of 'real world ...' is the foundation for most engineering decisions. When you build a bridge, you take the actual loads it has to support into account. Intel bases their chipbaking on the actual purity of the silicon their suppliers can provide.

Exactly, there's no concept of confounding at all. You use "real world tests" (whatever the hell that is) when you have an actual specific setup to test. You use a small model experiment like this to test one specific thing like poll vs. epoll.

I think what he's saying is that it should be pretty easy to change between poll, epoll, and "superpoll", as you can easily abstract a common interface. Then, you could worry about the performance of that particular bit only when you encounter it, and use your time arguably more effectively on the actual features needed to productionize Mongrel2 better.
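
To make the "common interface" idea concrete, here's a minimal sketch, assuming Linux and Python's standard `select` module. The class and method names (`PollBackend`, `EpollBackend`, `watch_read`, `wait`) are invented for illustration and have nothing to do with Mongrel2's actual code.

```python
import os
import select


class PollBackend:
    """Readiness notification built on poll(2)."""

    def __init__(self):
        self._poller = select.poll()

    def watch_read(self, fd):
        self._poller.register(fd, select.POLLIN)

    def unwatch(self, fd):
        self._poller.unregister(fd)

    def wait(self, timeout_s):
        # select.poll.poll() takes its timeout in milliseconds.
        events = self._poller.poll(int(timeout_s * 1000))
        return [fd for fd, mask in events if mask & select.POLLIN]

    def close(self):
        pass  # a poll object holds no descriptor of its own


class EpollBackend:
    """Same interface, built on epoll(7) in level-triggered mode."""

    def __init__(self):
        self._poller = select.epoll()

    def watch_read(self, fd):
        self._poller.register(fd, select.EPOLLIN)

    def unwatch(self, fd):
        self._poller.unregister(fd)

    def wait(self, timeout_s):
        # select.epoll.poll() takes its timeout in seconds.
        events = self._poller.poll(timeout_s)
        return [fd for fd, mask in events if mask & select.EPOLLIN]

    def close(self):
        self._poller.close()


def demo(backend):
    """Watch the read end of a pipe before and after one byte is written."""
    r, w = os.pipe()
    try:
        backend.watch_read(r)
        before = backend.wait(0)       # nothing has been written yet
        os.write(w, b"x")
        after = backend.wait(0.1)      # the read end should now be ready
        backend.unwatch(r)
        return before, after == [r]
    finally:
        backend.close()
        os.close(r)
        os.close(w)


if __name__ == "__main__":
    for backend in (PollBackend(), EpollBackend()):
        print(type(backend).__name__, demo(backend))
```

The caller only ever sees `watch_read`, `unwatch` and `wait`, so swapping one backend for the other (or for some hybrid) is a one-line change.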

I sort of agree with him, except with an important detail: this question is basically about prioritization of your time, and I'd say that this is nobody's business but yours! You can optimize memcpy() all day, for all I care. ;-)

There's one aspect where you'd be quite right to do some investigation before implementing: if the outcome changes the interface you'd need to implement.

For example, here's a hypothesis: using epoll's edge-triggered mode could drastically reduce the number of events (since you only get an event when an fd becomes readable/writable the first time, instead of every time it's in that state). Since epoll is O(N) on the number of events returned (not on the number of fds that are currently readable/writable), you'd lower the effective ATR a whole lot. In fact, a really busy server would have fewer events, since a readable fd would stay that way for longer if data is received at a great rate (the write-side story might be less brilliant). You'd also have to do far fewer calls to epoll_ctl, since you could just stop caring about the reading side while you're trying to write the last batch of data on the other side (no need to remove it from the interest set, you won't get events for it). You only need to set it when flipping from read to write, and the other way (after receiving/sending headers and bodies).

Now, if that's true, that's a big deal, because now you have to change your design a fair bit. You have to remember that an fd is readable until you get EAGAIN from read() yourself, so there's some more state management, moving that fd from one list to another, etc. Finding out that this would be a million times faster (or slower!) now would save you a ton of work, either way.
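
Here's a small sketch of that extra bookkeeping, assuming Linux and Python's standard `select` module, with a pipe standing in for a socket. `drain` and `edge_triggered_demo` are names I made up; the point is only that in edge-triggered mode the readiness event shows up once, and it's on you to keep reading until EAGAIN.

```python
import os
import select


def drain(fd, chunk=4096):
    """Read from a non-blocking fd until the kernel says EAGAIN (or EOF)."""
    data = bytearray()
    while True:
        try:
            part = os.read(fd, chunk)
        except BlockingIOError:    # EAGAIN / EWOULDBLOCK: nothing left for now
            return bytes(data)
        if not part:               # EOF: the writer closed its end
            return bytes(data)
        data += part


def edge_triggered_demo():
    r, w = os.pipe()
    os.set_blocking(r, False)      # required, or drain() would block forever
    ep = select.epoll()
    try:
        ep.register(r, select.EPOLLIN | select.EPOLLET)
        os.write(w, b"hello")
        first = len(ep.poll(0.1))  # the fd just became readable: one event
        second = len(ep.poll(0))   # still readable, but no new edge to report
        payload = drain(r)         # we must remember it is readable ourselves
        os.write(w, b"again")
        third = len(ep.poll(0.1))  # new data arrives: a new edge
        return first, second, payload, third
    finally:
        ep.close()
        os.close(r)
        os.close(w)


if __name__ == "__main__":
    print(edge_triggered_demo())
```

With level-triggered epoll (drop `EPOLLET`) the second `poll` would report the fd again, which is exactly the repeated event the hypothesis is trying to avoid.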

But finding out whether poll or epoll is faster, or a hybrid solution with the same interface? Meh, it could wait.

(about my hypothesis, that's actually what Davide Libenzi designed epoll for, which might explain some of the weirder bits)