by banachtarski 4570 days ago
None of these numbers are significant! Give me something that tries hundreds if not thousands or tens of thousands of simultaneous requests. Then we have a real benchmark that will probably push a lot of these over the edge in terms of mean latency and especially tail/peak latency.
3 comments

There has been a group of us consistently pushing for exactly this. The maintainers of the benchmark are exceptionally resistant to this idea... https://github.com/TechEmpower/FrameworkBenchmarks/issues/49 ... https://github.com/TechEmpower/FrameworkBenchmarks/issues/36 ... https://github.com/TechEmpower/FrameworkBenchmarks/issues/48 ... there are even more issues asking for a concurrency increase, just search for concurrency.

It is silly that such a rich and awesome set of benchmarks never pushes on concurrency, one of the major points of failure "in the wild" -- more common as you become the go-between for your users and some set of APIs -- users stack up on one side, waiting connections stack up on the other.

There is a very simple reason for this: we do not yet have a test that is designed to include idling. One of the future test types [1], number 12 on the list, is designed to allow the request to idle while waiting on an external service.

Until we have such a test type, there is no value in exercising higher concurrency levels. Outside of a few frameworks that have systemic difficulty utilizing all available CPU cores, all tests are fully CPU saturated by the existing tests.

With that condition, additional concurrency would only stress-test servers' inbound request queue capacity and cause some with shorter queues to generate 500 responses. Even at our 256 concurrency (maximum for all but the plaintext test), many servers' request queues are tapped out and they cope with this by responding with 500s.
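As a toy illustration of that failure mode, here is a minimal sketch of a server with a fixed number of workers and a bounded inbound queue receiving a burst all at once. The worker count and queue capacity are made-up values, and real servers differ in how they queue and reject requests.

```python
def serve_burst(n_requests: int, workers: int, queue_capacity: int) -> list[int]:
    """Return one HTTP-style status per request in a burst that arrives at once.

    Toy model: the first `workers` requests start immediately, the next
    `queue_capacity` requests wait in the queue, everything else is rejected.
    """
    statuses = []
    queued = 0
    for i in range(n_requests):
        if i < workers:
            statuses.append(200)   # picked up by a free worker
        elif queued < queue_capacity:
            queued += 1
            statuses.append(200)   # waits in the queue, served later
        else:
            statuses.append(500)   # queue full: request is rejected
    return statuses


# Hypothetical server: 8 workers, room for 100 queued requests.
burst = serve_burst(256, workers=8, queue_capacity=100)
print(burst.count(200), "served,", burst.count(500), "rejected")
```

Raising the burst size in this model only increases the rejected count; the number of requests actually served stays the same.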

The existing tests are all about processing requests as quickly as possible and moving onto the next request. When we have a future test type that by design allows requests to idle for a period of time, higher concurrency levels will be necessary to fully saturate the CPU.
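A back-of-the-envelope sketch of why idling changes the picture, using Little's Law (requests in flight = throughput × time per request). This is an idealized model, and the core count and per-request timings below are invented for illustration, not measurements from the benchmark.

```python
def concurrency_to_saturate(cores: int, cpu_us: int, idle_us: int) -> int:
    """Requests that must be in flight to keep `cores` CPU cores busy.

    Each request is assumed to burn `cpu_us` microseconds of CPU and to
    spend `idle_us` microseconds waiting on something external.

    Peak throughput when CPU-bound is cores / cpu_us requests per microsecond.
    Little's Law: in_flight = throughput * (cpu_us + idle_us).
    """
    total = cores * (cpu_us + idle_us)
    return -(-total // cpu_us)  # ceiling division, integers only


# Hypothetical 8-core box, 100 microseconds of CPU work per request.
print(concurrency_to_saturate(8, cpu_us=100, idle_us=0))       # no idling
print(concurrency_to_saturate(8, cpu_us=100, idle_us=50_000))  # 50 ms external wait
```

Under these assumptions, a request that never idles saturates the CPU with roughly one request in flight per core, while a 50 ms external wait pushes the required concurrency into the thousands.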

Presently, the Plaintext test spans to higher concurrency levels because the workload is utterly trivial and some frameworks are not CPU constrained at 256 concurrency on our i7 hardware. As for the EC2 instances, their much smaller CPU capacity means the higher-concurrency tests are fairly moot. If you switch to the data-table for Plaintext, you can see that the higher concurrency levels are roughly equivalent to 256 concurrency on EC2.

For example, jetty-servlet on EC2 m1.large:

      256 concurrency:  51,418
    1,024 concurrency:  44,615
    4,096 concurrency:  49,903
   16,384 concurrency:  50,117
The EC2 m1.large virtual CPU cores are saturated at all tested concurrency levels.

jetty-servlet on i7:

      256 concurrency: 320,543
    1,024 concurrency: 396,285
    4,096 concurrency: 432,456
   16,384 concurrency: 448,947
The i7 CPU cores are not saturated at 256 concurrency, and reach saturation at 16,384 concurrency.
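To make the comparison easier to read, this short sketch expresses each figure quoted above as a percentage change from the 256-concurrency figure on the same hardware. The numbers are copied from the two tables; the helper name is just for illustration.

```python
# jetty-servlet Plaintext figures as quoted above, keyed by concurrency level.
ec2 = {256: 51_418, 1_024: 44_615, 4_096: 49_903, 16_384: 50_117}
i7 = {256: 320_543, 1_024: 396_285, 4_096: 432_456, 16_384: 448_947}


def change_vs_baseline(results: dict[int, int], baseline: int = 256) -> dict[int, float]:
    """Percentage change of each result relative to the baseline concurrency."""
    base = results[baseline]
    return {level: round(100 * (value - base) / base, 1)
            for level, value in results.items()}


for name, results in (("EC2 m1.large", ec2), ("i7", i7)):
    print(name, change_vs_baseline(results))
```

At 16,384 concurrency the i7 figure is about 40% above its 256-concurrency figure, while the EC2 figure is about 2.5% below its own.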

We are not against high-concurrency tests; we are just not interested in high-concurrency tests where they would add no value. We're trying to find where the maximum capacity of frameworks is, not how frameworks behave after they reach maximum capacity. We know that they tend to send 500s after they reach maximum capacity. That's not very interesting.

All that said, once we have an environment set up that can do continuous running of the tests, I'll be more amenable to a wider variety of test variables (such as higher concurrency for already CPU-saturated test types) because the amount of time to execute a full run will no longer matter as much.

[1] https://github.com/TechEmpower/FrameworkBenchmarks/issues/13...

Don't get me wrong, I am only annoyed because of the wonderful job you guys do... it seems like such a glaring omission... because IMHO, it is where stuff often actually "falls apart" in real life... and is some of the most useful information you can possibly have.

The "trapped between APIs" scenario is one of the concurrency stressing ones, as is slow clients with large content, as is websockets. As you tests show, A LOT of frameworks do a damned fine job with serving lots of requests quickly -- I think concurrency is a far more interesting differentiator.

Glad to see that most of what I want is "on the list": 11, 12, 15, 19. Would be nice to see an additional "slow clients" test with large content -- where the limit is how fast the clients can receive server data... meaning, the limit on the server is how many clients they can stack up and handle concurrently.

Great! Please feel free to join in the discussion about future test types on the GitHub issue if you want!

Based on your comment and some others, I am presently thinking we'll want to bump up the priority of adding new tests in the upcoming rounds. Tentatively, getting the caching test in is low-hanging fruit and may be next up. But the external API test is probably next after that.

> Give me something that tries hundreds if not thousands or tens of thousands of simultaneous requests.

Yeah I can see that being more useful.

If the server is not flooded with concurrent requests and there are only 20 of them, then put a file behind a TCP socket in Python and it will do the job. The requests should all be long-running, at 10k concurrency at least.

Longer or even persistent (websocket) connections should be looked at. Hit them all with 20k connections, some very long lived. They don't have to come at the exact same microsecond, but they should come in pretty close and not do just a plaintext file read and close. They should be longer lived. How about something as long as the "validating your credit card" spinner some shopping websites make you wait for when you click the "process payment" button. Then you don't know if you should refresh the page, or whether you will be double charged if you do. That kind of stuff. Or say a story written by pg about startups fighting the NSA using Go hits HN and a flood of requests brings the server to its knees.
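For a sense of what such a test could look like, here is a minimal local sketch using Python's asyncio: many clients connect at nearly the same time to a deliberately slow endpoint and hold their connections open while waiting. The handler, the 0.2 second wait, and the client count are all made up for illustration; a real test would drive a separate machine with a dedicated load generator and far more connections.

```python
import asyncio


async def slow_handler(reader, writer):
    """Hypothetical slow endpoint: read one line, idle, then reply."""
    await reader.readline()
    await asyncio.sleep(0.2)  # stands in for the "validating your credit card" wait
    writer.write(b"done\n")
    await writer.drain()
    writer.close()


async def client(port: int) -> bytes:
    """Open a connection, send one request, and hold it open until the reply."""
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"pay\n")
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    return reply


async def run(n_clients: int) -> int:
    """Start the toy server, fire n_clients at once, count successful replies."""
    # Port 0 asks the OS for any free port; backlog is sized to the burst.
    server = await asyncio.start_server(
        slow_handler, "127.0.0.1", 0, backlog=n_clients
    )
    port = server.sockets[0].getsockname()[1]
    async with server:
        replies = await asyncio.gather(*(client(port) for _ in range(n_clients)))
    return sum(reply == b"done\n" for reply in replies)
```

Calling `asyncio.run(run(50))` returns the number of clients that got a reply. Because every connection idles for the same 0.2 seconds, the whole burst finishes in roughly that time rather than 50 times as long, which is the behavior a long-lived-connection test would be probing at much larger scale.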

Why bother having nice benchmarks? What are they showing? CPU loading, so users can save money on compute time at Amazon; that's OK, I guess. But it can be made more interesting.

No, then you have a benchmark that is useless for 99.999999999999999999% of people whose website does not get hundreds of simultaneous requests, much less tens of thousands.

Ugh your comment is so stupid I don't know where to begin. For those "99.999999999999999999% of people whose website does not get hundreds of simultaneous requests" you know what? They don't need a fucking benchmark at all. They could write their shit in BASIC and get the job done.