by MetaCosm 4572 days ago
There has been a group of us consistently pushing for exactly this. The maintainers of the benchmark are exceptionally resistant to this idea... https://github.com/TechEmpower/FrameworkBenchmarks/issues/49 ... https://github.com/TechEmpower/FrameworkBenchmarks/issues/36 ... https://github.com/TechEmpower/FrameworkBenchmarks/issues/48 ... there are even more issues asking for a concurrency increase, just search for concurrency.

It is silly that such a rich and awesome set of benchmarks never pushes on concurrency, one of the major points of failure "in the wild" -- more common as you become the go-between for your users and some set of APIs -- users stack up on one side, waiting connections stack up on the other.


There is a very simple reason for this: we do not yet have a test that is designed to include idling. One of the future test types [1], number 12 on the list, is designed to allow the request to idle while waiting on an external service.

Until we have such a test type, there is no value in exercising higher concurrency levels. Outside of a few frameworks that have systemic difficulty utilizing all available CPU cores, all frameworks are fully CPU-saturated by the existing tests.

With that condition, additional concurrency would only stress-test servers' inbound request queue capacity and cause some with shorter queues to generate 500 responses. Even at our 256 concurrency (maximum for all but the plaintext test), many servers' request queues are tapped out and they cope with this by responding with 500s.
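A toy model of the queue overflow described above, assuming a server with a fixed worker pool and a bounded inbound queue. The function name, the worker count, and the queue size are all hypothetical and not taken from any framework in the benchmark; real servers differ in how they queue and reject requests.

```python
from typing import Tuple


def admit(concurrency: int, workers: int, queue_capacity: int) -> Tuple[int, int]:
    """Return (accepted, rejected) for a burst of simultaneous requests.

    Assumption: a request is either picked up by a worker or parked in the
    bounded queue; anything beyond that is rejected (e.g. with a 500).
    """
    accepted = min(concurrency, workers + queue_capacity)
    return accepted, concurrency - accepted


# Hypothetical server: 8 workers, room for 128 queued requests.
for level in (64, 256, 1024):
    ok, rejected = admit(level, workers=8, queue_capacity=128)
    print(f"{level:>5} concurrency: {ok} accepted, {rejected} rejected")
```

In this model, once the workers are already busy, raising concurrency only raises the rejected count. It says nothing about throughput.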

The existing tests are all about processing requests as quickly as possible and moving on to the next request. When we have a future test type that by design allows requests to idle for a period of time, higher concurrency levels will be necessary to fully saturate the CPU.
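A rough back-of-the-envelope sketch of why idling changes the picture, using Little's Law style reasoning. The function name and the timings are made up for illustration, and the model assumes every request needs the same CPU time and idle time.

```python
import math


def required_concurrency(cpu_ms: float, idle_ms: float, cores: int) -> int:
    """Approximate number of in-flight requests needed to keep all cores busy.

    Each request occupies a core for cpu_ms out of (cpu_ms + idle_ms) of
    wall-clock time, so each core needs (cpu_ms + idle_ms) / cpu_ms
    requests in flight to stay fully utilized.
    """
    return math.ceil(cores * (cpu_ms + idle_ms) / cpu_ms)


# Hypothetical 8-core machine, 1 ms of CPU work per request.
print(required_concurrency(cpu_ms=1.0, idle_ms=0.0, cores=8))   # no idling
print(required_concurrency(cpu_ms=1.0, idle_ms=99.0, cores=8))  # 99 ms waiting on an external service
```

Under these assumptions, a request that never idles saturates the cores at very low concurrency, while one that waits 99 ms on an external service needs about a hundred times as many requests in flight.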

Presently, the Plaintext test spans to higher concurrency levels because the workload is utterly trivial and some frameworks are not CPU constrained at 256 concurrency on our i7 hardware. As for the EC2 instances, their much smaller CPU capacity means the higher-concurrency tests are fairly moot. If you switch to the data-table for Plaintext, you can see that the higher concurrency levels are roughly equivalent to 256 concurrency on EC2.

For example, jetty-servlet on EC2 m1.large:

      256 concurrency:  51,418
    1,024 concurrency:  44,615
    4,096 concurrency:  49,903
   16,384 concurrency:  50,117

The EC2 m1.large virtual CPU cores are saturated at all tested concurrency levels.

jetty-servlet on i7:

      256 concurrency: 320,543
    1,024 concurrency: 396,285
    4,096 concurrency: 432,456
   16,384 concurrency: 448,947

The i7 CPU cores are not saturated at 256 concurrency, and reach saturation at 16,384 concurrency.
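To make the comparison easier to eyeball, here is a small sketch that expresses each result relative to the 256-concurrency run. The figures are the jetty-servlet numbers quoted above; the helper name is just an illustration.

```python
# jetty-servlet Plaintext results quoted above, keyed by concurrency level.
ec2_m1_large = {256: 51418, 1024: 44615, 4096: 49903, 16384: 50117}
i7 = {256: 320543, 1024: 396285, 4096: 432456, 16384: 448947}


def relative_to_baseline(results, baseline=256):
    """Return each result as a ratio of the result at the baseline concurrency."""
    base = results[baseline]
    return {level: round(value / base, 2) for level, value in results.items()}


print("EC2 m1.large:", relative_to_baseline(ec2_m1_large))
print("i7:          ", relative_to_baseline(i7))
```

On EC2 the ratios stay at or slightly below 1.0, which fits cores that are already saturated at 256. On the i7 they keep climbing, but by less at each step.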

We are not against high-concurrency tests; we are just not interested in high-concurrency tests where they would add no value. We're trying to find where the maximum capacity of frameworks is, not how frameworks behave after they reach maximum capacity. We know that they tend to send 500s after they reach maximum capacity. That's not very interesting.

All that said, once we have an environment set up that can do continuous running of the tests, I'll be more amenable to a wider variety of test variables (such as higher concurrency for already CPU-saturated test types) because the amount of time to execute a full run will no longer matter as much.

[1] https://github.com/TechEmpower/FrameworkBenchmarks/issues/13...

Don't get me wrong, I am only annoyed because of the wonderful job you guys do... it seems like such a glaring omission... because IMHO, it is where stuff often actually "falls apart" in real life... and is some of the most useful information you can possibly have.

The "trapped between APIs" scenario is one of the concurrency-stressing ones, as are slow clients with large content, as are websockets. As your tests show, A LOT of frameworks do a damned fine job of serving lots of requests quickly -- I think concurrency is a far more interesting differentiator.

Glad to see that most of what I want is "on the list": 11, 12, 15, 19. Would be nice to see an additional "slow clients" test with large content -- where the limit is how fast the clients can receive server data... meaning, the limit on the server is how many clients they can stack up and handle concurrently.

Great! Please feel free to join in the discussion about future test types on the GitHub issue if you want!

Based on your comment and some others, I am presently thinking we'll want to bump up the priority of adding new tests in the upcoming rounds. Tentatively, getting the caching test in is low-hanging fruit and may be next up. But the external API test is probably next after that.