Hacker News
by necovek 1650 days ago
With Python, the benchmarks that matter measure the maximum performance you can get out of particular hardware with the optimal configuration for the tool in use. (This was most obvious with the async framework benchmarks.)

Perhaps other WSGI frameworks are not achieving 100% CPU load and FastWSGI is, and you could easily run them with multiple worker processes to get the same CPU load and comparable performance? (This is just wild speculation.)

I don't know, but that's the kind of benchmark I'd like to see: e.g. what's the maximum performance you can get out of, say, a 4-core CPU with each of them, using whatever configuration stresses the CPU completely, and what other metrics might you be seeing (e.g. asyncio will basically net you lower memory use, but not any better performance in RPS)?

It'd also be good if the benchmark tool were not running on the same CPU though, as long as you've got a sufficiently fast interconnect.

2 comments

Don't take this badly: the results you posted are amazing and point to greatly optimised request handling! But these benchmarks do not demonstrate the practical value over any competing tool (just that your worker threads can do more in the same time, which is more of a "good engineering" pat-on-the-back type of thing :).
Thanks for the feedback! Yeah, the benchmarks just highlight higher numbers in very simple tests. They should definitely be taken with a grain of salt. I can try to add some more practical highlights.
Yeah, I don't have an optimal benchmarking setup currently. I have a 16-core CPU that I run everything on (server + benchmarks = not ideal).

I've been working on adding multiprocessing to FastWSGI (only works on Linux at the moment). RPS scales almost linearly with the number of workers. At 4 processes it hits 260k+ RPS with decent CPU utilization.
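
To put a number on "almost linearly", one option is to compare each multi-worker result against the single-worker baseline. This is only a rough sketch: `scaling_efficiency` is a made-up helper name, and it assumes you have measured RPS for a 1-worker run plus each other worker count yourself.

```python
from typing import Dict


def scaling_efficiency(rps_by_workers: Dict[int, float]) -> Dict[int, float]:
    """Return RPS(n) / (n * RPS(1)) for each measured worker count n.

    1.0 means perfectly linear scaling; lower values mean each extra
    worker adds less than the first one did.
    Hypothetical helper: the input must contain a 1-worker baseline.
    """
    if 1 not in rps_by_workers:
        raise ValueError("need a 1-worker baseline measurement")
    base = rps_by_workers[1]
    return {
        n: rps / (n * base)
        for n, rps in sorted(rps_by_workers.items())
    }
```

Feed it your own measurements keyed by worker count; anything close to 1.0 at 4 workers would back up the "almost linear" claim.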

I can definitely add more details to the benchmarks. Thanks for the feedback!

It's more a question of how other tools need to be configured to get similar CPU utilization, and what RPS they would hit in that case.

I'd create a VM that you can load sufficiently well with FastWSGI, but other tools might need more worker threads or worker processes added to put the same load on that VM.

Basically, what matters in practice is the maximum you can pull out of this hardware, whatever the configuration is. Those numbers are then really comparable.
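
A rough sketch of that procedure, in case it helps: for each server, keep raising the worker count until the CPU is saturated and report the best RPS seen along the way. Everything here is an assumption for illustration. `measure` is a hypothetical hook standing in for whatever starts the server with that many workers and runs the load generator, and the 95% saturation threshold is arbitrary.

```python
from typing import Callable, NamedTuple, Optional, Tuple


class Sample(NamedTuple):
    workers: int
    rps: float
    cpu_util: float  # fraction of total CPU in use, 0.0 to 1.0


def sweep_until_saturated(
    measure: Callable[[int], Tuple[float, float]],
    max_workers: int,
    cpu_target: float = 0.95,  # assumed threshold, tune for your setup
) -> Sample:
    """Raise the worker count until the CPU is saturated; return the best run.

    `measure(workers)` is a hypothetical callable that runs one benchmark
    and returns (rps, cpu_util). It is not part of any real tool.
    """
    best: Optional[Sample] = None
    for workers in range(1, max_workers + 1):
        rps, cpu_util = measure(workers)
        sample = Sample(workers, rps, cpu_util)
        if best is None or sample.rps > best.rps:
            best = sample
        if cpu_util >= cpu_target:
            break  # hardware is fully loaded; stop adding workers
    if best is None:
        raise ValueError("max_workers must be at least 1")
    return best
```

Run the same sweep once per server on the same VM and compare the returned `rps` values; each one is that tool's best showing on that hardware rather than its result under one fixed configuration.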

Thanks for the advice! I'll try this out
Ideally, it turns out that all those other tools load the same hardware just the same, but they simply suck compared to FastWSGI.

Then we can all get collectively excited :)