Hacker News
by james_roberts 1650 days ago
I've been developing a Python extension, written in C, that provides users with an ultra-fast WSGI server that they can run their WSGI applications on (Flask, Django, uWSGI etc).

I have also recently managed to get it working on multiple platforms (Linux, MacOS and Windows).

If you want to significantly speed up your WSGI based applications, check it out!

It is still in early development at the moment. Any feedback would be greatly appreciated!

=== [Links] ===

Github: https://github.com/jamesroberts/fastwsgi

Pypi: https://pypi.org/project/fastwsgi/

Performance comparisons against other popular WSGI servers: https://github.com/jamesroberts/fastwsgi/blob/main/performan...

6 comments

With Python, the benchmarks that matter show the maximum performance you can get out of a given piece of hardware with the optimal configuration for the tool in use. (This was most obvious with the async framework benchmarks.)

Perhaps other WSGI frameworks are not achieving 100% CPU load and FastWSGI is, and you could easily run them with multiple worker processes to get the same CPU load and comparable performance? (This is just wild speculation.)

I don't know, but that's the kind of benchmark I'd like to see: eg. what's the maximum performance you can get out of a, say, 4-core CPU with each of them with whatever configuration stresses the CPU completely, and what are the other metrics you might be seeing (eg. asyncio will basically net you lower memory use, but not any better performance in RPS)?

It'd also be good if the benchmark tool were not running on the same CPU, as long as you've got a sufficiently fast interconnect.

Not to be taken badly, the results you posted are amazing and point to greatly optimised request handling! But these benchmarks do not demonstrate the practical value over any competing tool (just that your worker threads can do more in the same time, which is more of the "good engineering" pat-on-the-back type of thing :).
Thanks for the feedback! Yeah, the benchmarks just highlight higher numbers in very simple tests. They should definitely be taken with a grain of salt. I can try to add some more practical highlights.
Yeah I don't have an optimal benchmarking setup currently. I have a 16-core CPU that I run everything off (server + benchmarks = not ideal).

I've been working on adding multiprocessing to FastWSGI (only works on Linux at the moment). The RPS almost scales linearly with the number of workers. At 4 processes it hits +260k RPS with decent CPU utilization.

I can definitely add more details to the benchmarks. Thanks for the feedback!

It's more a question of how other tools need to be configured to get similar CPU utilization, and what RPS they would hit in that case.

I'd create a VM that you can load sufficiently well with FastWSGI, but other tools might need more worker threads or worker processes added to put the same load on that VM.

Basically, what matters practically is what's the maximum you can pull out of this hardware (whatever the configuration is)? Those are then really comparable.
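
A rough sketch of that comparison rule, for anyone who wants to script it: keep only the runs where the machine was actually saturated, then take each server's best RPS across whatever configurations were tried. The `BenchResult` fields, the 90% threshold, the server names and all the numbers below are made-up placeholders, not measurements of any real server.

```python
from dataclasses import dataclass


@dataclass
class BenchResult:
    server: str          # label for the server under test (placeholder names below)
    workers: int         # worker processes/threads used in this run
    requests: int        # total requests completed during the run
    duration_s: float    # wall-clock length of the run, in seconds
    cpu_percent: float   # average CPU use across the whole machine, 0-100


def rps(result):
    """Requests per second for a single run."""
    return result.requests / result.duration_s


def best_saturated(results, min_cpu=90.0):
    """Return each server's highest-RPS run among runs that saturated the CPU.

    Runs below `min_cpu` are skipped: they say more about the configuration
    than about what the hardware can deliver with that server.
    """
    best = {}
    for r in results:
        if r.cpu_percent < min_cpu:
            continue
        if r.server not in best or rps(r) > rps(best[r.server]):
            best[r.server] = r
    return best


# Invented numbers, purely to show the shape of the comparison.
runs = [
    BenchResult("server_a", 1, 600_000, 60.0, 35.0),     # not saturated, ignored
    BenchResult("server_a", 4, 1_800_000, 60.0, 96.0),
    BenchResult("server_b", 4, 1_200_000, 60.0, 55.0),   # not saturated, ignored
    BenchResult("server_b", 16, 2_400_000, 60.0, 97.0),
]

best = best_saturated(runs)
for name, r in sorted(best.items()):
    print(f"{name}: {rps(r):.0f} RPS with {r.workers} workers at {r.cpu_percent:.0f}% CPU")
```

The point of filtering first is that a server tested with too few workers is not penalised for idle cores; it simply has no comparable result until a configuration that loads the machine is found.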

Thanks for the advice! I'll try this out
Ideally, it turns out that all those other tools load the same hardware just the same, but they simply suck compared to FastWSGI.

Then we can all get collectively excited :)

It would be good to eventually see some comparisons running some average/unoptimized code. IME benchmarks seem to focus on either the very basics, or select areas where they are faster than other apps. This is important to cover, but having something that's closer to a real app is more convincing, even if the performance margins drop somewhat.

To come back to the average code. I may start an app and try to optimize things as much as reasonable. Eventually, I'm only able to focus on the functionality and a slide in performance can happen. As more developers are added, this can happen more quickly in the average organization that focuses primarily on functionality. Of course, we should analyze the perf problems and improve the code, but projects like yours may offer a huge perf boost for teams that are struggling here.

I love Python, but I've always enjoyed that bit more scope that the .NET platform provided around performance when code was less optimal. If you can really speed up WSGI this much it'll be a huge boon.

Thanks for the feedback! Yeah, some more "real world" benchmarks should be added.

And yes, if you have some application that you've tried to optimize somewhat and haven't managed to get the desired performance you would like, or if you simply haven't had the time to re-write components to be faster, ideally you could use FastWSGI as a drop in replacement for your current WSGI server and get the extra perf boost for "free". It's still in early development, but this is ultimately one of the main goals of the project.

> provides users with an ultra-fast WSGI server that they can run their WSGI applications on (Flask, Django, uWSGI etc)

I thought this was an equivalent to uWSGI, what's the benefit of running uWSGI on top of FastWSGI?

Oops. This is a typo... should be WSGI apps. Not the uWSGI server
In real-world performance it doesn't appear to be that much faster than Bjoern, and you also surmised that Flask is the bottleneck.

Are there other Python WSGI frameworks that can take advantage of FastWSGI?

Why is Flask as slow as it is?

Since this is C, what security mitigations have you put into place?

In theory, any framework that follows the WSGI guidelines should be able to run on top of FastWSGI and take advantage of its speed.

There are many frameworks out there. I do intend to test out some more. For now I've only tested the popular Flask framework and a simple bare bones WSGI app.
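
For reference, a bare-bones WSGI app in the PEP 3333 sense is just a callable that takes `environ` and `start_response`. The sketch below is not the app used in the FastWSGI benchmarks; `call_app` is a hypothetical helper that plays the role of the server for a single fake request, using only the standard library.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """Smallest useful WSGI application: one fixed plain-text response."""
    body = b"Hello, World!"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def call_app(wsgi_app, path="/"):
    """Invoke a WSGI app the way a server would, for one fake GET request."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the remaining required keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers
        return lambda data: None  # legacy write() callable, unused here

    chunks = wsgi_app(environ, start_response)
    try:
        body = b"".join(chunks)
    finally:
        close = getattr(chunks, "close", None)
        if close is not None:
            close()
    return captured["status"], captured["headers"], body


status, headers, body = call_app(app)
print(status, body)
```

Because the server only ever sees that callable, the same `app` object can be handed to any WSGI server. With the standard library that would be `wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()`; a Flask application object is used the same way.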

Flask was never developed to be lightning fast. Even still, I am quite surprised at how slow it is now that I've seen what kind of numbers can be achieved. I haven't looked deep into it to see where the issues might be.

As for security, that is a work in progress... I definitely wouldn't use FastWSGI in production in the project's current state. It's still early days in terms of development.

Is it to do with the router? I was reading this article.

https://www.slideshare.net/kwatch/how-to-make-the-fastest-ro...

which led me to this code.

https://github.com/kwatch/router-sample/blob/master/minikeig...

and i was going to consider trying to benchmark different routers. I started a repo here which drops flask and uses some random router off pypi... https://github.com/byteface/fastwsgitest/blob/master/app.py

and i think i just need to merge that state machine router into it for a test. I believe it's part of the architecture for a templating engine called tenjin that predates even jinja?
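
A minimal sketch of what such a router micro-benchmark could look like, using only `re` and `timeit` from the standard library. Neither class is the state machine router from the linked repo or Flask's actual router; `LinearRegexRouter`, `DictRouter`, the route table and the handler names are all hypothetical stand-ins for "scan every pattern in order" versus "one hash lookup".

```python
import re
import timeit

# Hypothetical route table: 100 static paths mapped to placeholder handler names.
ROUTES = [(f"/api/item{i}", f"handler_{i}") for i in range(100)]


class LinearRegexRouter:
    """Tries each compiled pattern in order until one matches: O(n) per lookup."""

    def __init__(self, routes):
        self._routes = [(re.compile(re.escape(path) + r"\Z"), handler)
                        for path, handler in routes]

    def lookup(self, path):
        for pattern, handler in self._routes:
            if pattern.match(path):
                return handler
        return None


class DictRouter:
    """Static paths only: a single dict lookup per request."""

    def __init__(self, routes):
        self._routes = dict(routes)

    def lookup(self, path):
        return self._routes.get(path)


def bench(router, path, number=2_000):
    """Total seconds taken by `number` lookups of `path` on `router`."""
    return timeit.timeit(lambda: router.lookup(path), number=number)


linear = LinearRegexRouter(ROUTES)
hashed = DictRouter(ROUTES)
worst_case = ROUTES[-1][0]  # last route, so the linear scan checks every pattern

print(f"linear regex: {bench(linear, worst_case):.4f}s")
print(f"dict lookup:  {bench(hashed, worst_case):.4f}s")
```

This only isolates routing cost. Whether routing accounts for a meaningful share of Flask's per-request time is a separate question that would need profiling a real request.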

found your benchmark repo but not installed wrk yet to test.

Looks great, thanks. I was going through the Performance Benchmarks and noticed that the mod_wsgi Apache module - https://github.com/GrahamDumpleton/mod_wsgi - is missing? Please consider including it in the benchmark too - would love to see how your module matches against it.
Will do! Thanks for the feedback
FYI, you have a typo in the graph `Requests served in 60 seconds`. server'd.
Thanks, nice catch! I will update that.