by alimoeeny 2338 days ago
I had not used Python for about 5 years, since I migrated all my work to Go. Recently I went back to Python and was shocked to see how (relatively) hard it is to set up a (moderately) high-performing web server in Python. I mean in my case I had a “data science” type application and sometimes a request would block and take a second to finish, and this meant a handful of users would bring the server to its knees (due to extremely high memory usage, I could not have a lot of independent worker processes running at the same time).

I wish I could call Python code from within a Go web server with some ease and safety.


First off, I totally agree. It’s not as easy as it should be to write an async web server in Python. FastAPI is probably your best bet. I usually use Sanic. It’s easy to accidentally block, though.

That said, it sounds like you’re serving a large model. No amount of async/await or goroutines can solve this problem. A non-blocking web server is a godsend for I/O-bound tasks, but a large model is just a deep call stack: lots of multiplies, a nonlinear function like ReLU, then adds, times a billion. This would still block, even if you had perfect async/await code.
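A minimal sketch of that point using only the standard library's asyncio. `cpu_bound`, `other_request` and `model_handler` are hypothetical names: `cpu_bound` stands in for the model's arithmetic, and because it never awaits, the event loop cannot switch to the other client's handler until it returns.

```python
import asyncio

def cpu_bound(n: int) -> int:
    # Hypothetical stand-in for model inference: pure arithmetic, no I/O.
    total = 0
    for i in range(n):
        total += i * i
    return total

async def other_request(events: list) -> None:
    # Another client's handler; it only needs the loop for a moment.
    events.append("other request served")

async def model_handler(events: list) -> None:
    events.append("model handler start")
    cpu_bound(200_000)  # no await inside, so the loop is stuck here
    events.append("model handler end")

async def main() -> list:
    events: list = []
    pending = asyncio.create_task(other_request(events))
    await model_handler(events)
    await pending
    return events

if __name__ == "__main__":
    print(asyncio.run(main()))
```

The other request is only served after the model handler has finished both steps, even though it was scheduled first.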

I made some assumptions here, but if I’m right, the answer is “shrink your model” and/or “buy more compute”. Neither is easy. But if you’re trying to shrink a model, check out Distiller https://github.com/NervanaSystems/distiller

Edit: the restriction I’m talking about applies to event-loop-based servers that use something like uvloop or asyncio under the hood. It may not hold for other concurrency models.

> No amount of async/await or goroutines can solve this problem. A non-blocking web server is a godsend for I/O-bound tasks

In the past, we were told that threads were cheap and to use them heavily, especially to achieve parallelism. Now with the advent of async models, we're being told that threads are expensive, and often that a single processor/thread async model is better than a multi-threaded blocking one.

I'm not a luddite, I do agree that async is often better. But I wonder how we got tricked into thinking more and more threads were the answer and how we avoid such trickery again.

Wouldn't that depend on the platform entirely? On some platforms, VM threads can be very inexpensive and are simply the right model for concurrency.

A little off-topic, but I've built a small backend server in Node.js and TensorFlow.js to run a model previously built in Python, and I was amazed by how performant and non-blocking it is.

The model can do around 10k predictions/s and does it asynchronously, which lets Node respond to web requests in the meantime.

I guess it's a matter of using the right tool for the task whenever possible: Python for data science, Node.js for a web backend.

> No amount of async/await or goroutines can solve this problem

Presumably he’s not serving the model but running it, which is CPU-bound, in which case goroutines would solve the problem.

Async would also solve his problem in Python, so there's Sanic, Quart, etc.

Async doesn’t help with CPU-bound tasks, but yes, you can rig up systems that involve running multiple Python processes behind a load balancer. It’s just more work.
> I mean in my case I had a “data science” type application and sometimes a request would block and take a second to finish, and this meant a handful of users would bring the server to its knees

This sounds like an issue with the design of your program, not Python.

No, data science is typically CPU-expensive. Python is fundamentally single threaded and slow at that, so you have to be very clever to work around the issue (e.g., by running a separately scalable service for your data-crunching work). Contrast that with Go, where the runtime can use other cores.

> Python is fundamentally single threaded and slow

Python is not fundamentally single threaded; it just has a lock that stops it from taking advantage of threads in CPU-bound scenarios.

Python is used in data science because of the C bindings that make it not slow. Also, when in C, you can take advantage of threads, since C code can release the GIL (e.g. Dask).
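A small stdlib-only illustration of the GIL point: CPython's `hashlib` documents that it releases the GIL while hashing buffers larger than 2047 bytes, so a plain thread pool can keep several cores busy with it, whereas the same pool running a pure-Python loop would effectively run one thread at a time. The chunk sizes and worker count are arbitrary, and no speedup figures are claimed here.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor

def digest(chunk: bytes) -> str:
    # hashlib drops the GIL for large buffers, so these calls can
    # overlap across cores even though they run in threads.
    return hashlib.sha256(chunk).hexdigest()

def threaded_digests(chunks: list[bytes], workers: int = 4) -> list[str]:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(digest, chunks))  # results keep input order

if __name__ == "__main__":
    chunks = [os.urandom(8 * 1024 * 1024) for _ in range(8)]  # 8 x 8 MiB
    print(threaded_digests(chunks)[:2])
```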

> Python is not fundamentally single threaded; it just has a lock that stops it from taking advantage of threads in CPU-bound scenarios.

Tomato tomahto

> Python is used in data science because of the C bindings that make it not slow. Also, when in C, you can take advantage of threads, since C code can release the GIL (e.g. Dask).

Correct. Python is fast when you aren’t running Python. Of course, using C (or anything else) only works in certain situations: there is a cost to crossing the language boundary, and very often that cost is greater than what you save by using C. Never mind the added build/package complexity, the security issues, the maintainability issues, etc.

Python is a neat language, but it’s really expensive if your project might ever have tight performance requirements (where “tight” is laughably easy for most other languages). Python can often be made to meet them with enough shenanigans; it’s just costly to implement and maintain said shenanigans.

> Python is not fundamentally single threaded

Yes, it is. It was not designed to run fast on multi-core CPUs, because most CPUs were single-core when Guido created the language. It has been a problem ever since multi-core CPUs became common, and it's a front where Python is losing the battle (against Go, for example, because of Go's much easier concurrency support).

A program that uses subprocesses would not have the single-core constraint.

No, but very often those programs are slower overall because of the pickling cost. Multiprocessing isn’t a magic bullet; there’s a reason threads exist.
Case in point.

If the language makes you jump through unnecessary hoops to get passable performance, the issue is with the language, not the program. Otherwise we could generalize your perspective such that all languages are above criticism (no languages have problems; users just fail to find and implement the proper hacks).

Wouldn't a job queue solve both issues? You could either make the data science work async (and spin up more servers as necessary) or, with a job queue, use a Go web server to push requests into the queue for processing in Python.
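An in-process sketch of the job-queue shape, using only the standard library: the "web" side only enqueues work and moves on, while a fixed number of workers drain the queue. In the setup described above, the producer would be the Go server and the queue an external broker (Redis or RabbitMQ, for example) with Python worker processes on the other end; `queue.Queue`, the threads and the `crunch` function here are hypothetical stand-ins for that.

```python
import queue
import threading

def crunch(n: int) -> int:
    # Hypothetical stand-in for the data-science work.
    return sum(i * i for i in range(n))

def run_job_queue(requests: dict[int, int], n_workers: int = 2) -> dict[int, int]:
    jobs: queue.Queue = queue.Queue()
    results: dict[int, int] = {}
    lock = threading.Lock()

    def worker() -> None:
        while True:
            item = jobs.get()
            if item is None:  # sentinel: no more jobs for this worker
                return
            job_id, n = item
            value = crunch(n)
            with lock:
                results[job_id] = value

    workers = [threading.Thread(target=worker) for _ in range(n_workers)]
    for w in workers:
        w.start()
    for job_id, n in requests.items():
        jobs.put((job_id, n))  # the web tier only enqueues and moves on
    for _ in workers:
        jobs.put(None)  # one sentinel per worker
    for w in workers:
        w.join()
    return results

if __name__ == "__main__":
    print(run_job_queue({1: 10, 2: 100, 3: 1000}))
```

Because workers pull jobs at their own pace, a slow request only delays the jobs queued behind it, and the worker pool can be scaled separately from the web tier.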
Your claims are without merit unless you post details about your setup and what you were trying to do.

Especially considering that there are quite a few sites which run entirely on Python.

This claim is specifically about blocking web servers in Python vs. concurrent web servers in Go. I assumed it was common knowledge that scaling Python web servers is complicated, while in Go you get a concurrent web server in the standard library.

Were you deploying using a Python web server (e.g. SimpleHTTPServer), or was it nginx routing to an app server?

And this is the trouble with the OP, IMO. While it's true, I find it an unfair comparison to make.
How about comparing a blocking server in Go versus the same in Python, and then an async server in Python versus the concurrent Go server? That would be a more apt comparison.
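Python's standard library does include both halves of that comparison: `http.server.HTTPServer` handles one request at a time, while `ThreadingHTTPServer` (Python 3.7+) starts a thread per request. The harness below times two concurrent requests against each; `SlowHandler` and its 0.3 s sleep are hypothetical stand-ins for a blocking request. Since `time.sleep` releases the GIL, this shows the I/O-bound case only; a CPU-bound handler would still be serialized by the GIL under `ThreadingHTTPServer`, which is where it differs from Go's `net/http`, whose goroutines can run on multiple cores.

```python
import http.client
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer, ThreadingHTTPServer

class SlowHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        time.sleep(0.3)  # hypothetical stand-in for a request that blocks
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args) -> None:
        pass  # keep the demo's output quiet

def fetch(port: int, bodies: list) -> None:
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=10)
    conn.request("GET", "/")
    bodies.append(conn.getresponse().read())
    conn.close()

def time_two_requests(server_cls) -> tuple[float, list]:
    # Port 0 asks the OS for any free port.
    server = server_cls(("127.0.0.1", 0), SlowHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    bodies: list = []
    clients = [threading.Thread(target=fetch, args=(port, bodies)) for _ in range(2)]
    start = time.perf_counter()
    for c in clients:
        c.start()
    for c in clients:
        c.join()
    elapsed = time.perf_counter() - start
    server.shutdown()
    server.server_close()
    return elapsed, bodies

if __name__ == "__main__":
    for cls in (HTTPServer, ThreadingHTTPServer):
        elapsed, _ = time_two_requests(cls)
        print(f"{cls.__name__}: {elapsed:.2f}s")
```

Expect roughly 0.6 s for `HTTPServer` (the second request waits for the first) and roughly 0.3 s for `ThreadingHTTPServer`.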
That you can run a site entirely on Python does not mean it's efficient to do so.
Efficiency is measured in a lot of different ways. Development speed? Team familiarity? Existing infrastructure (private PyPI, etc.)? Existing libraries?

If we go with what I would assume is your definition (speed of execution versus resources used), it is certainly possible to build fast Python applications that are efficient.

But use what makes you and your team happy, life is too short for anything else.

Blocking web servers are a common problem in Python. Even if you have async, all it takes is a CPU-bound task or some sync I/O deep in some third-party library and it’s lights out for your server, with no debug info for you either.
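For the sync-I/O case there are two stdlib mitigations worth knowing, sketched here assuming Python 3.9+: `asyncio.to_thread` runs a blocking call in a worker thread so the loop keeps serving, and running the loop in debug mode (`asyncio.run(..., debug=True)`) logs a warning whenever a callback or task step holds the loop longer than `loop.slow_callback_duration` (0.1 s by default), which at least gives you some debug info. `legacy_fetch` is a hypothetical blocking library call, and `heartbeat` stands in for other requests. Neither trick helps with pure-Python CPU-bound work.

```python
import asyncio
import time

def legacy_fetch(delay: float) -> str:
    # Hypothetical third-party call doing synchronous I/O.
    time.sleep(delay)
    return "payload"

async def heartbeat(ticks: list, stop: asyncio.Event) -> None:
    # Stands in for other requests the loop should keep serving.
    while not stop.is_set():
        ticks.append(time.perf_counter())
        await asyncio.sleep(0.01)

async def handler() -> tuple[str, int]:
    ticks: list = []
    stop = asyncio.Event()
    hb = asyncio.create_task(heartbeat(ticks, stop))
    # Calling legacy_fetch(0.2) directly here would freeze the heartbeat and,
    # in debug mode, log a warning along the lines of
    # "Executing <Task ...> took 0.200 seconds".
    result = await asyncio.to_thread(legacy_fetch, 0.2)
    stop.set()
    await hb
    return result, len(ticks)

if __name__ == "__main__":
    print(asyncio.run(handler(), debug=True))
```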
We run Python at work and I have to agree with this; however, now that Python 3 and ASGI servers exist, this should be significantly less of an issue. Even plain Python 3 WSGI using Gunicorn + gevent should be fairly performant, for Python.
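For reference, the ASGI interface those servers speak is small enough to sketch without any framework: an application is an async callable that receives a `scope` dict plus `receive` and `send` awaitables. The `app` below is a minimal, hypothetical example. With an ASGI server such as Uvicorn installed you would serve it with something like `uvicorn module_name:app`, but here the hypothetical `call_app` helper drives it with a fake `send`, so it runs with the standard library alone.

```python
import asyncio

async def app(scope: dict, receive, send) -> None:
    # Minimal ASGI application; only plain HTTP requests are handled here.
    if scope["type"] != "http":
        return  # e.g. the "lifespan" scope is ignored in this sketch
    body = f"hello from {scope['path']}".encode()
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain"),
                    (b"content-length", str(len(body)).encode())],
    })
    await send({"type": "http.response.body", "body": body})

async def call_app(path: str) -> list[dict]:
    # Drive the app the way a server would, capturing what it sends.
    sent: list[dict] = []

    async def receive() -> dict:
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message: dict) -> None:
        sent.append(message)

    await app({"type": "http", "method": "GET", "path": path}, receive, send)
    return sent

if __name__ == "__main__":
    print(asyncio.run(call_app("/predict")))
```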

There is no way around the high memory usage, but a large share of the problems with Python concurrency come from not putting nginx (or a load balancer) in front, and from not switching from prefork workers, which use considerably more memory per “node” for higher concurrency, to gevent. That said, gevent is only "performant" if what you’re doing is I/O-bound. The same goes for any asyncio-based server.
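A sketch of the Gunicorn side of that advice, written as a `gunicorn.conf.py` (Gunicorn config files are plain Python). The setting names are Gunicorn's own; the values, including the `2 * cores + 1` worker rule of thumb from Gunicorn's docs, are starting points rather than tuned numbers, and the gevent worker class assumes the `gevent` package is installed. Each worker is still a separate process, so a memory-heavy app like the OP's may need fewer workers than the rule of thumb suggests, and as noted above this only buys concurrency for I/O-bound requests.

```python
# gunicorn.conf.py
import multiprocessing

bind = "127.0.0.1:8000"    # nginx (or a load balancer) proxies to this address
worker_class = "gevent"    # green threads instead of one request per worker
workers = multiprocessing.cpu_count() * 2 + 1  # rule of thumb from Gunicorn's docs
worker_connections = 1000  # max concurrent clients per gevent worker
timeout = 30               # seconds before a silent worker is killed and restarted
```

You would then start it with something like `gunicorn -c gunicorn.conf.py myproject.wsgi:application`, where `myproject.wsgi:application` is a hypothetical module path.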

"Data science" sounds DB- or NoSQL-heavy, so it should fit this case, but of course all of this is just general advice and depends on the app and code, as others said.

Do try out FastAPI. Its creator now works with explosion.ai on production use cases of deploying models with it.

It ranks pretty high on performance in framework benchmarks.