Hacker News
by samcodes 2334 days ago
First off, I totally agree. It's not as easy as it should be to write an async web server in Python. FastAPI is probably your best bet. I usually use Sanic. It's easy to accidentally block the event loop, though.

That said, it sounds like you’re serving a large model. No amount of async/await or goroutines can solve this problem. A non-blocking web server is a godsend for I/O-bound tasks, but a large model is just a deep call stack: lots of multiply, a nonlinear function like ReLU, then add, times a billion. That work is CPU-bound and never yields, so it would still block the event loop, even if you had perfect async/await code.
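Here's a rough way to see this for yourself. It's a minimal sketch, not a benchmark: `fake_inference` is a made-up stand-in for a forward pass (no real model involved), and `heartbeat` is a hypothetical background task that should wake up every 10 ms. While the "inference" runs, the heartbeat can't tick, so the longest gap it records is at least as long as the inference itself.

```python
import asyncio
import time


def fake_inference(n: int = 2_000_000) -> float:
    """Hypothetical stand-in for a forward pass: pure CPU work, no I/O."""
    total = 0.0
    for i in range(n):
        total += (i % 7) * 0.5  # multiply, add, repeat
    return total


async def handle_request() -> float:
    """Declared async, but nothing inside ever awaits.

    Returns how long the call held the event loop, in seconds.
    """
    start = time.perf_counter()
    fake_inference()
    return time.perf_counter() - start


async def heartbeat(gaps: list, stop: asyncio.Event) -> None:
    """Record the time between wake-ups; ideally about 10 ms each."""
    last = time.perf_counter()
    while not stop.is_set():
        await asyncio.sleep(0.01)
        now = time.perf_counter()
        gaps.append(now - last)
        last = now


async def measure_stall() -> tuple:
    """Return (inference_seconds, longest_heartbeat_gap_seconds)."""
    gaps = []
    stop = asyncio.Event()
    hb = asyncio.create_task(heartbeat(gaps, stop))
    await asyncio.sleep(0.05)          # let the heartbeat tick a few times
    duration = await handle_request()  # the loop is stuck for this long
    stop.set()
    await hb
    return duration, max(gaps)


if __name__ == "__main__":
    duration, max_gap = asyncio.run(measure_stall())
    print(f"inference took {duration:.3f}s, longest heartbeat gap {max_gap:.3f}s")
```

In a real server the "heartbeat" is every other request waiting in line behind the one doing the math.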

I made some assumptions here, but if I’m right, the answer is “shrink your model” and/or “buy more compute”. Neither of which is easy. But if you’re trying to shrink a model, check out Distiller https://github.com/NervanaSystems/distiller

Edit: the restriction I talk about is for event-loop based servers using something like uvloop or asyncio under the hood. Maybe this restriction doesn’t hold for other concurrency modes.

3 comments

> No amount of async/await or goroutines can solve this problem. A non-blocking web server is a godsend for I/O-bound tasks

In the past, we were told that threads were cheap and to use them heavily, especially to achieve parallelism. Now with the advent of async models, we're being told that threads are expensive, and often that a single processor/thread async model is better than a multi-threaded blocking one.

I'm not a luddite, I do agree that async is often better. But I wonder how we got tricked into thinking more and more threads were the answer and how we avoid such trickery again.

Wouldn't that depend on the platform entirely? On some platforms VM threads can be very inexpensive and are simply the right model for concurrency.
A little off-topic, but I've built a small backend server in Node.js and TensorFlow.js to run a model previously built in Python, and was amazed by how performant and non-blocking it is.

The model can do around 10k predictions/s and does it with async, which allows Node to respond to web requests in the meanwhile.

I guess it's a matter of using the right tool for the task whenever possible: Python for data science, Node.js for a web backend.

> No amount of async/await or goroutines can solve this problem

Presumably he’s not serving the model but running it, which is CPU-bound, in which case goroutines would solve the problem.

Async would also solve his problem in Python. So there's Sanic, Quart, etc.
Async doesn’t help with CPU-bound tasks, but yes, you can rig up systems that involve running multiple Python processes behind a load balancer. It’s just more work.
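For a single server, a lighter-weight variant of the same idea is to hand the CPU-bound call to a pool of worker processes and await the result. To be clear, this swaps the load balancer for the standard library's `ProcessPoolExecutor`; it's a sketch under assumptions, and `cpu_heavy` and `predict` are hypothetical names standing in for real inference code.

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor


def cpu_heavy(n: int) -> int:
    """Hypothetical stand-in for inference: sum of squares below n."""
    return sum(i * i for i in range(n))


async def predict(pool: ProcessPoolExecutor, n: int) -> int:
    loop = asyncio.get_running_loop()
    # The work runs in a separate process, so the event loop stays free
    # to serve other requests while this coroutine is suspended.
    return await loop.run_in_executor(pool, cpu_heavy, n)


async def main() -> list:
    # Two workers is an arbitrary choice for illustration.
    with ProcessPoolExecutor(max_workers=2) as pool:
        return await asyncio.gather(
            predict(pool, 100_000),
            predict(pool, 200_000),
        )


if __name__ == "__main__":
    print(asyncio.run(main()))
```

This keeps the server responsive, but it doesn't make any single prediction faster, and arguments and results have to be pickled across the process boundary, which can hurt with large tensors.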