

by samcodes 2334 days ago

First off, I totally agree. It's not as easy as it should be to write an async web server in Python. FastAPI is probably your best bet. I usually use Sanic. It's easy to accidentally block the event loop, though.

That said, it sounds like you’re serving a large model. No amount of async/await or goroutines can solve this problem. A non-blocking web server is a godsend for I/O-bound tasks, but a large model is just a deep call stack: multiply, apply a nonlinear function like ReLU, then add, times a billion. This would still block, even if you had perfect async/await code.

I made some assumptions here, but if I’m right, the answer is “shrink your model” and/or “buy more compute”. Neither of which is easy. But if you’re trying to shrink a model, check out Distiller: https://github.com/NervanaSystems/distiller

Edit: the restriction I talk about is for event-loop based servers using something like uvloop or asyncio under the hood. Maybe this restriction doesn’t hold for other concurrency modes.
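
To make the blocking point concrete, here is a minimal sketch using plain asyncio. Everything in it is an assumption for illustration: `cpu_bound` is a hypothetical stand-in for a model forward pass, and `ticker` stands in for other requests the server would like to handle in the meantime. It counts how many times the ticker gets scheduled while the handler is computing.

```python
import asyncio


def cpu_bound(n: int) -> int:
    """Hypothetical stand-in for a model forward pass: multiply, add, repeat."""
    total = 0
    for i in range(n):
        total += i * i
    return total


async def ticker(state: dict) -> None:
    """Stand-in for other requests: bumps a counter whenever the loop runs it."""
    while True:
        state["ticks"] += 1
        await asyncio.sleep(0)  # yield control back to the event loop


async def handler(state: dict, n: int) -> tuple[int, int]:
    """An 'async' handler whose body never awaits while it computes."""
    before = state["ticks"]
    result = cpu_bound(n)  # no await here, so nothing else can be scheduled
    return result, state["ticks"] - before


async def demo(n: int = 1_000_000) -> tuple[int, int]:
    state = {"ticks": 0}
    task = asyncio.create_task(ticker(state))
    await asyncio.sleep(0)  # let the ticker run at least once
    result, ticks_during = await handler(state, n)
    task.cancel()
    return result, ticks_during
```

Calling `asyncio.run(demo())` returns the computed value plus the number of ticks observed during the computation. That second number is 0: the handler is declared `async`, but since it never awaits while computing, the loop cannot switch to anything else until it finishes.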

3 comments

speedplane 2334 days ago

> No amount of async/await or goroutines can solve this problem. A non-blocking web server is a godsend for I/O-bound tasks

In the past, we were told that threads were cheap and to use them heavily, especially to achieve parallelism. Now with the advent of async models, we're being told that threads are expensive, and often that a single processor/thread async model is better than a multi-threaded blocking one.

I'm not a Luddite; I do agree that async is often better. But I wonder how we got tricked into thinking more and more threads were the answer, and how we avoid such trickery again.

hopia 2334 days ago

Wouldn't that depend on the platform entirely? On some platforms VM threads can be very inexpensive and are simply the right model for concurrency.

ojosilva 2334 days ago

A little off-topic, but I've built a small backend server in Node.js and TensorFlow.js to run a model previously built in Python, and was amazed by how performant and non-blocking it is.

The model can do around 10k predictions/s and does it with async, which allows Node to respond to web requests in the meanwhile.

I guess it's a matter of using the right tool for the task whenever possible: Python for data science, Node.js for a web backend.

weberc2 2334 days ago

> No amount of async/await or goroutines can solve this problem

Presumably he’s not serving the model but running it, which is CPU-bound, in which case goroutines would solve the problem.

fnord123 2334 days ago

async would also solve his problem in Python. So there's Sanic, Quart, etc.

weberc2 2334 days ago

Async doesn’t help with CPU-bound tasks, but yes, you can rig up systems that involve running multiple Python processes behind a load balancer. It’s just more work.
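
For a single machine, a lighter-weight variation on the multi-process idea (not the load-balancer setup itself) is to keep one async server and hand the CPU-bound call to a pool of worker processes. A rough sketch, assuming a hypothetical `cpu_bound` function in place of the real model and a hypothetical `handle_request` in place of a framework route:

```python
import asyncio
from concurrent.futures import Executor, ProcessPoolExecutor
from typing import Optional


def cpu_bound(n: int) -> int:
    """Hypothetical stand-in for model inference: pure-Python arithmetic."""
    return sum(i * i for i in range(n))


async def handle_request(executor: Optional[Executor], n: int) -> int:
    """Hand the CPU-bound call to the executor and await its result.

    While the awaited future is pending, the event loop is free to run
    other coroutines. Passing None uses the loop's default thread pool.
    """
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(executor, cpu_bound, n)


async def main() -> None:
    # Each worker is a separate Python process with its own interpreter,
    # so the two computations below can run on different cores.
    with ProcessPoolExecutor(max_workers=2) as pool:
        results = await asyncio.gather(
            handle_request(pool, 100_000),
            handle_request(pool, 200_000),
        )
    print(results)
```

To try it, call `asyncio.run(main())` under an `if __name__ == "__main__":` guard, which process pools need on platforms that spawn workers. Arguments and results are pickled between processes, so a real model would have to be loaded inside each worker rather than passed per call. That is part of the "more work".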
