by samcodes
2334 days ago
First off, I totally agree: it's not as easy as it should be to write an async web server in Python. FastAPI is probably your best bet. I usually use Sanic. It's easy to accidentally block the event loop, though.

That said, it sounds like you're serving a large model. No amount of async/await or goroutines can solve this problem. A non-blocking web server is a godsend for I/O-bound tasks, but a large model is CPU-bound: it's just a deep call stack of multiply, a nonlinear function like ReLU, then add, times a billion. This would still block, even if you had perfect async/await code.

I made some assumptions here, but if I'm right, the answer is "shrink your model" and/or "buy more compute", neither of which is easy. But if you're trying to shrink a model, check out Distiller: https://github.com/NervanaSystems/distiller

Edit: the restriction I talk about is for event-loop-based servers using something like uvloop or asyncio under the hood. Maybe this restriction doesn't hold for other concurrency modes.
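To make the blocking point concrete, here is a minimal sketch using plain asyncio. Everything in it is made up for illustration: `fake_forward_pass` is a hypothetical stand-in for a model (just a loop of multiply, add, and a ReLU-style clamp), and `heartbeat` plays the role of "every other request on the server". The handler is declared `async`, but since nothing inside it awaits, the heartbeat should stall for at least as long as the compute takes.

```python
import asyncio
import time


def fake_forward_pass(steps=300_000):
    # Hypothetical stand-in for a model: multiply, add, ReLU-style clamp, repeated.
    acc = 0.0
    for i in range(steps):
        acc = max(0.0, acc * 0.5 + (i % 7) - 3.0)
    return acc


async def handle_request():
    # Declared async, but nothing in here awaits, so the event loop
    # cannot switch to any other task until the loop above finishes.
    return fake_forward_pass()


async def heartbeat(ticks, interval=0.005):
    # Records a timestamp every time the event loop lets this task run.
    while True:
        ticks.append(time.perf_counter())
        await asyncio.sleep(interval)


async def measure_stall():
    ticks = []
    beat = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0.03)   # let the heartbeat tick a few times
    start = time.perf_counter()
    await handle_request()      # CPU-bound work runs on the event loop thread
    compute_time = time.perf_counter() - start
    await asyncio.sleep(0.03)   # let the heartbeat resume
    beat.cancel()
    try:
        await beat
    except asyncio.CancelledError:
        pass
    # Largest interval between two consecutive heartbeat timestamps.
    longest_gap = max(b - a for a, b in zip(ticks, ticks[1:]))
    return compute_time, longest_gap


if __name__ == "__main__":
    compute_time, longest_gap = asyncio.run(measure_stall())
    print(f"compute: {compute_time:.3f}s, longest heartbeat gap: {longest_gap:.3f}s")
```

Pushing the call into a process pool with `loop.run_in_executor` would keep the heartbeat ticking, but it only moves the work to another core. The model still costs the same amount of compute per request.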
|
In the past, we were told that threads were cheap and to use them heavily, especially to achieve parallelism. Now with the advent of async models, we're being told that threads are expensive, and often that a single processor/thread async model is better than a multi-threaded blocking one.
I'm not a Luddite; I do agree that async is often better. But I wonder how we got tricked into thinking more and more threads were the answer, and how we avoid such trickery again.