by alimoeeny
2338 days ago
I had not used Python for about 5 years, ever since I migrated all my work to Go. Recently I went back to Python and was shocked to see how (relatively) hard it is to set up a (moderately) high-performing web server in Python.
In my case I had a “data science” type application, and sometimes a request would block and take a second to finish. This meant a handful of users could bring the server to its knees (due to extremely high memory usage, I could not have many independent worker processes running at the same time). I wish I could call Python code from within a Go web server with some ease and safety.
|
That said, it sounds like you’re serving a large model. No amount of async/await or goroutines can solve this problem. A non-blocking web server is a godsend for I/O-bound tasks, but running a large model is CPU-bound: it is just a deep call stack of multiplies, adds, and nonlinear functions like ReLU, repeated a billion times. It would still block, even if you had perfect async/await code.
I made some assumptions here, but if I’m right, the answer is “shrink your model” and/or “buy more compute”, neither of which is easy. If you’re trying to shrink a model, check out Distiller: https://github.com/NervanaSystems/distiller
Edit: the restriction I describe applies to event-loop-based servers using something like uvloop or asyncio under the hood. It may not hold for other concurrency models.
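To make the event-loop point concrete, here is a minimal sketch using only the standard library. It assumes a busy-wait loop is a fair stand-in for model inference, and the names `spin`, `heartbeat`, `longest_stall`, and the two handlers are made up for illustration. A background "heartbeat" task records how long the loop goes between ticks while a handler runs.

```python
import asyncio
import time


def spin(seconds: float) -> None:
    """Burn CPU for `seconds`. Hypothetical stand-in for model inference."""
    deadline = time.perf_counter() + seconds
    while time.perf_counter() < deadline:
        pass


async def io_bound_handler(seconds: float) -> None:
    # Simulated I/O wait: the coroutine yields to the event loop while waiting.
    await asyncio.sleep(seconds)


async def cpu_bound_handler(seconds: float) -> None:
    # Declared async, but it never awaits, so the loop cannot run anything else.
    spin(seconds)


async def heartbeat(gaps: list, interval: float = 0.01) -> None:
    # Tries to tick every `interval` seconds and records the real gap each time.
    last = time.perf_counter()
    while True:
        await asyncio.sleep(interval)
        now = time.perf_counter()
        gaps.append(now - last)
        last = now


async def longest_stall(handler, seconds: float) -> float:
    """Run `handler` next to a heartbeat; return the longest gap between ticks."""
    gaps: list = []
    beat = asyncio.create_task(heartbeat(gaps))
    await asyncio.sleep(0.05)  # let the heartbeat tick a few times first
    await handler(seconds)
    await asyncio.sleep(0.05)  # let the heartbeat observe any stall
    beat.cancel()
    try:
        await beat
    except asyncio.CancelledError:
        pass
    return max(gaps)


if __name__ == "__main__":
    io_stall = asyncio.run(longest_stall(io_bound_handler, 0.3))
    cpu_stall = asyncio.run(longest_stall(cpu_bound_handler, 0.3))
    print(f"longest stall, I/O-bound handler: {io_stall:.3f}s")
    print(f"longest stall, CPU-bound handler: {cpu_stall:.3f}s")
```

With the I/O-bound handler the heartbeat should keep ticking close to its interval. With the CPU-bound handler the longest gap is at least as long as the computation itself, which is what every other request on that loop would experience.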