|
|
|
|
|
by pjgalbraith
1602 days ago
|
|
Yeah this is the way. @headlessvictim2 search for "Asynchronous Request-Reply pattern" if you want more information about this kind of architecture. You will remove any bottleneck from the API server and can easily scale out from the task queue. |
|
How would this work with GPU-bound machine learning models?
The model processing takes > 30 seconds and would still represent the bottleneck?