|
|
|
|
|
by Nextgrid
1601 days ago
|
|
Any chance the entire thing can be offloaded to a task queue (Celery/etc)? This would decouple the HTTP request processing from the actual ML task. The memory errors you're seeing could suggest that you may not actually be able to run multiple instances of the model, and even if you could it may not actually give you more performance than processing sequentially. Seems like ultimately your current design can't gracefully handle too many concurrent requests, legitimate or malicious - this is a problem I recommend you address regardless of whether you manage to ban the malicious users. |
|
@headlessvictim2 search for "Asynchronous Request-Reply pattern" if you want more information about this kind of architecture. You will remove any bottleneck from the API server and can easily scale out from the task queue.