| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by pjgalbraith 1602 days ago
	Yeah this is the way. @headlessvictim2 search for "Asynchronous Request-Reply pattern" if you want more information about this kind of architecture. You will remove any bottleneck from the API server and can easily scale out from the task queue.

1 comments

headlessvictim2 1602 days ago

Thanks for the suggestion.

How would this work with GPU-bound machine learning models?

The model processing takes > 30 seconds and would still represent the bottleneck?

link

pjgalbraith 1602 days ago

You would still have the same bottleneck but the API request would return straight away with some sort of correllation ID. Then the workers that handle the GPU bound tasks would pull jobs when they are ready. If you get a lot of jobs all that will happen is the queue will fill up and the clients will wait longer and hit the status endpoint a few more times.

Here is an example of what it could look like: https://docs.microsoft.com/en-us/azure/architecture/patterns...

link

headlessvictim2 1602 days ago

Thanks for the explanation.

Right now, we use ELB (Elastic Load Balancer) to sit in front of multiple GPU instances.

Is this sufficient or do you suggest adding Celery into this architecture?

link