Hacker News new | ask | show | jobs
by pjgalbraith 1601 days ago
You would still have the same bottleneck but the API request would return straight away with some sort of correllation ID. Then the workers that handle the GPU bound tasks would pull jobs when they are ready. If you get a lot of jobs all that will happen is the queue will fill up and the clients will wait longer and hit the status endpoint a few more times.

Here is an example of what it could look like: https://docs.microsoft.com/en-us/azure/architecture/patterns...

1 comments

Thanks for the explanation.

Right now, we use ELB (Elastic Load Balancer) to sit in front of multiple GPU instances.

Is this sufficient or do you suggest adding Celery into this architecture?