by deredede 973 days ago
Even if you get lots of Bs, if B takes 15s to run and you ever get n_worker requests for B at the same time, immediately followed by a single A, you lose: every worker is busy, so that A sits in the queue for up to 15s even though you have plenty of capacity to spare on average.

You need either dedicated workers for low latency tasks or some sort of preemption to meet SLOs with such heterogeneous tasks.
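
To make the arithmetic concrete, here is a rough sketch of that scenario as a toy FIFO simulation. The numbers (4 workers, a 0.1s A) and the names `simulate`, `N_WORKERS`, etc. are made up for illustration; only the 15s B comes from the example above. It is not a model of any particular queueing system.

```python
import heapq

def simulate(jobs, n_workers):
    """Toy FIFO dispatcher: each job goes to whichever worker frees up first.

    jobs: list of (arrival_time, run_time, name), in arrival order.
    Returns a list of (name, queue_wait) in the same order.
    """
    free_at = [0.0] * n_workers      # time at which each worker becomes idle
    heapq.heapify(free_at)
    waits = []
    for arrival, run_time, name in jobs:
        worker_free = heapq.heappop(free_at)   # earliest available worker
        start = max(arrival, worker_free)
        heapq.heappush(free_at, start + run_time)
        waits.append((name, start - arrival))
    return waits

# Hypothetical numbers, chosen only to illustrate the point.
N_WORKERS = 4
B_RUN, A_RUN = 15.0, 0.1

burst = [(0.0, B_RUN, "B")] * N_WORKERS   # n_worker Bs at the same time
a_job = (0.0, A_RUN, "A")                 # a single A right behind them

# Shared pool: A queues behind the burst of Bs.
a_wait_shared = simulate(burst + [a_job], N_WORKERS)[-1][1]

# Dedicated worker: one worker reserved for A, the rest serve B.
a_wait_dedicated = simulate([a_job], 1)[0][1]
b_waits_dedicated = simulate(burst, N_WORKERS - 1)

print("A wait, shared pool:     ", a_wait_shared)
print("A wait, dedicated worker:", a_wait_dedicated)
print("worst B wait, dedicated: ", max(w for _, w in b_waits_dedicated))
```

In this toy setup the A waits the full 15s in the shared pool and not at all with a reserved worker, at the cost of the last B in the burst now waiting 15s for one of the three remaining workers. Preemption would be the other way out, but it is not modeled here.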

1 comment

Right, you need to think about the queue wait time you are willing to tolerate, in addition to the time it takes a job to run. If you aren't willing to wait in the queue, then you will need to have idle capacity.