by deredede
973 days ago
Even if you get lots of Bs, if B takes 15s to run and you ever get n_worker requests for B at the same time, immediately followed by a single A, you lose, even with plenty of capacity to spare. You need either dedicated workers for low-latency tasks or some sort of preemption to meet SLOs with such heterogeneous tasks.
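To make the failure mode concrete, here is a rough sketch of a non-preemptive FIFO worker pool. Only the 15s runtime for B comes from the comment above; the pool size of 4, the 50ms runtime for A, the 100ms arrival offset, and the `simulate_fifo` helper are all assumptions made up for illustration.

```python
import heapq

def simulate_fifo(tasks, n_workers):
    """Non-preemptive FIFO pool (hypothetical helper, for illustration only).

    tasks: list of (arrival_time, duration, name) tuples, times in seconds.
    Returns {name: latency}, where latency = finish_time - arrival_time.
    """
    free_at = [0.0] * n_workers  # min-heap: when each worker next becomes free
    heapq.heapify(free_at)
    latency = {}
    for arrival, duration, name in sorted(tasks):  # serve in arrival order
        start = max(arrival, heapq.heappop(free_at))  # wait for the earliest free worker
        finish = start + duration
        heapq.heappush(free_at, finish)
        latency[name] = finish - arrival
    return latency

N_WORKERS = 4      # assumed pool size
B_SECONDS = 15.0   # from the comment: B takes 15s
A_SECONDS = 0.05   # assumed: A is a 50ms low-latency task

# n_workers requests for B arrive together, then a single A right behind them.
b_tasks = [(0.0, B_SECONDS, f"B{i}") for i in range(N_WORKERS)]
a_task = (0.1, A_SECONDS, "A")

# Shared pool: every worker is stuck on a B when A arrives.
shared_latency = simulate_fifo(b_tasks + [a_task], N_WORKERS)

# Dedicated worker: one worker only ever runs A, the rest only run B.
dedicated_latency = simulate_fifo([a_task], 1)
dedicated_latency.update(simulate_fifo(b_tasks, N_WORKERS - 1))

print(f"shared pool:      A latency = {shared_latency['A']:.2f}s")
print(f"dedicated worker: A latency = {dedicated_latency['A']:.2f}s")
```

In the shared pool, A waits for a full B to finish even though the pool is otherwise idle most of the time. Reserving a worker fixes A's latency, but under these assumptions one of the Bs now has to queue behind another, which is the cost of dedicated capacity; preemption would be the other way out.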