by hbrundage 975 days ago
Makes sense!

I was just talking with a Temporal solutions engineer this week, and this is the metric they recommend autoscaling on. Instead of autoscaling on queue depth, you scale on queue latency! Specifically, they separate the time from enqueue to start and the time from start to done, and you scale on the former, not the total ("ScheduleToStart" in their terms).
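
A rough sketch of that split, assuming each job records three timestamps. The `JobTimes` record, its field names, and the `should_scale_up` rule are hypothetical illustrations, not Temporal's API:

```python
from dataclasses import dataclass


@dataclass
class JobTimes:
    # Hypothetical record; all timestamps are seconds on a shared clock.
    enqueued_at: float
    started_at: float
    finished_at: float


def schedule_to_start(job: JobTimes) -> float:
    """Time the job waited in the queue before a worker picked it up."""
    return job.started_at - job.enqueued_at


def start_to_done(job: JobTimes) -> float:
    """Time the job spent executing once a worker had picked it up."""
    return job.finished_at - job.started_at


def should_scale_up(recent_jobs: list[JobTimes], max_wait_seconds: float) -> bool:
    """Assumed rule: scale up when the worst recent queue wait exceeds the target."""
    if not recent_jobs:
        return False
    worst_wait = max(schedule_to_start(job) for job in recent_jobs)
    return worst_wait > max_wait_seconds
```

Note that a slow job (long start-to-done) does not trigger scaling on its own here; only time spent waiting for a worker does.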

3 comments

Time from enqueue to start isn't a good metric - it completely disregards the queue size. Enqueuing 1M jobs won't change this metric as it only updates once the job reaches the front of the queue, and when the 1Mth job does that the situation is already over.

I had much better results with a metric that shows estimated queue time for jobs that are getting enqueued right now (queue_size * running_avg_job_processing_time / parallelism).
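
A minimal sketch of that estimate. The formula is the one above; the exponentially weighted `RunningAverage` and its `alpha` parameter are my own assumption about how the running average could be kept:

```python
def estimated_queue_time(queue_size: int, avg_job_seconds: float, parallelism: int) -> float:
    """Estimated wait, in seconds, for a job enqueued right now."""
    if parallelism <= 0:
        raise ValueError("parallelism must be positive")
    return queue_size * avg_job_seconds / parallelism


class RunningAverage:
    """Exponentially weighted moving average of job processing times (assumed choice)."""

    def __init__(self, alpha: float = 0.1) -> None:
        self.alpha = alpha  # weight given to the newest sample
        self.value = None   # no samples seen yet

    def update(self, sample_seconds: float) -> float:
        if self.value is None:
            self.value = sample_seconds
        else:
            self.value = self.alpha * sample_seconds + (1 - self.alpha) * self.value
        return self.value
```

With 1M queued jobs, a 2 second average, and 100 workers, `estimated_queue_time(1_000_000, 2.0, 100)` gives 20000 seconds (about 5.5 hours), and the number jumps as soon as the jobs are enqueued rather than when they reach the front.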

>>> I was just talking with a Temporal solutions engineer

Aha! Just as the second season of Loki dropped. Makes sense now

Less sarcastically - this ties in with the article, I guess. The runat time is the enqueue time, and you are arguing for two latencies: time from enqueue to start, and time from start to complete.

Exactly! Other queue metrics have too many false positives and false negatives.