Hacker News
by elandau 3006 days ago
Even with heterogeneous workloads there is normally a uniform distribution of request types. Instead of generating complex statistics for average or tail latencies, especially for multimodal distributions, we just look at the minimum latency as a proxy to identify queuing. So, when there is any queuing for whatever reason (increased RPS or latency in a dependent service), all latency measurements will show an increase, especially the minimum.
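To make the minimum-latency idea concrete, here is a rough sketch, not the project's actual code. The names `window_minimums` and `queuing_suspected`, the window size, and the 1.2 tolerance are all made up for illustration. It shows that with a mixed fast/slow workload the per-window minimum stays flat until every request starts waiting in a queue.

```python
from typing import Iterable, List


def window_minimums(latencies_ms: Iterable[float], window: int) -> List[float]:
    """Return the minimum latency of each consecutive, non-overlapping window."""
    samples = list(latencies_ms)
    return [min(samples[i:i + window]) for i in range(0, len(samples), window)]


def queuing_suspected(baseline_min_ms: float, window_min_ms: float,
                      tolerance: float = 1.2) -> bool:
    """Flag a window whose minimum latency is well above the no-load minimum.

    `tolerance` is a hypothetical knob, not something from the thread.
    """
    return window_min_ms > baseline_min_ms * tolerance


# Two request types: fast (~5 ms) and slow (~50 ms), so the distribution is bimodal.
idle = [5.0, 50.0, 6.0, 52.0]
# The same mix with roughly 20 ms of queuing delay added to every request.
queued = [25.0, 70.0, 26.0, 72.0]

baseline = min(idle)
for window_min in window_minimums(idle + queued, window=4):
    print(window_min, queuing_suspected(baseline, window_min))
```

The average and the tail both depend on how many slow requests land in a window, while the minimum only moves when even the fastest request had to wait.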

"uniform distribution of request types" - okay, it makes sense in that context. Although if that assumption breaks down, your thread limits may end up under- or over-provisioned.

I'm wondering though - how do you pick the right alpha and beta values? It seems like you need to do testing/validation to ensure you use the right values, right?
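For readers following along, a sketch of where alpha and beta would enter, assuming they play the same role as in TCP Vegas: lower and upper thresholds on the estimated queue size. The thread does not confirm this is how the project uses them. The function name `vegas_style_update` and the defaults of 3 and 6 are illustrative only.

```python
def vegas_style_update(limit: int, rtt_min_ms: float, rtt_sample_ms: float,
                       alpha: int = 3, beta: int = 6) -> int:
    """Return a new concurrency limit from one latency sample (Vegas-style sketch).

    alpha and beta are assumed to be thresholds on the estimated queue size;
    the default values here are placeholders, not recommendations.
    """
    # Estimated number of in-flight requests that are queued rather than being served.
    queue_estimate = limit * (1.0 - rtt_min_ms / rtt_sample_ms)
    if queue_estimate < alpha:
        return limit + 1          # little queuing: probe for more capacity
    if queue_estimate > beta:
        return max(1, limit - 1)  # too much queuing: back off
    return limit                  # between alpha and beta: hold steady


print(vegas_style_update(20, rtt_min_ms=10.0, rtt_sample_ms=10.0))  # no queuing
print(vegas_style_update(20, rtt_min_ms=10.0, rtt_sample_ms=12.5))  # moderate queuing
print(vegas_style_update(20, rtt_min_ms=10.0, rtt_sample_ms=20.0))  # heavy queuing
```

Under this reading, a wider gap between alpha and beta makes the limit steadier but slower to react, which is the kind of trade-off that testing would have to settle.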

Sorry if I'm sounding critical by the way. I think this is a really cool project - thanks for open sourcing it!