by Anderkent 3006 days ago
This is a pretty cool design if your requests to a given endpoint are all supposed to take about the same time. It's not easy to see how you'd adjust it for things with more variance; perhaps rather than using the fastest seen RTT, you could look at your p99 over the last N minutes, and see if it's been changing?
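To make the p99 idea a bit more concrete, here is a rough sketch of a sliding-window percentile tracker. Everything in it is hypothetical (the `RollingPercentile` name, the 5-minute default window, the nearest-rank method); it is not taken from the project being discussed, just one way the "p99 over the last N minutes" signal could be computed.

```python
import math
import time
from collections import deque


class RollingPercentile:
    """Hypothetical helper: latency samples from the last `window_s` seconds."""

    def __init__(self, window_s=300.0, clock=time.monotonic):
        self.window_s = window_s
        self.clock = clock          # injectable so the window can be tested
        self.samples = deque()      # (timestamp, latency) pairs, oldest first

    def record(self, latency):
        now = self.clock()
        self.samples.append((now, latency))
        self._evict(now)

    def _evict(self, now):
        # Drop samples that have aged out of the window.
        while self.samples and now - self.samples[0][0] > self.window_s:
            self.samples.popleft()

    def percentile(self, q=0.99):
        self._evict(self.clock())
        if not self.samples:
            return None
        ordered = sorted(lat for _, lat in self.samples)
        # Nearest-rank: smallest sample with at least a fraction q at or below it.
        rank = max(1, math.ceil(q * len(ordered)))
        return ordered[rank - 1]
```

Comparing `percentile(0.99)` between consecutive windows would tell you whether the tail is drifting. Sorting on every call is fine for a sketch but would probably want a histogram or digest at real traffic volumes.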
2 comments

Yeah, I'm also thinking about this problem. I thought this could help...

Most of the problematic backends are not the ones that respond in 20ms on almost every request. The backends with problems are those that could reply in 10ms or in 2 minutes ...

I think you bring up the main problem with using TCP Vegas. It's not clear to me that this will work with heterogeneous requests. If the typical request time distribution is long-tailed, it might never increase the window size.
Even with a heterogeneous workload there is normally a uniform distribution of request types. Instead of generating complex statistics for average latency or tail latencies, especially for multimodal distributions, we just look at the minimum latencies as a proxy to identify queuing. So, when there is any queuing for whatever reason (increased RPS or latency in a dependent service), all latency measurements will show an increase, especially the minimum.
"uniform distribution of request types" - okay, it makes sense in that context. Although if that assumption breaks down, your thread limits may become under- or over-provisioned.
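For anyone following along, a minimal sketch of a TCP Vegas-style limit update may help show how the minimum latency acts as a queuing proxy. This is an illustration under assumptions, not the project's actual code: the `VegasLimit` class, the default `alpha`/`beta` values, and the step size of 1 are all made up for the example.

```python
class VegasLimit:
    """Hypothetical Vegas-style concurrency limit driven by minimum RTT."""

    def __init__(self, initial_limit=10, alpha=3, beta=6, max_limit=1000):
        self.limit = initial_limit
        self.alpha = alpha          # below this estimated queue size: grow
        self.beta = beta            # above this estimated queue size: shrink
        self.max_limit = max_limit
        self.min_rtt = None         # fastest RTT seen so far ("no queuing" baseline)

    def on_sample(self, rtt):
        if rtt <= 0:
            raise ValueError("rtt must be positive")
        if self.min_rtt is None or rtt < self.min_rtt:
            self.min_rtt = rtt
        # Estimated number of queued requests: the share of the current RTT
        # that exceeds the baseline, scaled by the current limit.
        queue = self.limit * (1.0 - self.min_rtt / rtt)
        if queue < self.alpha:
            self.limit = min(self.max_limit, self.limit + 1)
        elif queue > self.beta:
            self.limit = max(1, self.limit - 1)
        return self.limit
```

With a limit of 12 and a baseline of 10ms, a single 40ms sample gives an estimated queue of 12 * (1 - 10/40) = 9, which is above `beta` and shrinks the limit. That is the mechanism behind the long-tail concern: a sample that is slow for reasons other than queuing looks the same to this estimate.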

I'm wondering though - how do you pick the right alpha and beta values? It seems like you need to do testing/validation to ensure you use the right values, right?

Sorry if I'm sounding critical by the way. I think this is a really cool project - thanks for open sourcing it!