|
Well, having more spare Ruby processes / threads would make the app more resistant to latency variability, and could have turned some incidents into non-events. Also, while I don't disagree that it is indeed a hard problem, I have had very good experience with an async Java stack, where I didn't have to worry about things like this. As long as a sane queue limit is defined on, let's say, the Jetty HTTP client, if something bad happens at the other end, back pressure kicks in by immediately failing the requests that can't make it into the queue. Other parts of the app then continue to function. So I would contend that it has a lot to do with Ruby's high memory usage, made much worse when single-threaded, and it looks like Ruby 3.0 still won't have a complete async story yet? EDIT: I checked the link again, and it looks like Jeff Dean was talking about latency at p999 or above? By "hiccup", I actually mean something that would increase avg latency by perhaps 5-10x, e.g. avg latency of 100ms under steady state + timeout of 1 second + the remote being down. Sorry for the confusion. Here, I am lucky if people start caring about p95. |
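To make the queue-limit point concrete, here is a rough sketch of the fail-fast behaviour using only the JDK. It is not Jetty's API: a `ThreadPoolExecutor` with a bounded `ArrayBlockingQueue` stands in for the HTTP client's request queue, and the class name `BoundedQueueDemo`, the method `countRejected`, and the worker/queue sizes are all made up for illustration. The "remote" is simulated by a latch that never opens while requests are being submitted.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; a stand-in for a client with a bounded request queue.
class BoundedQueueDemo {

    // Submits `requests` calls to a stalled remote and returns how many were
    // rejected immediately because neither a worker nor a queue slot was free.
    static int countRejected(int workers, int queueLimit, int requests)
            throws InterruptedException {
        // Simulated remote: every call blocks until the latch is released.
        CountDownLatch remoteRecovers = new CountDownLatch(1);

        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                workers, workers, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueLimit),   // the "sane queue limit"
                new ThreadPoolExecutor.AbortPolicy());  // fail fast when full

        int rejected = 0;
        try {
            for (int i = 0; i < requests; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            remoteRecovers.await();
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    // Back pressure: the caller learns right away and can
                    // degrade gracefully instead of piling up behind the outage.
                    rejected++;
                }
            }
        } finally {
            remoteRecovers.countDown();  // let the stalled calls finish
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
        }
        return rejected;
    }

    public static void main(String[] args) throws InterruptedException {
        // 2 workers + 3 queue slots absorb 5 requests; the other 5 fail at once.
        System.out.println("rejected: " + countRejected(2, 3, 10));
    }
}
```

The point is only that the rejected requests cost nothing beyond an exception, so the rest of the app keeps its threads and memory while the remote is down.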
Maybe you're thinking of the new Actor-based model for compute parallelism? Async IO in production Ruby has been a thing for easily more than a decade.