by evan_miller
4277 days ago
Ah, thanks for your feedback. I'm never sure how much detail I should go into; I will add more next time. I'll answer them here just for practice :)

Recall that this is a 99th-percentile problem. On average our requests still take around 20ms to complete, but roughly one in every hundred takes about 200ms.

The first strace output is a summary table of time spent, covering both our fast 20ms queries and the 1% of slow 200ms queries. In the summary view I am not simply looking for the largest number; I am looking for signs that my problematic system is spending time on something my healthy system is not. Time spent is zero-sum: the app is either doing what it's supposed to, or it isn't. futex() is where it should be spending time while idle. fsync() time doubles as a proportion of the total, which means it's taking time away from the otherwise productive (or idle) operations, and that makes it a potential culprit.

When I trace the fsync() call individually, I see a time delta roughly on par with my per-request delay for the slow 1% of queries. That tells me this call is happening during the slow queries and accounts for the delay in its entirety.
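To make the "zero-sum" comparison concrete, here is a minimal Python sketch of the idea: turn each system's per-syscall time into a share of the total, then rank syscalls by how much their share grew on the problematic system. The function names and the numbers are made up for illustration; they are not taken from the actual strace output.

```python
def time_shares(seconds_by_syscall):
    """Convert absolute seconds per syscall into fractions of total traced time."""
    total = sum(seconds_by_syscall.values())
    return {name: secs / total for name, secs in seconds_by_syscall.items()}


def share_shift(healthy, problematic):
    """Rank syscalls by how much their share of time grew on the problematic system."""
    h = time_shares(healthy)
    p = time_shares(problematic)
    names = set(h) | set(p)
    shifts = {n: p.get(n, 0.0) - h.get(n, 0.0) for n in names}
    # Largest increase first: these are the calls stealing time from everything else.
    return sorted(shifts.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical summary totals in seconds (illustrative only, not real measurements).
healthy = {"futex": 9.0, "fsync": 0.5, "write": 0.5}
problematic = {"futex": 8.5, "fsync": 1.0, "write": 0.5}

ranked = share_shift(healthy, problematic)
for name, delta in ranked:
    print(f"{name:>6}: {delta:+.2%}")
```

With these made-up inputs, fsync is not the largest entry in either table, yet it rises to the top of the ranking because its share doubled while futex's share shrank by the same amount. That is the signal to look for, rather than the biggest raw number.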