Hacker News
by brendangregg 3697 days ago
I don't quite get it -- to me, that's like saying a hammer is notoriously bad at screwing in screws. And then going ahead and proving it. Um. Good content, but I don't get the premise. I wasn't using sampling profilers to study tail latency in the first place. Who would? (Maybe it's a different usage of "sampling profilers" than I'm familiar with.)

As for tail latency: one way is to dump every function event; a lower-cost way is to summarize latency in-kernel as a histogram. Either way you're still tracing every function, so you care about overhead down to ~100 cycles, etc.
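A minimal sketch of the lower-cost option, written in user-space Python purely for illustration (real tools do this aggregation in kernel context). Each latency is folded into a power-of-two bucket instead of being logged, so the output size stays fixed no matter how many events occur. The function names, the nanosecond units and the sample latencies are assumptions made up for the example.

```python
from collections import Counter

def log2_bucket(latency_ns: int) -> int:
    """Bucket k holds values in [2**k, 2**(k+1)); 0 is also put in bucket 0."""
    # int.bit_length() is floor(log2(n)) + 1 for n > 0, and 0 for n == 0.
    return max(latency_ns.bit_length() - 1, 0)

def log2_histogram(latencies_ns):
    """Aggregate latencies into per-bucket counts instead of storing every event."""
    hist = Counter()
    for lat in latencies_ns:
        hist[log2_bucket(lat)] += 1
    return hist

def print_histogram(hist):
    for k in sorted(hist):
        lo, hi = 2 ** k, 2 ** (k + 1) - 1
        print(f"{lo:>10} -> {hi:<10} : {hist[k]}")

# Hypothetical sample: four fast calls and one ~1 ms outlier.
latencies = [120, 130, 250, 900, 1_000_000]
hist = log2_histogram(latencies)
print_histogram(hist)  # the outlier lands alone in the 524288 -> 1048575 bucket
```

The trade-off is that the histogram shows *that* a tail exists but discards the individual events that would explain it.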

But think about what causes the tails in the first place. Lock contention? Trace that. Resource I/O? Trace that. If it costs 1 µs to trace each disk I/O, that's usually not a problem. I like to time scheduler switch events with stack traces: a catch-all (but a bit more expensive). Of course, these approaches require kernel instrumentation. :)
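A minimal sketch of timing scheduler switch events, assuming a hypothetical list of context-switch records `(ts_ns, prev_tid, next_tid, prev_stack)` such as a tracer might emit. The time between a thread being switched out and being switched back in is its off-CPU time, and keying it by the stack captured at switch-out attributes that blocked time to a code path. The record layout and the stack labels below are made up for illustration.

```python
from collections import defaultdict
from typing import NamedTuple

class SwitchEvent(NamedTuple):
    """Hypothetical context-switch record: at ts_ns, prev_tid leaves the CPU
    (its stack captured as prev_stack) and next_tid starts running."""
    ts_ns: int
    prev_tid: int
    next_tid: int
    prev_stack: str

def off_cpu_time(events):
    """Sum off-CPU time per (thread, stack at switch-out)."""
    switched_out = {}  # tid -> (timestamp, stack) when it left the CPU
    totals = defaultdict(int)
    for ev in sorted(events, key=lambda e: e.ts_ns):
        # The outgoing thread starts an off-CPU interval.
        switched_out[ev.prev_tid] = (ev.ts_ns, ev.prev_stack)
        # The incoming thread ends one, if we saw it leave earlier.
        if ev.next_tid in switched_out:
            start, stack = switched_out.pop(ev.next_tid)
            totals[(ev.next_tid, stack)] += ev.ts_ns - start
    return dict(totals)

# Hypothetical trace: thread 10 blocks on a disk read, thread 20 on a lock.
events = [
    SwitchEvent(1_000, prev_tid=10, next_tid=20, prev_stack="disk_read_stack"),
    SwitchEvent(1_500, prev_tid=20, next_tid=30, prev_stack="lock_wait_stack"),
    SwitchEvent(9_000, prev_tid=30, next_tid=10, prev_stack="poll_stack"),
    SwitchEvent(9_200, prev_tid=10, next_tid=20, prev_stack="lock_wait_stack"),
]
totals = off_cpu_time(events)
```

Intervals still open at the end of the trace (threads that never come back) are simply not counted in this sketch.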

Hi Brendan,

(Full disclosure: I'm Mathieu Desnoyers, part of the LTTng maintainer team.)

I would like to introduce a slightly less extreme point of view on "on-the-fly" aggregation of traces vs. tracing to buffers followed by post-processing. I see from the current discussion thread that it's very much either one or the other, but I think combining the two approaches helps create much more powerful tools. On-the-fly aggregation based on trace instrumentation helps pinpoint latency outliers. Tracing to buffers, on the other hand, provides very detailed information about the system behavior that leads to those outliers. By using on-the-fly aggregation as a "trigger" to collect the tracer's in-memory ring buffers, one can investigate latency outliers with very little I/O overhead.
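A minimal sketch of the combined approach, under stated assumptions: a fixed-size in-memory ring buffer (here a `collections.deque` with `maxlen`) always holds the most recent events, while a cheap per-request latency check acts as the trigger. Only when a request exceeds the threshold is the buffer copied out, so I/O is paid only for outliers. The `FlightRecorder` class, the threshold and the event strings are hypothetical; this is not LTTng's API.

```python
from collections import deque

class FlightRecorder:
    """Hypothetical sketch: always-on ring buffer plus a latency trigger."""

    def __init__(self, capacity: int, threshold_ns: int):
        self.ring = deque(maxlen=capacity)  # oldest events are overwritten
        self.threshold_ns = threshold_ns
        self.requests = 0    # on-the-fly aggregate: requests seen
        self.snapshots = []  # stands in for what would be written to disk

    def record(self, event):
        # Cheap in-memory append; nothing leaves memory here.
        self.ring.append(event)

    def request_done(self, latency_ns: int):
        # On-the-fly aggregation doubles as the trigger.
        self.requests += 1
        if latency_ns > self.threshold_ns:
            # Outlier: snapshot the events that led up to it.
            self.snapshots.append(list(self.ring))

# Hypothetical run: the third request (5 ms) exceeds the 1 ms threshold.
rec = FlightRecorder(capacity=3, threshold_ns=1_000_000)
for i, lat in enumerate([200_000, 300_000, 5_000_000, 250_000]):
    rec.record(f"req{i}:start")
    rec.record(f"req{i}:end")
    rec.request_done(lat)
```

Because the buffer has a fixed capacity, the snapshot contains only the last few events before the outlier; sizing the buffer is the trade-off between memory and how much history each outlier comes with.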