| HN Mirror



	by knweiss 3697 days ago
	Brendan, did you read this awesome blog post (which mentions your work btw) about Google's tracing framework which may explain the kind of problems they want to solve? http://danluu.com/perf-tracing/ "Sampling profilers, the most common performance debugging tool, are notoriously bad at debugging problems caused by tail latency because they aggregate events into averages. But tail latency is, by definition, not average."


brendangregg 3697 days ago

I don't quite get it -- to me, that's like saying a hammer is notoriously bad at screwing in screws. And then going ahead and proving it. Um. Good content, but I don't get the premise. I wasn't using sampling profilers to study tail latency in the first place. Who would? (Maybe it's a different usage of "sampling profilers" than I'm familiar with.)

As for tail latency: one way is to dump every function event; a lower-cost way is to summarize latency in-kernel as a histogram. For both you're still tracing every function, and care about overhead down to 100 cycles etc.
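To make the histogram option concrete, here is a minimal user-space sketch in Python of the kind of power-of-two latency summary a tracer might keep instead of emitting every event. It is only an illustration: the names `log2_bucket` and `LatencyHistogram` are hypothetical, and a real in-kernel implementation would update per-CPU counters from the instrumentation point rather than a Python `Counter`.

```python
from collections import Counter


def log2_bucket(latency_ns: int) -> int:
    """Return the power-of-two bucket index for a latency.

    Bucket b covers [2**b, 2**(b+1) - 1]; a latency of 0 is folded into bucket 0.
    """
    return max(latency_ns, 1).bit_length() - 1


class LatencyHistogram:
    """Hypothetical aggregate: one counter per power-of-two bucket."""

    def __init__(self) -> None:
        self.buckets: Counter = Counter()

    def record(self, latency_ns: int) -> None:
        # One increment per event; no per-event record is stored or written out.
        self.buckets[log2_bucket(latency_ns)] += 1

    def rows(self) -> list:
        """Return (low, high, count) tuples in ascending bucket order."""
        return [
            (1 << b, (1 << (b + 1)) - 1, n)
            for b, n in sorted(self.buckets.items())
        ]


hist = LatencyHistogram()
for sample_ns in (1, 2, 3, 1000, 1023, 1024):
    hist.record(sample_ns)

for low, high, count in hist.rows():
    print(f"{low:>6} -> {high:<6} : {count}")
```

The cost per event is one bucket computation and one increment, and the tail shows up as counts in the high buckets rather than being averaged away.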

But think about what causes the tails in the first place. Lock contention? Trace that. Resource I/O? Trace that. If it costs 1 us to trace disk I/O, it's usually not a problem. I like to time scheduler switch events with stack traces -- a catch all (but a bit more expensive). Of course, these approaches require kernel instrumentation. :)

compudj 3696 days ago

Hi Brendan,

(Full disclosure: I'm Mathieu Desnoyers, part of the LTTng maintainer team.)

I would like to introduce a slightly less extreme point of view when considering "on-the-fly" aggregation of traces vs tracing to buffer followed by post-processing. I see from the current discussion thread that it's very much either one or the other, but I think that combining the two approaches helps create much more powerful tools. On-the-fly aggregation based on trace instrumentation helps pinpoint latency outliers. Tracing to buffers, on the other hand, provides very detailed information about the system behavior that leads to those outliers. By using on-the-fly aggregation as "triggers" to collect the tracer's in-memory ring buffers, one can investigate latency outliers with very small I/O overhead.
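A rough Python sketch of that combination, purely as an illustration and not LTTng's actual API: events go into a fixed-size in-memory ring buffer, and the buffer is copied out only when an aggregated latency crosses a threshold. The class `TriggeredTracer`, its methods, and the threshold value are all hypothetical names chosen for this example.

```python
from collections import deque


class TriggeredTracer:
    """Hypothetical tracer: ring buffer in memory, snapshot on latency outlier."""

    def __init__(self, capacity: int, threshold_ns: int) -> None:
        # deque with maxlen drops the oldest event once the buffer is full.
        self.ring: deque = deque(maxlen=capacity)
        self.threshold_ns = threshold_ns
        self.snapshots: list = []

    def trace(self, event) -> None:
        # Cheap path: record the event in memory only, no I/O.
        self.ring.append(event)

    def complete(self, latency_ns: int) -> bool:
        """On-the-fly check acting as the trigger.

        Returns True if the latency was an outlier and the buffer was captured.
        """
        if latency_ns > self.threshold_ns:
            # Only outliers pay the cost of copying the buffer out.
            self.snapshots.append((latency_ns, list(self.ring)))
            return True
        return False


tracer = TriggeredTracer(capacity=3, threshold_ns=1_000)
for event_id in range(5):
    tracer.trace(event_id)

tracer.complete(500)    # within threshold: nothing captured
tracer.complete(5_000)  # outlier: the last 3 events are captured
print(tracer.snapshots)
```

In this sketch the snapshot holds only the most recent events leading up to the outlier, which is the detailed context the aggregation alone cannot provide.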