Hacker News
by jdesfossez 3523 days ago
It would be worthwhile to clarify the term "tracing" to distinguish between live aggregation and post-processing approaches. The general confusion around the "tracing" terminology seems to imply a competition between these two, while they should rather be seen as complementary.

DTrace, SystemTap and eBPF/BCC are designed to aggregate data in the critical path and compute a summary of the activity. Ftrace and LTTng are designed to extract traces of execution for high-resolution post-processing with as little overhead as possible.

Aggregation is very powerful and gives a quick overview of the current activity of the system. Tracing extracts the detailed activity at various levels and allows in-depth understanding of a particular behaviour after the fact, since as many analyses as necessary can be run on the captured trace.
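
As a rough illustration of the two models, here is a toy Python sketch. It is not how any of the tools above are implemented; the event tuples, field layout and function names are all made up for the example. The aggregation path keeps only a running summary, while the tracing path keeps every event so that different questions can be asked later.

```python
from collections import Counter

# Hypothetical event stream: (timestamp_ns, cpu, syscall_name, latency_ns).
# These values are invented purely for illustration.
EVENTS = [
    (100, 0, "read", 40),
    (150, 1, "write", 90),
    (210, 0, "read", 35),
    (260, 1, "open", 500),
    (300, 0, "write", 70),
]


def aggregate_live(events):
    """Aggregation model: update a summary per event and keep nothing else."""
    counts = Counter()
    for _ts, _cpu, name, _latency in events:
        counts[name] += 1
    return counts


def record_trace(events):
    """Tracing model: store every event unmodified for later analysis."""
    return list(events)


def slowest_event(trace):
    """One possible post-processing analysis: find the highest-latency event."""
    return max(trace, key=lambda event: event[3])


def events_per_cpu(trace):
    """A second, unrelated analysis run on the same captured trace."""
    return Counter(cpu for _ts, cpu, _name, _latency in trace)


summary = aggregate_live(EVENTS)
trace = record_trace(EVENTS)
```

The summary answers the one question it was written for (how many calls per syscall), whereas the recorded trace can still answer `slowest_event(trace)` or `events_per_cpu(trace)` without re-running the workload.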

In terms of impact on the traced system, trace buffering scales better with the number of cores than aggregation approaches due to its ability to partition the trace data into per-core buffers.
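
To make the per-core partitioning idea concrete, here is a minimal Python sketch. It only models the data layout: each core appends to its own buffer, and ordering across cores is reconstructed at post-processing time. The names (`NUM_CPUS`, `trace_event`, `merge_buffers`) are hypothetical, and real tracers use lock-free ring buffers in kernel or shared memory rather than Python lists.

```python
import heapq

NUM_CPUS = 2  # assumed core count for the example

# One buffer per core: a writer on core N only ever touches buffer N,
# so the hot path shares no data structure with other cores.
per_cpu_buffers = [[] for _ in range(NUM_CPUS)]


def trace_event(cpu, timestamp_ns, name):
    """Hot path: append to the local core's buffer only."""
    per_cpu_buffers[cpu].append((timestamp_ns, cpu, name))


def merge_buffers(buffers):
    """Post-processing: merge the per-core streams into one timeline.

    Each buffer is already sorted by timestamp, so a k-way merge suffices.
    """
    return list(heapq.merge(*buffers))


# Invented events; each core emits them in increasing timestamp order.
trace_event(0, 100, "read")
trace_event(1, 120, "write")
trace_event(1, 150, "open")
trace_event(0, 180, "close")

timeline = merge_buffers(per_cpu_buffers)
```

The cost of establishing a global order is paid once, offline, in `merge_buffers`, instead of on every event in the traced system.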

Both approaches have upsides and downsides and should not be seen as being in competition; they address different use-cases and can even complement each other.

1 comment

You're right that a key feature and differentiator of DTrace/stap/BPF is kernel aggregations, but they can do per-event output as well. But I think I know what you mean, especially as I was at the sysdig summit yesterday and could see a major difference.

I think the two models for tracers, playing on their strengths, are: 1. real-time analysis tracers (DTrace/stap/BPF), and 2. offline analysis tracers (LTTng, sysdig). Both can do the other as well, but I'm just pointing out strengths.

sysdig (and I believe LTTng) has done great work at creating capture files that can then be analyzed offline in many many different ways, and they've optimized the way full-event dumps can be captured and saved (which I know LTTng has done as well). DTrace/stap/BPF don't have any offline capture file capabilities -- they could do it, but it's not been their focus.