by parth21shah
217 days ago
Right now I’m sticking to process lifecycle (sched_process_fork and sched_process_exit), mostly for correlation.
It’s much easier to grab container ID / cgroup metadata at fork time and say “this pod/image is the bad actor” than it is to reconstruct that context from a firehose of sched_switch events.
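As a rough illustration of the "grab container ID at fork time" step, here's a minimal userspace sketch (not my actual tracer code) that pulls a container ID out of /proc/<pid>/cgroup. It assumes the runtime embeds a full 64-hex-digit container ID in the cgroup path, which is what Docker/containerd typically do. The function names are hypothetical.

```python
import re
from typing import Optional

# Assumption: the runtime puts a full 64-hex-digit container ID in the cgroup
# path, e.g. "/system.slice/docker-<id>.scope" or "/kubepods/.../<id>".
_CONTAINER_ID = re.compile(r"\b([0-9a-f]{64})\b")


def container_id_from_cgroup(cgroup_text: str) -> Optional[str]:
    """Return the first container ID found in /proc/<pid>/cgroup content."""
    for line in cgroup_text.splitlines():
        # Each line has the form "hierarchy-ID:controller-list:cgroup-path".
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        match = _CONTAINER_ID.search(parts[2])
        if match:
            return match.group(1)
    return None  # not containerized, or the runtime uses a different naming scheme


def container_id_for_pid(pid: int) -> Optional[str]:
    """Hypothetical helper: read /proc/<pid>/cgroup for a live process."""
    try:
        with open(f"/proc/{pid}/cgroup") as f:
            return container_id_from_cgroup(f.read())
    except (FileNotFoundError, ProcessLookupError):
        return None  # the process already exited
```

Doing this once per fork, instead of per scheduler event, is the whole point: the lookup cost is paid on a low-rate event.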
I agree that run queue latency / scheduler stats are the “better” signals for pure performance debugging. But scheduler switches generate a huge volume of events compared to forks.
So I’m starting with fork/exec/exit + container/cgroup mapping.
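Roughly, the correlation side can be thought of as a PID-to-container table driven by the lifecycle events. This is a simplified sketch, not my real pipeline: it assumes a forked child starts in its parent's cgroup (true at fork time, though a process can be moved to another cgroup later, which this ignores), and it keys on whatever PID/TID the tracepoint reports. `PidContainerTable` and the `resolve` callback are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class PidContainerTable:
    # Hypothetical fallback for PIDs we never saw fork (e.g. started before
    # the tracer attached); could be a /proc/<pid>/cgroup reader.
    resolve: Callable[[int], Optional[str]]
    table: Dict[int, Optional[str]] = field(default_factory=dict)

    def lookup(self, pid: int) -> Optional[str]:
        # Resolve lazily and cache, so each PID is resolved at most once.
        if pid not in self.table:
            self.table[pid] = self.resolve(pid)
        return self.table[pid]

    def on_fork(self, parent_pid: int, child_pid: int) -> None:
        # A forked child starts in its parent's cgroup, so inherit the mapping.
        self.table[child_pid] = self.lookup(parent_pid)

    def on_exit(self, pid: int) -> Optional[str]:
        # Attribute the exit, then drop the entry so a reused PID
        # doesn't inherit a stale container ID.
        container = self.lookup(pid)
        self.table.pop(pid, None)
        return container
```

Usage: on a fork event call `on_fork(parent, child)`, on an exit event call `on_exit(pid)` and tag the exit with the returned container ID.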
If you’ve shipped scheduler-level tracing in production I’d love to hear how you handled filtering + aggregation.