by parth21shah 218 days ago
OP here. I’ve been doing backend work for ~15 years, but this was the first time I really felt why eBPF matters.

We had a latency spike that all the usual polling tools missed: top, CloudWatch, Datadog, everything looked normal. In the end it was a misconfigured cron job spawning ~50 short-lived workers every minute. Each one ran for ~500ms, burned CPU, and exited before the next poll, so all our “snapshot” tools were basically blind.

I wrote the post to show this exact gap: polling = snapshots, tracing = event stream. For stuff that appears and disappears between polls, only tracing really sees it.

Tools like execsnoop or auditd can catch this, but in our case the overhead felt too high to leave on 24/7 in production. I’m currently playing with a small Rust+Aya agent that listens on ring buffers so we can run this continuously with less overhead.

If you just want to try the idea, the post has a few bpftrace one-liners so you can reproduce the detection logic without writing any C or Rust.
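
For anyone who wants to see the snapshot vs. event-stream gap without touching a kernel, here is a rough toy simulation in Python. It is not the agent or the bpftrace one-liners from the post, just a sketch under made-up assumptions: the workers all start together 7 seconds into each minute, the poller takes an instantaneous snapshot every 10 seconds, and the names `spawn_workers`, `poll`, and `trace` are hypothetical helpers invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Worker:
    start: float  # seconds since the start of the observation window
    end: float


def spawn_workers(minutes, per_minute=50, lifetime=0.5, offset=7.0):
    """Hypothetical cron burst: `per_minute` workers start together
    `offset` seconds into every minute and each lives `lifetime` seconds."""
    return [
        Worker(start=m * 60 + offset, end=m * 60 + offset + lifetime)
        for m in range(minutes)
        for _ in range(per_minute)
    ]


def poll(workers, duration, interval):
    """Snapshot view: at each poll instant, count the workers alive right then.
    Returns the total number of worker sightings across all polls."""
    sightings = 0
    for i in range(int(duration / interval)):
        t = i * interval
        sightings += sum(1 for w in workers if w.start <= t < w.end)
    return sightings


def trace(workers):
    """Event-stream view: one exec event per worker, however short its life."""
    return len(workers)


workers = spawn_workers(minutes=10)
polled = poll(workers, duration=600, interval=10.0)
traced = trace(workers)
print(f"polling saw {polled} workers, tracing saw {traced}")
```

With these particular numbers the poller never lands inside a 500ms burst, while the event stream counts every worker. Shift the assumed offset so a poll happens to fall inside the burst and polling will catch some of them, which is the point: with snapshots, detection depends on timing luck rather than on the event itself.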