Hacker News
by Veserv 167 days ago
Of course that sucks. Just enable full time-travel recording in production and then you can use a standard multi-program trace visualizer and time-travel debugger to identify the exact execution down to the instruction and precisely identify root causes in the code.

Everything is then instrumented automatically and exhaustively analyzable using standard tools. At most you might need to add in some manual instrumentation to indicate semantic layers, but even that can frequently be done after the fact with automated search and annotation on the full (instruction-level) recording.

1 comment

You're not the first person I've met who has articulated an idea like this. It sounds amazing. Do you have an idea about why this approach isn't broadly popular?

Cost and compliance are non-trivial for non-trivial applications. Universal instrumentation and recording creates a meaningful fixed cost for every transaction, and you must record ~every transaction; you can't sample & retain post-hoc. If you're processing many thousands of TPS on many thousands of nodes, that quickly adds up to a very significant aggregate cost even if the individual cost is small.
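To make the "small per-transaction cost, large aggregate cost" point concrete, here is a back-of-envelope sketch. Every number in it is an assumption picked for illustration (5,000 TPS per node, 2,000 nodes, 1,000 bytes recorded per transaction); none of them come from a real deployment, and the function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def aggregate_recording_bytes(tps_per_node, nodes, bytes_per_txn, seconds=SECONDS_PER_DAY):
    """Total bytes recorded fleet-wide over `seconds` when every transaction is recorded."""
    return tps_per_node * nodes * bytes_per_txn * seconds


# Illustrative assumptions only, not measurements.
daily_bytes = aggregate_recording_bytes(
    tps_per_node=5_000,   # assumed load per node
    nodes=2_000,          # assumed fleet size
    bytes_per_txn=1_000,  # assumed recording overhead per transaction
)

print(f"{daily_bytes / 1e12:.1f} TB/day")  # prints "864.0 TB/day"
```

Even at an assumed 1 KB per transaction, recording everything lands in the hundreds of terabytes per day under these numbers, before counting CPU overhead or retention.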

For compliance (or contractual agreement) there are limitations on data collection, retention, transfer, and access. I certainly don't want private keys, credentials, or payment instruments inadvertently retained. I don't want confidential material to be distributed out of band or in an uncontrolled manner (like to your dev laptop). I probably don't even want employees to be able to _see_ "customer data." That runs headlong into a bunch of challenges, since low-level trace/sampling/profiling tools have more or less open access to record and retain arbitrary bytes.

Edit: I'm a big fan of continuous and pervasive observability and tracing data. Enable and retain that at ~debug level and filter + join post-hoc as needed. My skepticism above is about continuous profiling and recording (à la VTune/perf/eBPF), which is where "you" need to be cognizant of risks & costs.
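As a rough illustration of "filter + join post-hoc", here is a minimal sketch over retained debug-level events. The event shape (`trace_id`, `service`, `duration_ms`) and the helper name are assumptions made up for this example, not any particular tracing system's schema.

```python
from collections import defaultdict

# Hypothetical debug-level events retained from two services.
events = [
    {"trace_id": "t1", "service": "api", "duration_ms": 980},
    {"trace_id": "t1", "service": "db", "duration_ms": 950},
    {"trace_id": "t2", "service": "api", "duration_ms": 12},
    {"trace_id": "t2", "service": "db", "duration_ms": 4},
]


def traces_matching(events, predicate):
    """Filter: pick trace ids with any event matching `predicate`.
    Join: collect every retained event for those traces, across services."""
    wanted = {e["trace_id"] for e in events if predicate(e)}
    joined = defaultdict(list)
    for e in events:
        if e["trace_id"] in wanted:
            joined[e["trace_id"]].append(e)
    return dict(joined)


# Decide after the fact what "interesting" means, e.g. a slow db call.
slow = traces_matching(
    events, lambda e: e["service"] == "db" and e["duration_ms"] > 500
)
```

The filter is chosen after the data is collected, which only works because everything was retained at debug level in the first place.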