Hacker News new | ask | show | jobs
by scott_w 346 days ago
I don’t know of any real posts on it, it just ends up being kind of a “assume it’ll go wrong,” then figure out how you know something has gone wrong and track it down. Your starting point is, after an issue is reported, add a load of logs around places that seem like candidates for the flow. Over time, you get a sense of where things can break and you add that telemetry ahead of time.