by christophilus 124 days ago
What’s shocking about it? Seems like the usual culprit: a bad config rollout. It took a long time to identify, so maybe that’s the shocking part. But I can attest that sometimes you get into fight-or-flight mode and miss the obvious when trying to diagnose a disruption like this.

That said, nowadays, the first thing I do is spawn an agent to look through the most recent commits and try to identify something that could be the cause of a service outage.

This one seems like something Claude Code or Codex would have quickly flagged.


Agreed, we've all been there, but 4 hours! For a network config change. No one raised their hand and said, "Hey, I just toggled this thing, maybe we should look. I did it exactly when our entire region went down"?