Hacker News
by xwolfi 890 days ago
I work in a giant investment bank with hundreds of people who can answer across all time zones. We still f up, we still don't always know where the problems lie, and we can still sometimes spend hours on a simple DoS.

You'll only get better at guessing what the issue could be. An exploit by a user is something you'll remember forever and will overly protect against from now on, until you hit some completely different problem that your metrics are unprepared for. Then you'll fumble around and do another post-mortem promising to look at that new class of issues, etc.

You'll marvel at the diversity of potential issues, especially in human-facing services like yours. But you'll probably have another long loss of service one day, and you're right to insist on transparency and speed in signaling to your users: in my experience, they can forgive everything as long as you give them an early signal, a discount and an apology.