Hacker News
by stdplaceholder 2755 days ago
I know it's happened at every company where I've worked. It happens so rarely, though, that people don't get enough opportunities to learn from it. Even at Google they were on their Nth such outage, for a large value of N, before it became apparent that no certificate should ever expire at 23:59:59 on December 31, or otherwise outside of normal operating hours. Seriously, it took 20 years of organizational knowledge to get the company to understand that certs should expire at noon on a Wednesday, to minimize time-to-repair in the inevitable event that one is allowed to lapse.
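For illustration, here is a minimal sketch of how an expiry time might be pulled back to a Wednesday at noon when a cert is requested. The function name is made up for this example, and "noon" is assumed to mean 12:00 UTC; a real team would presumably use the timezone of whoever is on call.

```python
from datetime import datetime, timedelta, timezone


def snap_to_wednesday_noon(nominal_expiry: datetime) -> datetime:
    """Return the latest Wednesday 12:00 UTC at or before nominal_expiry.

    nominal_expiry must be timezone-aware. Rounding goes backwards so the
    result never exceeds the validity period that was originally intended.
    """
    expiry = nominal_expiry.astimezone(timezone.utc)
    candidate = expiry.replace(hour=12, minute=0, second=0, microsecond=0)
    # datetime.weekday() counts Monday as 0, so Wednesday is 2.
    days_back = (candidate.weekday() - 2) % 7
    candidate -= timedelta(days=days_back)
    # If the nominal expiry is a Wednesday morning, noon that day is too late.
    if candidate > expiry:
        candidate -= timedelta(days=7)
    return candidate


if __name__ == "__main__":
    new_years_eve = datetime(2018, 12, 31, 23, 59, 59, tzinfo=timezone.utc)
    print(snap_to_wednesday_noon(new_years_eve).isoformat())
```

This ignores public holidays, which a real policy would probably also want to avoid.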

Instead of waiting until the last minute, you'd think a large company would plan to renew certificates X amount of time before they expire. Alas, I understand it's not that simple.
In my small organization we had a plan and multiple reminders to renew the cert well before it expired, and we did. Due to a miscommunication between myself and a coworker, the new cert sat ready for nearly two months without ever being added to the configuration (we were both certain the other had done it, naturally).

There's a remarkable number of ways for this simple thing to go wrong. To prevent a future repeat, we got rid of our calendar reminders (which we started ignoring once we both thought the change had been made) and wrote a script that emailed us based on the time to expiration of the live cert. This is a much better method.
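Not the actual script, but a rough sketch of what such a reminder could look like. It assumes the `notAfter` string has already been read from the live endpoint (for example via `ssl.SSLSocket.getpeercert()`); the 45-day window, the addresses, and the function names are all made up for illustration.

```python
import ssl
from datetime import datetime, timezone
from email.message import EmailMessage
from typing import Optional

# Hypothetical settings; none of these values come from the thread.
WARN_DAYS = 45
SENDER = "certwatch@example.com"
RECIPIENTS = ["ops@example.com"]


def days_until_expiry(not_after: str, now: datetime) -> int:
    """Whole days from `now` until a certificate's notAfter time.

    `not_after` is in the format getpeercert() returns,
    e.g. "Jan  5 09:34:43 2018 GMT". `now` must be timezone-aware.
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).days


def build_reminder(host: str, not_after: str, now: datetime) -> Optional[EmailMessage]:
    """Return a reminder email, or None if the cert is not close to expiry."""
    days_left = days_until_expiry(not_after, now)
    if days_left > WARN_DAYS:
        return None
    status = "has EXPIRED" if days_left < 0 else f"expires in {days_left} days"
    msg = EmailMessage()
    msg["Subject"] = f"[cert] {host} {status}"
    msg["From"] = SENDER
    msg["To"] = ", ".join(RECIPIENTS)
    msg.set_content(f"The certificate served by {host} {status} ({not_after}).")
    return msg
```

Because the check looks at the cert that is actually being served, the reminders only stop once the new cert is deployed, not when someone believes it was. Sending could be as simple as `smtplib.SMTP("localhost").send_message(msg)` from a daily cron job, assuming a local mail relay exists.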

Of course, give us enough years and I'm sure we'll manage to find a way to get this new setup wrong.

You'd certainly think so. If you have frontend probers that exercise your accessible endpoints (HTTP or whatever) then those probes should fail when the certificate expires in less than 30 days. I couldn't comment on whether an organization like Ericsson or O2 would be expected to have such probers.
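A minimal sketch of such a probe, using only Python's standard library. The function names and the exit-status convention (0 for pass, 2 for fail) are assumptions for this example, not any particular monitoring system's API.

```python
import socket
import ssl
import time
from typing import Optional

MIN_DAYS_LEFT = 30  # the threshold suggested above


def seconds_until_expiry(host: str, port: int = 443, timeout: float = 10.0,
                         now: Optional[float] = None) -> float:
    """Do a TLS handshake with host:port and return seconds until notAfter."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    not_after = ssl.cert_time_to_seconds(cert["notAfter"])
    return not_after - (time.time() if now is None else now)


def probe_passes(seconds_left: float, min_days: int = MIN_DAYS_LEFT) -> bool:
    """The probe passes only if at least min_days of validity remain."""
    return seconds_left >= min_days * 86400


def run_probe(host: str) -> int:
    """Return 0 if the endpoint passes, 2 otherwise."""
    try:
        left = seconds_until_expiry(host)
    except OSError as exc:
        # ssl.SSLError (including verification failures on an already
        # expired cert) and socket timeouts are all OSError subclasses.
        print(f"FAIL {host}: {exc}")
        return 2
    days = left / 86400
    if probe_passes(left):
        print(f"OK {host}: {days:.0f} days left")
        return 0
    print(f"FAIL {host}: only {days:.0f} days left")
    return 2
```

Since the default context verifies the chain, an already expired cert fails during the handshake and is reported as a failure as well.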
I should get on that for my own infrastructure.

Thanks for the reminder.