by jwestbury
901 days ago
One of the principal engineers I used to work with at AWS had a saying: "A one-year certificate expiration is an outage you schedule a year in advance." Of course, it's a bit hyperbolic -- but a ten-year expiration is almost certain to result in an outage. In a similar vein, you should never generate resources which will expire unless some undocumented action is taken. A common one I've seen is self-signed certs which last for n days and are regenerated only when an application is deployed or restarted, under the assumption that the application will never run untouched for longer than that. (Spoiler: It probably will, at some point, whether due to unexpected change freezes, going into maintenance mode, or -- my personal favourite -- being deployed to an environment that just isn't updated as regularly.)
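To make the "regenerate on deploy" trap concrete, here is a minimal sketch of the alternative: the running process checks the cert's remaining lifetime on a timer and regenerates it itself. The function name `needs_renewal` and the one-third threshold are my own assumptions for illustration, not anything from a particular library or from AWS practice.

```python
from datetime import datetime, timedelta, timezone


def needs_renewal(not_before, not_after, now=None, fraction=1 / 3):
    """Return True once no more than `fraction` of the cert's lifetime remains.

    `not_before` and `not_after` are timezone-aware datetimes taken from the
    certificate's validity period. `fraction` is an assumed policy knob.
    """
    now = now or datetime.now(timezone.utc)
    lifetime = not_after - not_before
    remaining = not_after - now
    # An already-expired cert has negative time remaining, so it also returns True.
    return remaining <= lifetime * fraction
```

The idea is to call this from a periodic task inside the application (hourly, say) and regenerate when it returns True, so that a long-running, untouched process renews its own cert instead of waiting for a deploy that may never come.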
A year-long expiry isn't frequent enough that you build automation, and it's long enough that the runbook you have is likely out of date by the next time you execute it. If you make it every three months, it's more likely to be fully or mostly automated, and it's more likely you'll remember that certs were recently introduced in a particular service. If you make it monthly, it's pretty much guaranteed that it'll be fully automated.
Almost every week in the AWS-wide ops meetings, one service or another would be talking about something that went wrong because a certificate expired, either in a place they'd forgotten they had certificates or in one they'd missed when they did the rotation. A number of those failures presented in particularly misleading ways, too, because of the role the cert was playing.
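For the "forgot we had a cert there" failure mode, one cheap mitigation is a single inventory of every cert and its expiry that gets checked on a schedule. A rough sketch, assuming you already have a mapping of cert name to expiry time from somewhere; the names, the 30-day window, and the `expiring_soon` helper are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def expiring_soon(inventory, now, window=timedelta(days=30)):
    """Return (name, days_left) pairs for certs expiring within `window`.

    `inventory` maps a cert name to its timezone-aware notAfter datetime.
    Results are sorted soonest first; already-expired certs get negative days.
    """
    cutoff = now + window
    due = [
        (name, (not_after - now).days)
        for name, not_after in inventory.items()
        if not_after <= cutoff
    ]
    return sorted(due, key=lambda item: item[1])


# Hypothetical inventory, purely for illustration.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
inventory = {
    "api-gateway": now + timedelta(days=90),
    "internal-metrics": now + timedelta(days=3),
    "legacy-batch": now - timedelta(days=2),
}
report = expiring_soon(inventory, now)
```

This only helps with certs that made it into the inventory, of course, which is exactly the part people get wrong. It is a reporting aid, not a substitute for automated rotation.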