by mholt 1610 days ago
Caddy does. https://community.letsencrypt.org/t/questions-about-renewing...

And this is one reason why I keep advocating for certificate automation to be built into services/apps, rather than patched on from the outside with duct tape.
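A client built into the server can react to revocation as well as to expiry, without an external cron job. Below is a minimal sketch of that renewal decision, assuming the server already knows the certificate's validity window and whether its latest OCSP response reports it as revoked. `should_renew` is a hypothetical name, and the one-third threshold is an assumption (roughly the common practice of renewing a 90-day certificate about 30 days before expiry), not any particular server's documented behavior.

```python
from datetime import datetime, timedelta, timezone


def should_renew(not_before: datetime, not_after: datetime, now: datetime,
                 revoked: bool = False, remaining_fraction: float = 1 / 3) -> bool:
    """Decide whether a certificate should be replaced right now.

    Renews immediately if the cert is revoked or already expired; otherwise
    renews once no more than `remaining_fraction` of its lifetime is left.
    """
    if revoked or now >= not_after:
        return True
    lifetime = not_after - not_before
    remaining = not_after - now
    return remaining <= lifetime * remaining_fraction
```

For a 90-day certificate this returns False on day 30 (60 days left) and True from day 60 onward, or at any point once the certificate is reported revoked.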

I look forward to the day when cert lifetimes are regularly about as short as OCSP responses. Then we can possibly do away with OCSP entirely.*

(* I am of the opinion that revocation is fundamentally broken for Web PKI and it should be phased out in favor of short cert lifetimes. You may disagree and that's fine, but I'm happy to discuss why if you're interested.)
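The trade-off in the footnote can be stated as a simple bound: if a key leaks right after issuance, an attacker can use it until the certificate expires, unless clients learn about a revocation sooner. With soft-fail checking (a blocked OCSP request is treated as "fine"), clients may never learn about it at all. A minimal sketch of that bound; `worst_case_exposure` is a hypothetical helper, and modeling soft-fail as "no revocation latency" is an assumption.

```python
from datetime import timedelta
from typing import Optional


def worst_case_exposure(lifetime: timedelta,
                        revocation_latency: Optional[timedelta] = None) -> timedelta:
    """Upper bound on how long a stolen key stays usable after compromise.

    `revocation_latency` is how long clients take to see a revocation
    (e.g. the validity period of a cached OCSP response). None models
    soft-fail checking, where the revocation may never be seen, so only
    the certificate's own expiry limits the damage.
    """
    if revocation_latency is None:
        return lifetime
    return min(lifetime, revocation_latency)
```

Under soft-fail, a 398-day certificate gives up to 398 days of exposure; with an assumed 7-day OCSP response validity that drops to 7 days, and a 1-hour certificate caps it at 1 hour with no revocation check at all.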


RE: revocation broken

Absolutely. Especially with the advent of protocols like ACME it just makes sense.

Here's a nice blogpost from smallstep (not affiliated) on this topic: https://smallstep.com/blog/passive-revocation/

Certificate expiries are set primarily due to financial interests that have nothing to do with security.

Why do you think the maximum lifetime was reduced from two years to one?

Does it make a lick of difference if you’re man-in-the-middled for just one year instead of two? What kind of argument is that!?

“Oh, they got every active user credential and form that was submitted ages ago, but no worries! This won’t go on for another year! Just months to go now…”

No, obviously the CA cartel just wanted to extract 2x the rent.

The whole thing is just absurd on its face and needs to stop, but there are billions of dollars worth of rent seekers that say…

“No.”

Certificates have always been sold and priced per year. I highly doubt lifetime changes benefitted certificate authorities, and I’m pretty sure they had it forced upon them by browsers. If anything, it prevents them from collecting 2-3 years of revenue upfront.

The reduction to 27 months was voted through CA/B (where either CAs or Browsers can effectively veto, like the way Northern Ireland is governed) but only after Ryan suggested Google might just unilaterally impose 90 days if the CAs rejected a reduction.

The reduction to 398 days was imposed by Apple, unilaterally, although in practice the ecosystem went along with it. It actually took a few weeks to get clarity on exactly what Apple intended, they just basically blurted it out at a meeting.

"although in practice the ecosystem went along with it" - not that there was much choice, but some CAs were less surprised and grumbled less than others...
You might want to double-check that. The CAs (all but two, basically) disagreed with the lifetime reduction and actively voted and argued against it.

That’s because the 1 year certs are too short to be a meaningful difference from LE, so they lose their selling point. So being against the shorter expiration date is just a nefarious plot to make more money, just like supporting it would be. They are so nefarious, they’re nefarious either way! It’s a Certs-22.

Better to be man-in-the-middled for:

- One year instead of two? Yep

- 3 months instead of 1 year? Yep.

- 1 week instead of 3 months? Yep.

The reason certificates have traditionally been so long-lived is that renewal was a manual process. With ACME, you could have certificates expire every hour if you wanted to.

In the past, revocation was supposed to help cases where the owner of the certificate exposed the private key in one way or another.

Now it seems that revocation is supposed to help the CA covering up mistakes made by the CA.

Maybe we actually need a better CA.

The Baseline Requirements require revocation of misissued certificates, this isn't "a CA covering up mistakes."
> Maybe we actually need a better CA.

Go for it. Start one and tell us how it went.

If ever I have too much free time, I'll spend it modifying firefox to support DANE.

I simply think your previous argument is disingenuous. We have a free-to-use CA whose code can be vetted, so mistakes like this can be caught and potential problems averted. If this is the price to pay, okay, so be it. Imagine what must fly under the radar at other CAs that don't have thousands of eyes vetting their code base; those mistakes would never become visible.

So okay, maybe you don't have certs revoked and you don't need to restart your Traefik but are you really sure everything is okay?

The way LE and others keep breaking this process and the tools around it is certainly not a great endorsement for having it integrated into a service.

Can you give some examples of the kind of breakage you experienced?

Not OP, but here are some things I've personally experienced:

1. Supposedly more secure challenge types such as TLS-ALPN-01 are far from stable, as the current incident shows. Your cert can be revoked at any time through no fault of your own. After being burned by TLS-SNI-01 the last time, now I refuse to use anything other than plain old HTTP-01 and DNS-01.

2. As soon as the version of the Linux distro I was using (not in my power to change!) reached EOL, certbot suddenly refused to renew, despite the fact that I'd been using more or less the same version of Python and certbot for a number of years and the HTTP-01 challenge requires nothing fancy at all. Why does everyone these days insist on making ops decisions for other people?

3. On a server with existing nginx virtual hosts, certbot injects configuration directives including stuff the nginx team officially recommends against, such as `if` statements. It frequently breaks existing configuration such as rewrites and redirects. After seeing this a number of times, the only conclusion I can make is that certbot has no idea how to manipulate nginx config files.

4. If I have multiple domains pointing at the same application, and remove one of them at a later time, certbot is oblivious and repeatedly fails trying to renew the certificate that now contains an invalid domain. Again, certbot doesn't know how to work with nginx.
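The HTTP-01 and DNS-01 challenges that item 1 falls back to are simple at their core. Per RFC 8555, the client proves control by publishing a "key authorization": the challenge token, a dot, and the RFC 7638 thumbprint of the account's public key (a JWK). HTTP-01 serves it at `/.well-known/acme-challenge/<token>`; DNS-01 publishes the base64url SHA-256 of it in a `_acme-challenge` TXT record. A minimal, standard-library-only sketch, assuming you already hold the account's public JWK as a dict and the token from the ACME server; all helper names are hypothetical, and the network steps (serving the file, publishing the record, asking the server to validate) are omitted.

```python
import base64
import hashlib
import json
from typing import Tuple


def b64url(data: bytes) -> str:
    """Unpadded base64url encoding, as used throughout ACME/JOSE."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jwk_thumbprint(jwk: dict) -> str:
    """RFC 7638 thumbprint: SHA-256 over the required members only,
    sorted lexicographically, serialized without whitespace."""
    required = {"RSA": ("e", "kty", "n"), "EC": ("crv", "kty", "x", "y")}[jwk["kty"]]
    canonical = json.dumps({k: jwk[k] for k in required},
                           separators=(",", ":"), sort_keys=True)
    return b64url(hashlib.sha256(canonical.encode("utf-8")).digest())


def key_authorization(token: str, jwk: dict) -> str:
    return f"{token}.{jwk_thumbprint(jwk)}"


def http01_response(token: str, jwk: dict) -> Tuple[str, str]:
    """Path and body the web server must serve for an HTTP-01 challenge."""
    return f"/.well-known/acme-challenge/{token}", key_authorization(token, jwk)


def dns01_txt_value(token: str, jwk: dict) -> str:
    """Value for the _acme-challenge TXT record in a DNS-01 challenge."""
    digest = hashlib.sha256(key_authorization(token, jwk).encode("ascii")).digest()
    return b64url(digest)
```

Nothing here depends on TLS-level tricks, which is part of why these two challenge types are less exposed to the kind of breakage described above.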

Maybe 3 and 4 can be improved if ACME was integrated as a proper nginx module instead of certbot trying to change things from the outside. My experience as a whole, however, makes me feel that the LE/certbot teams are rather cavalier about the commitment to stability they need to make if they really want to become an essential part of the world's internet infrastructure. If you want to be paternalistic about managing TLS for people who don't know how to do it, at least try to do it properly!
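For item 4, a client that knows which names the server still answers for could drop stale names before requesting a new certificate, instead of failing validation on them forever. A minimal sketch of that pre-renewal check, assuming the server's own public IP addresses are known; `live_names` and `system_resolve` are hypothetical helpers, not something certbot provides.

```python
import socket
from typing import Callable, Iterable, List, Set


def live_names(names: Iterable[str], server_ips: Set[str],
               resolve: Callable[[str], Set[str]]) -> List[str]:
    """Keep only the names whose DNS still points at one of this server's IPs."""
    keep = []
    for name in names:
        try:
            addrs = resolve(name)
        except OSError:  # socket.gaierror is an OSError: the name no longer resolves
            continue
        if addrs & server_ips:
            keep.append(name)
    return keep


def system_resolve(name: str) -> Set[str]:
    """Resolve A/AAAA records through the OS resolver."""
    return {info[4][0] for info in socket.getaddrinfo(name, None)}
```

Taking the resolver as a parameter keeps the logic testable. Note that this simple IP comparison would wrongly discard names served through a CDN or external load balancer, whose public addresses are not the server's own.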

certbot is just one of many, many ACME clients and libraries now available.

If you don't like how the nginx plugin works, then fork it.

Wrapping "certbot renew && nginx -s reload" into a systemd service doesn't seem to be a very complex thing to do.
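The `renew && reload` pattern only reloads the web server when renewal succeeds. A minimal Python sketch of the same logic, suitable for calling from a timer or scheduler; `renew_and_reload` is a hypothetical name, and the commands are passed in rather than hard-coded.

```python
import subprocess
from typing import Sequence


def renew_and_reload(renew_cmd: Sequence[str], reload_cmd: Sequence[str]) -> bool:
    """Run the renewal command, then reload the web server only if it succeeded.

    Mirrors `renew && reload` in a shell. Returns True only if both commands
    exit with status 0.
    """
    if subprocess.run(renew_cmd).returncode != 0:
        return False  # renewal failed: leave the running server untouched
    return subprocess.run(reload_cmd).returncode == 0
```

Usage would be `renew_and_reload(["certbot", "renew"], ["nginx", "-s", "reload"])`. Note that `certbot renew` exits successfully even when nothing was due, so this reloads on every run; certbot's `--deploy-hook` option (e.g. `certbot renew --deploy-hook "nginx -s reload"`) runs the command only after a certificate was actually renewed.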

That's very close to what I'm doing, except I now refuse to touch certbot with a 10-foot pole. Plenty of better, do-one-thing-well, non-paternalistic ACME clients out there as you said.

The fact that there are alternatives, though, doesn't mean that the crappy "official" client isn't doing the LE ecosystem a disservice.

I tried solving this a different way for my selfhosted services.

Instead of running certbot on every server, I wrote a custom ACME client that runs on a master server and is responsible for requesting/renewing all certificates that I use. It also automatically deploys each cert to the correct server.

It is a single point of failure but it makes tracking certificate expiry, renewal and revocation so much easier.
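One detail a central client has to get right when deploying a certificate to a server is never leaving a half-written file where the web server might read it. A minimal sketch of the write step on the target machine, assuming the files are named `fullchain.pem` and `privkey.pem` (certbot's naming; adjust to your layout). `write_atomically` and `deploy_cert` are hypothetical helpers, and how the bytes get to the server (SSH, an agent, etc.) is left out.

```python
import os
import tempfile


def write_atomically(path: str, data: bytes, mode: int = 0o600) -> None:
    """Replace `path` so readers see either the old file or the new one, never a partial one."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must be on the same filesystem for os.replace to be atomic.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes hit disk before the rename
        os.chmod(tmp, mode)
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)  # only reached if the rename did not happen
        raise


def deploy_cert(cert_dir: str, fullchain_pem: bytes, key_pem: bytes) -> None:
    """Install a new key and chain; reload the web server only after both are written."""
    write_atomically(os.path.join(cert_dir, "privkey.pem"), key_pem, 0o600)
    write_atomically(os.path.join(cert_dir, "fullchain.pem"), fullchain_pem, 0o644)
```

Each file is replaced atomically, but the pair is not: between the two renames the key and chain do not match, which is why the reload should happen only after `deploy_cert` returns.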

Sounds like a fun project. Assuming you're talking about web servers, did you know Caddy can do that? Simply configure each one to use the same storage backend and Caddy will automatically coordinate management as a cluster, and share the certificate (and OCSP staple) resources.

(And depending on the storage backend, it's no longer a single point of failure. And even if storage is the failure, it's just storage, if it's down your servers will keep running.)
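Coordinating several instances through shared storage comes down to a lock: whichever instance grabs it performs the renewal, and the others later pick up the stored result. This is not Caddy's implementation (that lives in its Go code); it is a minimal Python sketch of the idea, assuming a shared filesystem on which exclusive file creation is atomic, which is not guaranteed on every network filesystem. `FileLock` and `with_lock` are hypothetical names.

```python
import os
import time


class FileLock:
    """Exclusive lock held by creating a file with O_CREAT | O_EXCL.

    Only one process sharing the directory can create the file, so only one
    instance performs a given renewal at a time.
    """

    def __init__(self, path: str):
        self.path = path

    def try_acquire(self) -> bool:
        try:
            fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except FileExistsError:
            return False  # another instance holds the lock
        with os.fdopen(fd, "w") as f:
            f.write(str(time.time()))  # record when the lock was taken
        return True

    def is_stale(self, max_age: float) -> bool:
        """True if the lock file exists but is older than `max_age` seconds."""
        try:
            return time.time() - os.path.getmtime(self.path) > max_age
        except FileNotFoundError:
            return False

    def release(self) -> None:
        os.remove(self.path)


def with_lock(lock: FileLock, action) -> bool:
    """Run `action` only if the lock is free; return whether it ran."""
    if not lock.try_acquire():
        return False
    try:
        action()
    finally:
        lock.release()
    return True
```

The `is_stale` check matters because an instance that crashes mid-renewal leaves its lock file behind; a real implementation needs some policy for breaking such locks.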

Huh, that's actually quite interesting. I've never really looked into Caddy since nginx has fulfilled most of my needs so far, but I suppose it's about time I read up on it.

I wrote monitoring that was able to check all of my servers, all of my certificates, and alert me if certbot failed on any of them and their certificates were near expiry.

I call it "The Prometheus monitoring I already needed to make sure my servers are up and serving the websites they're supposed to"
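A monitor like this needs only the certificate each server actually presents, not access to the server's files. A minimal sketch using Python's `ssl` module, assuming each site listens on port 443; the helper names and the metric name `tls_cert_days_left` are made up, and the output line follows the Prometheus text exposition format. (Off-the-shelf, Prometheus's blackbox exporter can probe certificate expiry too.)

```python
import socket
import ssl


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return the notAfter field of the certificate a server presents."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_left(not_after: str, now: float) -> float:
    """Days until expiry; `not_after` uses the format getpeercert() returns."""
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def metric_line(host: str, days: float) -> str:
    """One sample in the Prometheus text format (hypothetical metric name)."""
    return f'tls_cert_days_left{{host="{host}"}} {days:.2f}'
```

Usage would be `metric_line(host, days_left(fetch_not_after(host), time.time()))`. With the default context an already-expired or otherwise invalid certificate makes the handshake fail, so the probe raises an `ssl.SSLError`; a monitor should treat that as an alert in its own right.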

That's what I pondered doing. Do you have any code you can open source? Thanks.

I'm not sure I want to publish it at the moment as I'm not satisfied with the quality of the codebase. It was just something I hacked together in 2 days lol.

The bulk of the work was already done for me as there was already an ACME library for my language.

I'll probably open source it once I have a chance to clean everything up.

Published code provides more value than unpublished code. Most academic research code never gets cleaned up either, because there's no time to do it.

And that’s okay.