by mastax 1607 days ago
Wonder how many ACME deployments check for revocation, rather than just being on an infrequent cron job? What proportion of affected certificates will be automatically renewed with no effort?

Looking at a few docs, probably not many. In any case there isn't (?) an in-band way to tell the clients that the cert is going to be revoked before it is revoked, so there would be some disruption.
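To make the difference between "an infrequent cron job" and a client that also reacts to revocation concrete, here is a minimal sketch of the renewal decision. The function name, the `revoked` flag, and the two-thirds threshold are assumptions for illustration, not the behaviour of any particular ACME client; how the client learns the revocation status (for example by checking OCSP) is left out.

```python
from datetime import datetime, timedelta, timezone

def should_renew(not_before, not_after, now, revoked=False, renew_fraction=2 / 3):
    """Return True if the certificate should be replaced now.

    Hypothetical policy: renew immediately when the certificate is known
    to be revoked, otherwise once `renew_fraction` of its lifetime has passed.
    """
    if revoked:
        # A purely time-based cron job never takes this branch.
        return True
    lifetime = not_after - not_before
    return now >= not_before + lifetime * renew_fraction


issued = datetime(2022, 1, 1, tzinfo=timezone.utc)
expires = issued + timedelta(days=90)
day_10 = issued + timedelta(days=10)

print(should_renew(issued, expires, day_10))                # fresh cert, not revoked
print(should_renew(issued, expires, day_10, revoked=True))  # fresh cert, revoked
```

A client that only evaluates the time-based branch would keep serving a revoked certificate until the usual renewal window opens.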


Caddy does. https://community.letsencrypt.org/t/questions-about-renewing...

And this is one reason why I keep advocating for certificate automation to be built into services/apps, rather than patched on the outside with duct tape.

I look forward to the day when cert lifetimes are regularly about as short as OCSP responses. Then we can possibly do away with OCSP entirely.*

(* I am of the opinion that revocation is fundamentally broken for Web PKI and it should be phased out in favor of short cert lifetimes. You may disagree and that's fine, but I'm happy to discuss why if you're interested.)

RE: revocation broken

Absolutely. Especially with the advent of protocols like ACME it just makes sense.

Here's a nice blogpost from smallstep (not affiliated) on this topic: https://smallstep.com/blog/passive-revocation/

Certificate expiries are set primarily due to financial interests that have nothing to do with security.

Why do you think the maximum lifetime was reduced from two years to one?

Does it make a lick of difference if you’re man-in-the-middled for just one year instead of two? What kind of argument is that!?

“Oh, they got every active user credential and form that was submitted ages ago, but no worries! This won’t go on for another year! Just months to go now…”

No, obviously the CA cartel just wanted to extract 2x the rent.

The whole thing is just absurd on its face and needs to stop, but there are billions of dollars worth of rent seekers that say…

“No.”

Certificates have always been sold and priced per year. I highly doubt lifetime changes benefitted certificate authorities, and I’m pretty sure they had it forced upon them by browsers. If anything, it prevents them from collecting 2-3 years of revenue upfront.
The reduction to 27 months was voted through CA/B (where either CAs or Browsers can effectively veto, like the way Northern Ireland is governed) but only after Ryan suggested Google might just unilaterally impose 90 days if the CAs rejected a reduction.

The reduction to 398 days was imposed by Apple, unilaterally, although in practice the ecosystem went along with it. It actually took a few weeks to get clarity on exactly what Apple intended, they just basically blurted it out at a meeting.

"although in practice the ecosystem went along with it" - not that there was much choice, but some CAs were less surprised and grumbled less than others...
You might want to double-check that. The CAs (all but two, basically) disagreed with the lifetime reduction and actively voted and argued against it.
That’s because the 1 year certs are too short to be a meaningful difference from LE, so they lose their selling point. So being against the shorter expiration date is just a nefarious plot to make more money, just like supporting it would be. They are so nefarious, they’re nefarious either way! It’s a Certs-22.
Better to be man-in-the-middled for:

- One year instead of two? Yep

- 3 months instead of 1 year? Yep.

- 1 week instead of 3 months? Yep.

The reason certificates have traditionally been so long is because it was a manual process. Using ACME it is possible to expire certificates every hour if you wanted to do that.

In the past, revocation was supposed to help cases where the owner of the certificate exposed the private key in one way or another.

Now it seems that revocation is supposed to help the CA covering up mistakes made by the CA.

Maybe we actually need a better CA.

The Baseline Requirements require revocation of misissued certificates, this isn't "a CA covering up mistakes."
> Maybe we actually need a better CA.

Go for it. Start one and tell us how it went.

If ever I have too much free time, I'll spend it modifying firefox to support DANE.
The way LE and others keep breaking this process and the tools around it is certainly not a great endorsement for having it integrated into a service.
Can you give some examples of the kind of breakage you experienced?
Not OP, but here are some things I've personally experienced:

1. Supposedly more secure challenge types such as TLS-ALPN-01 are far from stable, as the current incident shows. Your cert can be revoked at any time through no fault of your own. After being burned by TLS-SNI-01 the last time, now I refuse to use anything other than plain old HTTP-01 and DNS-01.

2. As soon as the version of the Linux distro I was using (not in my power to change!) reached EOL, certbot suddenly refused to renew, despite the fact that I'd been using more or less the same version of Python and certbot for a number of years and the HTTP-01 challenge requires nothing fancy at all. Why does everyone these days insist on making ops decisions for other people?

3. On a server with existing nginx virtual hosts, certbot injects configuration directives including stuff the nginx team officially recommends against, such as `if` statements. It frequently breaks existing configuration such as rewrites and redirects. After seeing this a number of times, the only conclusion I can make is that certbot has no idea how to manipulate nginx config files.

4. If I have multiple domains pointing at the same application, and remove one of them at a later time, certbot is oblivious and repeatedly fails trying to renew the certificate that now contains an invalid domain. Again, certbot doesn't know how to work with nginx.

Maybe 3 and 4 can be improved if ACME was integrated as a proper nginx module instead of certbot trying to change things from the outside. My experience as a whole, however, makes me feel that the LE/certbot teams are rather cavalier about the commitment to stability they need to make if they really want to become an essential part of the world's internet infrastructure. If you want to be paternalistic about managing TLS for people who don't know how to do it, at least try to do it properly!

certbot is just one of many, many ACME clients and libraries now available.

If you don't like how the nginx plugin works, then fork it.

Wrapping "certbot renew && nginx -s reload" into a systemd service doesn't seem to be a very complex thing to do.

That's very close to what I'm doing, except I now refuse to touch certbot with a 10-foot pole. Plenty of better, do-one-thing-well, non-paternalistic ACME clients out there as you said.

The fact that there are alternatives, though, doesn't mean that the crappy "official" client isn't doing the LE ecosystem a disservice.

I tried solving this a different way for my selfhosted services.

Instead of running certbot on every server, I wrote a custom ACME client that runs on a master server and is responsible for requesting/renewing all certificates that I use. It also automatically deploys each cert to the correct server.

It is a single point of failure but it makes tracking certificate expiry, renewal and revocation so much easier.

Sounds like a fun project. Assuming you're talking about web servers, did you know Caddy can do that? Simply configure each one to use the same storage backend and Caddy will automatically coordinate management as a cluster, and share the certificate (and OCSP staple) resources.

(And depending on the storage backend, it's no longer a single point of failure. And even if storage is the failure, it's just storage, if it's down your servers will keep running.)

Huh that's actually quite interesting, I've never really looked into caddy as nginx has fulfilled most of my needs so far but I suppose it's about time I read up on it.
I wrote monitoring that was able to check all of my servers, all of my certificates, and alert me if certbot failed on any of them and their certificates were near expiry.

I call it "The Prometheus monitoring I already needed to make sure my servers are up and serving the websites they're supposed to"
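For anyone wanting the "near expiry" part without a full monitoring stack, a rough sketch of the check could look like this. It assumes you already have the certificate's `notAfter` string in the format Python's `ssl.SSLSocket.getpeercert()` returns; the function names and the 14-day threshold are made up for illustration.

```python
import ssl
from datetime import datetime, timezone

def days_until_expiry(not_after, now):
    """Days left before a certificate expires.

    `not_after` uses the text format found in getpeercert()["notAfter"],
    e.g. "Jun  1 00:00:00 2022 GMT".
    """
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - now).total_seconds() / 86400

def needs_alert(not_after, now, threshold_days=14):
    """Hypothetical alert rule: fire when fewer than `threshold_days` remain."""
    return days_until_expiry(not_after, now) < threshold_days


now = datetime(2022, 5, 25, tzinfo=timezone.utc)
print(needs_alert("Jun  1 00:00:00 2022 GMT", now))
```

With a 90-day certificate that normally renews around day 60, an alert at 14 days remaining only fires if several renewal attempts have already failed.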

That's what I pondered doing. Do you have any code you can open source? Thanks.
I'm not sure I want to publish it at the moment as I'm not satisfied with the quality of the codebase. It was just something I hacked together in 2 days lol.

The bulk of the work was already done for me as there was already an ACME library for my language.

I'll probably open source it once I have a chance to clean everything up.

Published code provides more value than unpublished code. Most academic research code is never cleaned up, because there is no time for it.

And that’s okay.

WebPKI certificate revocation doesn't work anyways. It fails in exactly the case where TLS is needed: MITM.

All certificate revocation-checking schemes "fail open" and proceed happily on their way if the MITM blocks their communications with the revocation lists.
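As a toy model of what "fail open" means here (the names are invented for illustration and not taken from any browser's code), the client's decision can be sketched as:

```python
from enum import Enum

class Status(Enum):
    GOOD = "good"
    REVOKED = "revoked"
    UNAVAILABLE = "unavailable"  # responder unreachable, or blocked by an attacker

def accept_connection(status, hard_fail=False):
    """Decide whether to proceed with a TLS connection.

    hard_fail=False models "fail open": a missing answer counts as good.
    hard_fail=True models "fail closed": a missing answer rejects the connection.
    """
    if status is Status.REVOKED:
        return False
    if status is Status.UNAVAILABLE:
        return not hard_fail
    return True


# An attacker holding a revoked key only has to make the lookup fail:
print(accept_connection(Status.UNAVAILABLE))                  # fail open
print(accept_connection(Status.UNAVAILABLE, hard_fail=True))  # fail closed
```

In the fail-open case the `REVOKED` branch is only reached when the attacker lets the lookup through.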

If you somehow don't have to worry about MITM you don't need anything remotely close to the complexity of TLS.

Certificate revocation is mostly security theater.

> All certificate revocation-checking schemes "fail open" and proceed happily on their way if the MITM blocks their communications with the revocation lists.

Incorrect, firefox implements OCSP Must-Staple and treats a failure there as equivalent to a certificate validation failure. Now if only we could ever get chrome(ium) to implement it...

OCSP Must-Staple is just another name for "really short certificate expiration periods"

Must-Staple with a timeout of X seconds is functionally equivalent to a certificate with a validity time of X seconds. In either case you need to go fetch something from the CA every X seconds, or else get booted off the net. The only difference is what you call that thing.
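A small sketch of that equivalence, under the simplifying assumption that a Must-Staple client rejects the certificate as soon as the stapled response is stale (the function and parameter names are hypothetical):

```python
from datetime import datetime, timedelta, timezone

def usable_until(cert_not_after, staple_next_update=None, must_staple=False):
    """Latest time a client would still accept the certificate.

    With Must-Staple the stapled OCSP response has to be fresh as well,
    so whichever timestamp comes first is the effective deadline.
    """
    if must_staple:
        if staple_next_update is None:
            return None  # no staple to present, nothing will be accepted
        return min(cert_not_after, staple_next_update)
    return cert_not_after


now = datetime(2022, 1, 1, tzinfo=timezone.utc)
week = now + timedelta(days=7)
year = now + timedelta(days=365)

# One-year cert with Must-Staple and 7-day staples vs. a plain 7-day cert:
print(usable_until(year, staple_next_update=week, must_staple=True) == usable_until(week))
```

Either way the server has to fetch something fresh from the CA before the same deadline; only the field carrying the timestamp differs.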

HN is doing its silly "slow down" nonsense, so I will reply to the reply below by editing here.
To @cmeacham98: I'm not moving the goalposts; all practical revocation schemes fail open. OCSP Must-Staple isn't actually a form of revocation; it is just expiry with a fancy name and lots of extra complexity.

The implementation quirks of one particular CA (LE) are not features of these protocols; they are features of that particular CA's policies.

To @mhils: CAs could easily stand up simple servers that let you ask for the most recent certificate (if any) issued to a given domain and public key.

They don't want to do this because it's a burden. The only reason they run OCSP servers is that they're under pressure from insurers, regulators, and auditors to participate in the whole revocation theater game. Saying "we don't do revocation" would cause people to freak out, and setting up an OCSP server takes effort, but less effort than explaining to the insurers/regulators/auditors that revocation doesn't actually work.

As for CT, the log could simply include an extra (issuer-signed) field which amounts to "I will auto-renew this certificate until $DATE" and omit all the additional certificates issued to that same public key and domain name between the log entry and $DATE. Of course the browser would need to understand this field in order to validate the CT log entry.

I'm not sure super-short expiration is really preferable; it makes the Web even more fragile than it already is. Frankly I think Web PKI is sort of a big mess at this point. Most of it only makes sense when viewed through the lens of "(a) governments will always control DNS for the two-letter TLDs, (b) Google cannot get all governments to supplicate to Google, therefore (c) we must conjure up an extra layer of entities (the CAs) that can be bullied around by manipulation of browser engine code". This is basically the only reason why DANE (or a modernized revision of it) isn't used.

I absolutely agree that shorter lifetimes are likely preferable over the additional complexity that comes with OCSP Must-Staple. That being said, one benefit of OCSP Must-Staple is that you don't need to authenticate yourself to get the OCSP response. Another benefit is that Certificate Transparency Logs remain smaller, which in turn also benefits CRLite filter sizes.
> I'm not moving the goalposts; all practical revocation schemes fail open. OCSP Must-Staple isn't actually a form of revocation; it is just expiry with a fancy name.

Of course it is revocation, it allows you to revoke a certificate before its normal expiration (90 days for LE).

I could make this argument in reverse: "very short lifetimes are functionally equivalent to OCSP Must-Staple, and thus are a form of revocation". Of course, this is ridiculous both ways: being similar or even 'functionally equivalent' does not make two things the same.

But that's just semantics. GP's point was that with Must-Staple, the "real" expiration period becomes pretty much irrelevant - instead, the lifetime of the OCSP response becomes the new effective lifetime of the certificate.

If you compare (1) a short-lived certificate and (2) a long-lived certificate with Must-Staple and short-lived OCSP responses, the benefits, security properties and failure modes of both are exactly the same*. You're just putting the timestamp into different fields.

(* Or almost: Some notable practical differences are described in the sibling comments - but those are mostly a property of LE's current policies, not the protocol itself)

This is moving the goalposts. You claimed that all certificate revocation methods fail-open, I pointed out that OCSP Must-Staple is a fail-closed revocation method that is implemented in a popular browser.

Regardless, even if "functionally equivalent", there is a practical difference: LE does not allow me to issue a certificate every minute, but does allow the OCSP Must-Staple extension.

It limits the time in which a certificate compromise can be exploited. Clients cache CRLs.

Without revocation, a compromised cert remains useful to an attacker for the entire validity period of the cert, if they can MITM you.

With revocation, they must MITM you constantly to prevent you from acquiring the revocation list. This substantially adds to cost and complexity of such an attack, and means that many, if not all clients will be protected.

So it's not perfect, but ask what the world looks like without it.

That's an issue with buggy clients, which should not proceed if the CRL is not available. It's not an issue with PKI per se.
If they didn't fail open, every time a CA's website went down every single website that used their certificates would go offline as well.

You can imagine the DDOS-ransomers licking their lips at this possibility.

No, "fail open" has always been the only possible way to implement this. Which is why it's a broken idea from the start.

> If they didn't fail open, every time a CA's website went down every single website that used their certificates would go offline as well.

That's not correct. OCSP stapling exists to prevent that kind of a problem.

OCSP always seemed a bit absurd to me: Instead of sending an OCSP staple, the CA could also issue a very short-lived certificate on demand. It would have the same effect of asserting that the CA currently considers the server to be verified, and it doesn't need a separate format.
There's a plan to make this information available to clients in the future: https://datatracker.ietf.org/doc/draft-aaron-acme-ari/
Yeah, further, I doubt many certificates were issued from an account key tied to an email address that people monitor often, if at all.
I hadn't considered this. Am I in the minority by having a legit, monitored email account for my ACME certs?
I thought having a proper email account was what most people do. Companies probably use a role-based address e.g. certmaster@myco.com so the email goes to whoever is responsible for it today.