by jacquesm 2106 days ago
> that a security hole was found in the protocol

Is there any supporting evidence for that? The only thing I have been able to find so far is that it was simply superseded by a newer version, mostly to support wildcard certs. What holes there were in V1 were closed within a day or two at most.


The article says "then the challenge protocol was changed" so that's why people are talking about the protocol.

The only challenge which changed was tls-sni-01, which was removed and eventually replaced with tls-alpn-01.

The tls-sni-01 challenge is safe unless there are bulk hosting sites whose web server for some crazy reason accepts SNI for names that are nonsensical, and then serves up answers chosen by an attacker who is also one of the customers on that server instead of from a victim on the same server.
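To make that failure mode concrete, here is a toy model of it. This is a sketch under stated assumptions, not Let's Encrypt's validator and not any real web server: `SharedHost`, `upload_cert`, `handshake` and `validate_tls_sni_01` are hypothetical names, the key authorization string is made up, and the `.acme.invalid` name derivation only loosely follows the draft tls-sni-01 scheme.

```python
import hashlib


def challenge_name(key_authorization: str) -> str:
    """Derive the pseudo-hostname the CA sends as SNI (loosely modeled)."""
    z = hashlib.sha256(key_authorization.encode()).hexdigest()
    return f"{z[:32]}.{z[32:]}.acme.invalid"


class SharedHost:
    """Hypothetical bulk host: one IP, many tenants, certs selected by SNI."""

    def __init__(self):
        # SNI name -> (tenant, set of names in the certificate served)
        self.certs_by_sni = {}

    def upload_cert(self, tenant: str, name: str) -> None:
        # The flaw being modeled: nothing checks that `tenant` owns `name`.
        self.certs_by_sni[name] = (tenant, {name})

    def handshake(self, sni: str):
        # Return the names in the certificate served for this SNI, if any.
        entry = self.certs_by_sni.get(sni)
        return entry[1] if entry else None


def validate_tls_sni_01(host: SharedHost, key_authorization: str) -> bool:
    """CA side: connect to the domain's IP and send the challenge name as SNI."""
    name = challenge_name(key_authorization)
    served = host.handshake(name)
    return served is not None and name in served


host = SharedHost()
host.upload_cert("victim", "victim.example")

# The attacker asks the CA for a certificate for victim.example, gets a
# challenge, and answers it from their own account on the same shared host.
attacker_key_auth = "token123.attacker-account-thumbprint"  # made-up value
host.upload_cert("attacker", challenge_name(attacker_key_auth))

passed = validate_tls_sni_01(host, attacker_key_auth)
print("attacker passes validation for victim.example:", passed)
```

Note that the victim's real hostname never appears in the handshake, which is why a host that routes arbitrary SNI names to whichever customer claimed them breaks the challenge.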

Unfortunately somebody actually did ship software which is crazy in that specific way, and it's named Apache HTTPD server. You might have heard of it. So that's a problem.

So, Let's Encrypt deprecated this challenge and you can no longer use it. They did tell everybody affected, by email to the address they provided for contact. Since they are not psychic they don't have a way to reach out to people who felt they didn't need to be contacted.

I suspect that, given you mention wildcards, you're thinking of ACMEv2, which isn't a challenge protocol. But again there were plenty of email notifications about the ACMEv2 upgrade, and you've in fact encountered exactly the anticipated scenario: you decided to build out a new thing using the old service and it told you not to do that. Your old things are still working, and will for almost another year, after already two years' notice that this was going away. It's just that new things can't be launched against this already deprecated service.

You know this, but for the benefit of the thread: to say "tls-sni-01 is safe unless there are bulk hosting sites that break it" is to say that tls-sni-01 is unsafe. The "crazy" sites you're referring to included AWS and Heroku.

This all happened 2 years ago, so it's a bit odd to see it litigated today.

We briefly describe this history on page 6 of

https://jhalderm.com/pub/papers/letsencrypt-ccs19.pdf

in case anyone is more interested (there are also references there for further details). Twice, methods that seemed plausible for proving control over domain names turned out to make assumptions that were potentially violated by shared hosting environments.

Jacques, I'm really sorry for the hassle that these changes caused you.

Thanks for the link Seth. I wasn't aware this existed and it's sometimes nice to have something specific to cite as well as convenient that it's all in one place like this.

Edited to add: Wow the Sankey diagram (showing changes in which CA if any is used by a site) is something I hadn't seen anywhere else and is especially useful. Thanks again.

Heroku and (so far as I can tell) CloudFront independently re-invented this stupidity. But if it was "just", say, Heroku and CloudFront, you can imagine plausibly notifying those two providers to fix their broken infrastructure and then you're good.

Apache makes it unsalvageable by sheer numbers the same way it had already for HTTPS in http-01, so that's why I focused on Apache.

It's entirely possible for some fool to ship an exciting new cloud service that lets people bind to arbitrary ALPN values on a shared service and thereby re-introduce this problem for tls-alpn-01 - but unlike with tls-sni-01 that's not a bug common to hundreds of small bulk hosts using out of box Apache so I assume we'd tell the exciting upstart to knock it off and warn their customers what they're doing is inherently unsafe, rather than requiring Let's Encrypt to stop offering tls-alpn-01.

In fact we're already on the other side of this for the ordinary version of http-01 for a different reason. Apache really does potentially let an attacker who controls aaa-aardvark.example at some bulk host perform http-01 challenges for www.some-custom-site.example that has created A records pointing to the bulk host but hasn't currently actually got them serving www.some-custom-site.example maybe due to a typo or unpaid bill.

But most bulk hosts have specifically configured Apache to show a default "Did you pay? / Have you configured your hosting properly?" type site which is harmless in this case, and for the few that haven't, users can understand that um, if they visit www.some-custom-site.example in their browser they get to the attacker's site, so like yeah, that's where the problem is, nothing new with http-01.
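A toy model of that dangling-name case, as a sketch only. `BulkHost`, `add_vhost`, `get` and `validate_http_01` are hypothetical names and the token and key authorization are made-up values. The fallback rule mimics Apache's name-based virtual hosting, where a request whose Host header matches no vhost is answered by the first one configured.

```python
WELL_KNOWN = "/.well-known/acme-challenge/"


class BulkHost:
    """Hypothetical bulk host serving several name-based virtual hosts."""

    def __init__(self):
        # hostname -> {path: body}; insertion order stands in for config order
        self.vhosts = {}

    def add_vhost(self, hostname, files=None):
        self.vhosts[hostname] = dict(files or {})

    def get(self, host_header, path):
        site = self.vhosts.get(host_header)
        if site is None:
            # Unknown Host header: fall through to the first configured vhost.
            site = next(iter(self.vhosts.values()), {})
        return site.get(path)


def validate_http_01(host, domain, token, key_authorization):
    """CA side: fetch the well-known path for `domain` and compare the body."""
    return host.get(domain, WELL_KNOWN + token) == key_authorization


token, key_auth = "tok", "tok.attacker-thumbprint"  # made-up values
victim = "www.some-custom-site.example"  # A record points here, no vhost exists

# Host without a harmless default: the attacker's vhost happens to sort first.
unsafe = BulkHost()
unsafe.add_vhost("aaa-aardvark.example", {WELL_KNOWN + token: key_auth})
unsafe.add_vhost("zzz-other.example")

# Host with a harmless default site configured ahead of all customers.
safe = BulkHost()
safe.add_vhost("000-default", {"/": "Have you configured your hosting properly?"})
safe.add_vhost("aaa-aardvark.example", {WELL_KNOWN + token: key_auth})

unsafe_result = validate_http_01(unsafe, victim, token, key_auth)
safe_result = validate_http_01(safe, victim, token, key_auth)
print("no default site:", unsafe_result, "| harmless default site:", safe_result)
```

The same fallback decides what a browser visiting the dangling name would see, which is the point made above: the exposure already exists without http-01.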

I did provide an email address, never got any mail (I did actually check that).

> it's just that new things can't be launched against this already deprecated service.

Yes, I noticed. So, I now have the entirely unforced option of re-imaging a machine that is working just fine besides this little detail, which is in fact just one very small thing of a whole pile of much bigger things that run on that particular box. Not to mention migrating twenty-nine years of email to a new mail server.

I'm sure there is a lesson in there somewhere, but I'm not sure I'm overly receptive today, I had a lot of other stuff on my agenda.

If you let a server lag in OS version, at some point in time you're going to hit this kind of problem. If not with Let's Encrypt, then with some other dependency. I know, I've been in the exact same spot. I just don't blame the dependency, and included server OS updates as part of a yearly maintenance cycle.

I find that really ridiculous. Not you, but the fact that an OS needs to be upgraded because of some application level stuff that has to do with a protocol that is being run on some other server.

That's the kind of dependency snowball that we should work hard to avoid, not accept as some kind of new normality.

Servers should be able to live for years without re-imaging.

Is there a reason you can't just upgrade that one component on the server, why do you have to re-image it from scratch?

If you have external dependencies they are going to move around from time to time throughout their lifetimes, especially if they are beta. Let's Encrypt may not have signaled beta with v1, but I've been a cert-manager user for years in pre-1.0 and I've known that meant I might need to come up for air and read the docs for a specific upgrade instruction from one pre-1.0 minor version to another at any time.

Now cert-manager is 1.0+ and my expectations can change. It should remain backwards compatible until the next major version (hopefully for a while! And they will provide a migration path when that comes, with clear instructions and a fairly long sunset, God willing).

But cert-manager depends on Let's Encrypt, and I depend on cert-manager, all of which depends on a protocol called ACME, and this is the arrangement. We made this deal because it was going to turn out less complicated than managing the certificates by hand, and similarly they made that deal because it was going to turn out better than rolling their own protocol from scratch. Eyes on the prize.

If you didn't want Let's Encrypt as a dependency there are other ways to connect cert-manager or another tool like it, including other ACME providers. They all depend on the ACME protocol (or there might be some other protocol that you can use, with its own characteristics of change or stability, or you can roll your own). At some point you have to roll the dice and bet on something.

Occasionally these things happen. You suggest that servers should be able to go for years, but they have allowed years for this transition! What more can be expected, realistically?

> Is there a reason you can't just upgrade that one component on the server, why do you have to re-image it from scratch?

Yes, I did this now and I have it working. But it leaves things in a messed up state and I don't like that so I will go back to this in a short while and fix it properly.

What I still wonder about is why their warning email never reached me, that I really need to figure out because then at least I would have dealt with this under a lot less time pressure.

> If you didn't want LetsEncrypt as a dependency there are other ways to connect cert-manager or another tool like it, including other acme providers...

There are some very good suggestions in this thread, I will probably adopt one of them.

> You suggest that servers should be able to go for years, (but they have allowed years for this!)

And somehow I missed that memo. Even so, I am still not convinced of the necessity. It is possible that it exists, but I have yet to see a valid reason for shutting down the old protocol for new registrations like this. There also seems to be some confusion, with people saying it should have worked for the same account, which I can prove did not work.

I know you solved your issue; for others in the same boat, look into acme.sh. It's a shell-only implementation, no Python, no loads of dependencies. I used that to keep Let's Encrypt running on an ancient server (firewalled) that I cannot upgrade for reasons.

I decided to go with acme.sh instead of certbot on some servers because I am hoping that upgrading acme.sh will cause fewer headaches. But who knows...

Why TLS-SNI-01 was disabled: https://community.letsencrypt.org/t/2018-01-09-issue-with-tl...

Explanation that renewals will be disallowed after 1 year deprecation period: https://community.letsencrypt.org/t/march-13-2019-end-of-lif...

And as you seem to be talking about ACMEv1/v2 instead of TLS-SNI-01 (which I originally thought), it will be supported until as late as June 2021 in some cases: https://community.letsencrypt.org/t/end-of-life-plan-for-acm...

ACMEv2 was introduced because it is much closer to the actual spec. Enforcing this ensures that there are actually ACME implementations out there, instead of proprietary "Let's Encrypt ACME" implementations. https://tools.ietf.org/html/rfc8555 https://github.com/letsencrypt/boulder/blob/master/docs/acme...

To me this seems like a sensible compromise between backwards compatibility and their mission for standardized automated renewals.

Yes, but that particular hole was fixed, wasn't it?

You can't "fix" the tls-sni-01 hole except by going back in a time machine to when Apache implemented SNI and spraying all the involved developers with water. "No, bad developer, no biscuit. Do what the protocol specification actually says, not whatever half-arsed nonsense you thought would work".

If there were like six web servers in the whole world that got this wrong, we could say "Fix those servers, fools" and sleep soundly knowing that those six servers are all that's affected. But Apache makes the scope too big to do that reasonably. It's a judgement call, but in this case the call was very easy.

I don't see what that has to do with me because there is no Apache on that server (just Nginx).

But Let's Encrypt is part of the Web PKI, and the Web PKI is for all names on the public Internet, not just any operated by Jacques Mattheij. You sought certificates from the Web PKI, probably because you wanted somebody else other than Jacques Mattheij to trust them.

A large fraction of public Internet HTTPS servers run Apache, which means tls-sni-01 is unsafe for a non-trivial fraction of names, which means we need to tell Certificate Authorities not to use this method or those like it. Specifically, validation method 3.2.2.4.10 (TLS Using a Random Number) in the CA/Browser Forum Baseline Requirements has to be approached differently if it's to be attempted. The tls-alpn-01 challenge implements 3.2.2.4.10 using ALPN instead of SNI and appears to be safe in practice.
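For anyone curious what "using ALPN instead of SNI" looks like from the validating side, here is a minimal client-side sketch with Python's standard `ssl` module. It is not how any CA's validator is actually written. The function names are my own, and a real check also has to inspect the challenge certificate (its subjectAltName and the acmeIdentifier extension), which is left out here. The `acme-tls/1` protocol ID is the one registered by RFC 8737.

```python
import socket
import ssl

ACME_TLS_ALPN = "acme-tls/1"  # ALPN protocol ID defined in RFC 8737


def build_validation_context() -> ssl.SSLContext:
    """Client context that offers only the ACME ALPN protocol."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # The challenge certificate is self-signed, so chain and hostname
    # verification are switched off (hostname check must be disabled first).
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    ctx.set_alpn_protocols([ACME_TLS_ALPN])
    return ctx


def server_negotiates_acme_alpn(domain: str, port: int = 443,
                                timeout: float = 5.0) -> bool:
    """Connect with SNI set to the real domain and see whether the server
    agrees to speak acme-tls/1. Certificate contents are NOT checked here."""
    ctx = build_validation_context()
    with socket.create_connection((domain, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=domain) as tls:
            return tls.selected_alpn_protocol() == ACME_TLS_ALPN
```

Two things differ from tls-sni-01: the SNI value is the actual name being validated rather than a made-up `.acme.invalid` name, and an ordinary web server that has never heard of `acme-tls/1` will not select that protocol by accident.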

There was this joke when I was a fledgling programmer 35 years ago: If engineers built bridges the way programmers build software, the first woodpecker to come along would destroy civilization as we know it.

I think your comment is a nice illustration of that.

To me if a piece of software has a problem then it is that piece of software that should be fixed, not to push the burden onto everybody else as well. That's just so wrong.

But that does not mean I don't follow your reasoning and understand why this decision was made. Still, the amount of waste here is incredible.