by jacquesm 2106 days ago
If you run something that issues 100 million or more certificates per year then backwards compatibility is not something that you toss out just because you can. Forcing that many web properties to upgrade their software (regardless of which party produced what) is discounting the combined effort that will take on the part of the users/sysadmins of those systems for something that could have just as easily been avoided.

You deprecate interfaces like these but you don't just shut them down, especially not when they are still seeing major use.

Just imagine that tomorrow IPV4 would be shut down because we've all had enough time to switch by now.


The reason the old interface was deprecated was that a security hole was found in the protocol. That is one of the few cases where it is reasonable to break backward compatibility in this manner.

Especially when dealing with certificates, where security is one of the main reasons to use them in the first place.

> that a security hole was found in the protocol

Is there any supporting evidence for that? The only thing I have been able to find so far is that it was simply superseded by a newer version, mostly to support wildcard certs. What holes there were in V1 were closed within a day or two at most.

The article says "then the challenge protocol was changed" so that's why people are talking about the protocol.

The only challenge which changed was tls-sni-01, which was removed and eventually replaced with tls-alpn-01.

The tls-sni-01 challenge is safe unless there are bulk hosting sites whose web server for some crazy reason accepts SNI for names that are nonsensical, and then serves up answers chosen by an attacker who is also one of the customers on that server instead of from a victim on the same server.

Unfortunately somebody actually did ship software which is crazy in that specific way, and it's named Apache HTTPD server. You might have heard of it. So that's a problem.
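To make the failure mode concrete, here is a toy model of it. This is only a sketch under stated assumptions: `SharedHost`, `challenge_name` and `ca_validates` are hypothetical names, the name derivation is simplified, and no real TLS or ACME code is involved. It only shows why serving an attacker-chosen certificate for an arbitrary SNI name defeats the check.

```python
import hashlib


class SharedHost:
    """Toy bulk host: one IP address, many customers, certificates picked by SNI."""

    def __init__(self):
        self.certs = {}  # SNI name -> (customer, set of names in the certificate)

    def upload_cert(self, customer, sni_name):
        # The flaw being modeled: nothing checks that the customer controls sni_name.
        self.certs[sni_name] = (customer, {sni_name})

    def handshake(self, sni_name):
        # Return the names in the certificate served for this SNI value, or None.
        entry = self.certs.get(sni_name)
        return entry[1] if entry else None


def challenge_name(token):
    # Hypothetical, simplified derivation of the throwaway validation name.
    digest = hashlib.sha256(token.encode()).hexdigest()
    return f"{digest[:32]}.{digest[32:]}.acme.invalid"


def ca_validates(host_for_domain, token):
    # The CA connects to whatever host the domain resolves to, asks for the
    # challenge name via SNI, and accepts if the served certificate contains it.
    # Note that nothing the victim controls is consulted at any point.
    name = challenge_name(token)
    served = host_for_domain.handshake(name)
    return served is not None and name in served


host = SharedHost()                 # assume victim.example resolves to this host
token = "token-issued-to-attacker"  # made-up value for illustration
host.upload_cert("attacker", challenge_name(token))
print(ca_validates(host, token))
```

In this model the attacker passes validation for any domain pointing at the shared host, simply by being a customer of the same host.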

So, Let's Encrypt deprecated this challenge and you can no longer use it. They did tell everybody affected, by email to the address they provided for contact. Since they are not psychic they don't have a way to reach out to people who felt they didn't need to be contacted.

I suspect, given you mention wildcards, you're thinking of ACMEv2, which isn't a challenge protocol. But again there were plenty of email notifications about the ACMEv2 upgrade, and you've in fact encountered exactly the anticipated scenario: you decided to build out a new thing using the old service and it told you not to do that. Your old things are still working, for almost another year, after already two years' notice that this was going away; it's just that new things can't be launched against this already deprecated service.

You know this, but for the benefit of the thread: to say "tls-sni-01 is safe unless there are bulk hosting sites that break it" is to say that tls-sni-01 is unsafe. The "crazy" sites you're referring to included AWS and Heroku.

This all happened 2 years ago, so it's a bit odd to see it litigated today.

We briefly describe this history on page 6 of

https://jhalderm.com/pub/papers/letsencrypt-ccs19.pdf

in case anyone is more interested (there are also references there for further details). Twice, methods that seemed plausible for proving control over domain names turned out to make assumptions that were potentially violated by shared hosting environments.

Jacques, I'm really sorry for the hassle that these changes caused you.

Thanks for the link Seth. I wasn't aware this existed and it's sometimes nice to have something specific to cite as well as convenient that it's all in one place like this.

Edited to add: Wow the Sankey diagram (showing changes in which CA if any is used by a site) is something I hadn't seen anywhere else and is especially useful. Thanks again.

Heroku and (so far as I can tell) Cloudfront independently re-invented this stupidity. But if it was "just" say Heroku and Cloudfront you can imagine plausibly notifying those two providers to fix their broken infrastructure and then you're good.

Apache makes it unsalvageable by sheer numbers the same way it had already for HTTPS in http-01, so that's why I focused on Apache.

It's entirely possible for some fool to ship an exciting new cloud service that lets people bind to arbitrary ALPN values on a shared service and thereby re-introduce this problem for tls-alpn-01 - but unlike with tls-sni-01 that's not a bug common to hundreds of small bulk hosts using out of box Apache so I assume we'd tell the exciting upstart to knock it off and warn their customers what they're doing is inherently unsafe, rather than requiring Let's Encrypt to stop offering tls-alpn-01.

In fact we're already on the other side of this for the ordinary version of http-01 for a different reason. Apache really does potentially let an attacker who controls aaa-aardvark.example at some bulk host perform http-01 challenges for www.some-custom-site.example that has created A records pointing to the bulk host but hasn't currently actually got them serving www.some-custom-site.example maybe due to a typo or unpaid bill.

But most bulk hosts have specifically configured Apache to show a default "Did you pay? / Have you configured your hosting properly?" type site which is harmless in this case, and for the few that haven't users can understand that um, if they visit www.some-custom-site.example in their browser they get to the attacker's site, so like yeah, that's where the problem is, nothing new with http-01
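A rough sketch of the fallback behaviour described above, assuming name-based virtual hosting where an unmatched Host header is served by the first configured vhost, and assuming per-customer configs are loaded in alphabetical order. `select_vhost` and the entries in the list are made up for illustration; this is not Apache's actual code.

```python
def select_vhost(host_header, vhosts):
    """vhosts is an ordered list of (server_name, owner); the first entry is the default."""
    for server_name, owner in vhosts:
        if server_name == host_header:
            return owner
    # No match: fall back to the first configured vhost.
    return vhosts[0][1]


# Assumption: customer vhosts end up in alphabetical order, so a name like
# aaa-aardvark.example lands first and becomes the default.
customers = sorted([
    ("www.honest-shop.example", "honest customer"),
    ("aaa-aardvark.example", "attacker"),
])

# www.some-custom-site.example points at this host but is not configured on it.
print(select_vhost("www.some-custom-site.example", customers))

# A host that puts its own placeholder site first closes the gap.
hardened = [("placeholder.invalid", "hosting provider")] + customers
print(select_vhost("www.some-custom-site.example", hardened))
```

With the plain alphabetical list the unconfigured name is answered by the attacker's site; with the provider's placeholder in front, it is answered by the harmless default page.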

I did provide an email address, never got any mail (I did actually check that).

> it's just that new things can't be launched against this already deprecated service.

Yes, I noticed. So, I now have the entirely unforced option of re-imaging a machine that is working just fine besides this little detail, which is in fact just one very small thing of a whole pile of much bigger things that run on that particular box. Not to mention migrating twenty-nine years of email to a new mail server.

I'm sure there is a lesson in there somewhere, but I'm not sure I'm overly receptive today, I had a lot of other stuff on my agenda.

If you let a server lag in OS version, at some point in time you're going to hit this kind of problem. If not with Let's Encrypt, then with some other dependency. I know, I've been in the exact same spot. I just don't blame the dependency, and included server OS updates as part of a yearly maintenance cycle.

I find that really ridiculous. Not you, but the fact that an OS needs to be upgraded because of some application level stuff that has to do with a protocol that is being run on some other server.

That's the kind of dependency snowball that we should work hard to avoid, not accept as some kind of new normality.

Servers should be able to live for years without re-imaging.

I know you solved your issue; for others in the same boat, look into acme.sh. It's a shell-only implementation: no Python, no loads of dependencies. I used that to keep Let's Encrypt running on an ancient server (firewalled) that I cannot upgrade for reasons.

I decided to go with acme.sh instead of certbot on some servers because I am hoping that upgrading acme.sh will cause fewer headaches. But who knows...

Why TLS-SNI-01 was disabled: https://community.letsencrypt.org/t/2018-01-09-issue-with-tl...

Explanation that renewals will be disallowed after 1 year deprecation period: https://community.letsencrypt.org/t/march-13-2019-end-of-lif...

And since you seem to be talking about ACMEv1/v2 rather than TLS-SNI-01 (which is what I originally thought), ACMEv1 will be supported until as late as June 2021 in some cases: https://community.letsencrypt.org/t/end-of-life-plan-for-acm...

ACMEv2 was introduced because it is much closer to the actual spec. Enforcing this ensures that there are actually ACME implementations out there, instead of proprietary "Let's Encrypt ACME" implementations. https://tools.ietf.org/html/rfc8555 https://github.com/letsencrypt/boulder/blob/master/docs/acme...

To me this seems like a sensible compromise between backwards compatibility and their mission for standardized automated renewals.

Yes, but that particular hole was fixed, wasn't it?

You can't "fix" the tls-sni-01 hole except by going back in a time machine to when Apache implemented SNI and spraying all the involved developers with water. "No, bad developer, no biscuit. Do what the protocol specification actually says, not whatever half-arsed nonsense you thought would work".

If there were like six web servers in the whole world that got this wrong, we could say "Fix those servers, fools" and sleep soundly knowing that those six servers are all that's affected. But Apache makes the scope too big to do that reasonably. It's a judgement call, but in this case the call was very easy.

I don't see what that has to do with me because there is no Apache on that server (just Nginx).

> Just imagine that tomorrow IPV4 would be shut down because we've all had enough time to switch by now.

Honestly? I would absolutely love to watch that shitshow.

The anticipated order of events goes something like this:

Firstly the islands of IPv6 grow until they begin to dwarf the supposed generally interoperable ocean of IPv4. Big home ISPs, major CDNs, bulk hosts, AWS, and so on.

Somewhere around this time you'd start to see events reported where "the Internet" was down for lots of people but it was the IPv4 Internet, which they are increasingly not using so they didn't actually notice. "Your Internet was down" "No it wasn't, I was on Facebook all afternoon" "Right yeah, but other than Facebook" "I watched a movie on Netflix" "OK, other than Facebook and Netflix" "I got a mail from Jeremy on GMail" "OK, other than Facebook and Netflix and GMail" "Not much of an Internet". Happy Eyeballs, the algorithm that allowed IPv6 to be deployed successfully in dual-stack environments, now allows IPv4 to ramp down imperceptibly.
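For anyone unfamiliar with Happy Eyeballs, the idea can be sketched as a race with a head start for IPv6. This is a simulation only: `attempt` and `connect` are hypothetical helpers, the latencies are invented, and real implementations use different timings and try multiple addresses per family.

```python
import asyncio


async def attempt(family, latency, reachable):
    # Stand-in for a real TCP connect; no network traffic is generated.
    await asyncio.sleep(latency)
    if not reachable:
        raise ConnectionError(f"{family} unreachable")
    return family


async def connect(v6_ok, v4_ok, head_start=0.05):
    # Start IPv6 first and give it a short head start.
    tasks = [asyncio.create_task(attempt("IPv6", 0.01, v6_ok))]
    await asyncio.wait(tasks, timeout=head_start)
    first = tasks[0]
    if not (first.done() and first.exception() is None):
        # IPv6 failed or is slow: bring in IPv4 as well.
        tasks.append(asyncio.create_task(attempt("IPv4", 0.01, v4_ok)))
    # Take whichever attempt succeeds first; skip over failures.
    for fut in asyncio.as_completed(tasks):
        try:
            winner = await fut
        except ConnectionError:
            continue
        for task in tasks:
            task.cancel()
        return winner
    raise ConnectionError("no address family reachable")


print(asyncio.run(connect(v6_ok=True, v4_ok=False)))
print(asyncio.run(connect(v6_ok=False, v4_ok=True)))
```

The first call settles on IPv6 even though IPv4 is down, which is the "didn't actually notice" case; the second shows the same mechanism falling back to IPv4 when IPv6 is the broken one.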

Now, with the "ocean" so small, medium-sized operators increasingly ignore it entirely, opting just to maintain translators at the edge of the IPv4 Internet. Maybe your ISP does this, and you can't get "real" IPv4 addresses, although many of you already don't so this wouldn't be a change.

The last major steps taken by "the Internet" look like this:

The tier one providers, who by that point are also more or less the global telecommunications companies, begin to deprecate IPv4 service, seeing it as a niche product that can better be serviced by specialists in your locale. Increasingly the only practical route from one IPv4 address to another IPv4 address is via two translators and IPv6.

The RIRs discontinue management of the namespace/numberspace for IPv4 and so the allocation of IPv4 addresses ceases to be globally co-ordinated. The IPv4 Internet no longer formally exists, just many islands of legacy IPv4 in an IPv6 ocean which happen to have mostly discontiguous addressing.

Can you please wait until I'm past my 'best before' date when you pull that particular plug?