Issue with TLS-ALPN-01 Validation Method (community.letsencrypt.org)
164 points by kpetermeni 1605 days ago
8 comments

Caddy rightly gets a lot of love from its users - I've been a super satisfied user for a while. Rock solid reverse proxy to some selfhosted services.
Wonder how many ACME deployments check for revocation, rather than just being on an infrequent cron job? What proportion of affected certificates will be automatically renewed with no effort?

Looking at a few docs, probably not many. In any case there isn't (?) an in-band way to tell the clients that the cert is going to be revoked before it is revoked, so there would be some disruption.
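
For what it's worth, the decision being asked about is tiny once the client knows the revocation status. A minimal sketch, assuming the status has already been fetched somehow (say from an OCSP response) and that the client renews once less than a third of the lifetime remains. The one-third ratio and the `should_renew` name are my assumptions, not any particular client's documented behaviour:

```python
from datetime import datetime, timedelta, timezone


def should_renew(not_before, not_after, now, revoked=False, remaining_ratio=1 / 3):
    """Return True if the certificate should be replaced now.

    remaining_ratio is an assumed policy knob: renew once less than this
    fraction of the total lifetime is left. A revoked cert is always replaced.
    """
    if revoked:
        return True
    lifetime = not_after - not_before
    remaining = not_after - now
    return remaining <= lifetime * remaining_ratio


issued = datetime(2022, 1, 1, tzinfo=timezone.utc)
expires = issued + timedelta(days=90)
print(should_renew(issued, expires, issued + timedelta(days=10)))   # False, 80 days left
print(should_renew(issued, expires, issued + timedelta(days=61)))   # True, 29 days left
print(should_renew(issued, expires, issued + timedelta(days=10), revoked=True))  # True
```

A cron job that only looks at the expiry date is the same function with `revoked` hard-wired to False, which is why a revocation goes unnoticed until something breaks.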

Caddy does. https://community.letsencrypt.org/t/questions-about-renewing...

And this is one reason why I keep advocating for certificate automation to be built into services/apps, rather than patched on the outside with duct tape.

I look forward to the day when cert lifetimes are regularly about as short as OCSP responses. Then we can possibly do away with OCSP entirely.*

(* I am of the opinion that revocation is fundamentally broken for Web PKI and it should be phased out in favor of short cert lifetimes. You may disagree and that's fine, but I'm happy to discuss why if you're interested.)

RE: revocation broken

Absolutely. Especially with the advent of protocols like ACME it just makes sense.

Here's a nice blogpost from smallstep (not affiliated) on this topic: https://smallstep.com/blog/passive-revocation/

Certificate expiries are set primarily due to financial interests that have nothing to do with security.

Why do you think the maximum lifetime was reduced from two years to one?

Does it make a lick of difference if you’re man-in-the-middled for just one year instead of two? What kind of argument is that!?

“Oh, they got every active user credential and form that was submitted ages ago, but no worries! This won’t go on for another year! Just months to go now…”

No, obviously the CA cartel just wanted to extract 2x the rent.

The whole thing is just absurd on its face and needs to stop, but there are billions of dollars worth of rent seekers that say…

“No.”

Certificates have always been sold and priced per year. I highly doubt lifetime changes benefitted certificate authorities, and I’m pretty sure they had it forced upon them by browsers. If anything, it prevents them from collecting 2-3 years of revenue upfront.
The reduction to 27 months was voted through CA/B (where either CAs or Browsers can effectively veto, like the way Northern Ireland is governed) but only after Ryan suggested Google might just unilaterally impose 90 days if the CAs rejected a reduction.

The reduction to 398 days was imposed by Apple, unilaterally, although in practice the ecosystem went along with it. It actually took a few weeks to get clarity on exactly what Apple intended, they just basically blurted it out at a meeting.

"although in practice the ecosystem went along with it" - not that there was much choice, but some CAs were less surprised and grumbled less than others...
You might want to double-check that. The CAs (all but two, basically) disagreed with the lifetime reduction and actively voted and argued against it.
That’s because the 1 year certs are too short to be a meaningful difference from LE, so they lose their selling point. So being against the shorter expiration date is just a nefarious plot to make more money, just like supporting it would be. They are so nefarious, they’re nefarious either way! It’s a Certs-22.
Better to be man-in-the-middled for:

- One year instead of two? Yep

- 3 months instead of 1 year? Yep.

- 1 week instead of 3 months? Yep.

The reason certificates have traditionally been so long is because it was a manual process. Using ACME it is possible to expire certificates every hour if you wanted to do that.

In the past, revocation was supposed to help cases where the owner of the certificate exposed the private key in one way or another.

Now it seems that revocation is supposed to help the CA covering up mistakes made by the CA.

Maybe we actually need a better CA.

The Baseline Requirements require revocation of misissued certificates, this isn't "a CA covering up mistakes."
> Maybe we actually need a better CA.

Go for it. Start one and tell us how it went.

The way LE and others keep breaking this process and the tools around it is certainly not a great endorsement for having it integrated into a service.
Can you give some examples of the kind of breakage you experienced?
Not OP, but here are some things I've personally experienced:

1. Supposedly more secure challenge types such as TLS-ALPN-01 are far from stable, as the current incident shows. Your cert can be revoked at any time through no fault of your own. After being burned by TLS-SNI-01 the last time, now I refuse to use anything other than plain old HTTP-01 and DNS-01.

2. As soon as the version of the Linux distro I was using (not in my power to change!) reached EOL, certbot suddenly refused to renew, despite the fact that I'd been using more or less the same version of Python and certbot for a number of years and the HTTP-01 challenge requires nothing fancy at all. Why does everyone these days insist on making ops decisions for other people?

3. On a server with existing nginx virtual hosts, certbot injects configuration directives including stuff the nginx team officially recommends against, such as `if` statements. It frequently breaks existing configuration such as rewrites and redirects. After seeing this a number of times, the only conclusion I can make is that certbot has no idea how to manipulate nginx config files.

4. If I have multiple domains pointing at the same application, and remove one of them at a later time, certbot is oblivious and repeatedly fails trying to renew the certificate that now contains an invalid domain. Again, certbot doesn't know how to work with nginx.

Maybe 3 and 4 can be improved if ACME was integrated as a proper nginx module instead of certbot trying to change things from the outside. My experience as a whole, however, makes me feel that the LE/certbot teams are rather cavalier about the commitment to stability they need to make if they really want to become an essential part of the world's internet infrastructure. If you want to be paternalistic about managing TLS for people who don't know how to do it, at least try to do it properly!

certbot is just one of many, many ACME clients and libraries now available.

If you don't like how the nginx plugin works, then fork it.

Wrapping "certbot renew && nginx -s reload" into a systemd service doesn't seem to be a very complex thing to do.

I tried solving this a different way for my selfhosted services.

Instead of running certbot on every server, I wrote a custom ACME client that runs on a master server and is responsible for requesting/renewing all certificates that I use. It also automatically deploys each cert to the correct server.

It is a single point of failure but it makes tracking certificate expiry, renewal and revocation so much easier.

Sounds like a fun project. Assuming you're talking about web servers, did you know Caddy can do that? Simply configure each one to use the same storage backend and Caddy will automatically coordinate management as a cluster, and share the certificate (and OCSP staple) resources.

(And depending on the storage backend, it's no longer a single point of failure. And even if storage is the failure, it's just storage, if it's down your servers will keep running.)

Huh that's actually quite interesting, I've never really looked into caddy as nginx has fulfilled most of my needs so far but I suppose it's about time I read up on it.
I wrote monitoring that was able to check all of my servers, all of my certificates, and alert me if certbot failed on any of them and their certificates were near expiry.

I call it "The Prometheus monitoring I already needed to make sure my servers are up and serving the websites they're supposed to"
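
The check itself is only a few lines. This is not the Prometheus setup described above, just a rough sketch of the kind of probe such monitoring could run, using only the Python standard library. The function names and the 14-day threshold are made up for illustration:

```python
import socket
import ssl
import time


def days_left(not_after, now=None):
    """Days until a certificate's notAfter, given in getpeercert() format."""
    expires = ssl.cert_time_to_seconds(not_after)
    now = time.time() if now is None else now
    return (expires - now) / 86400


def fetch_not_after(host, port=443, timeout=10):
    """Connect to host and return the served certificate's notAfter string."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def needs_alert(not_after, threshold_days=14, now=None):
    """True when the certificate is within threshold_days of expiry."""
    return days_left(not_after, now) < threshold_days
```

Usage would be something like `needs_alert(fetch_not_after("example.com"))` per host. If the certificate has already expired the handshake in `fetch_not_after` raises instead, which should be treated as an alert too.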

That's what I pondered doing. Do you have any code you can open source? Thanks.
I'm not sure I want to publish it at the moment as I'm not satisfied with the quality of the codebase. It was just something I hacked together in 2 days lol.

The bulk of the work was already done for me as there was already an ACME library for my language.

I'll probably open source it once I have a chance to clean everything up.

Published code provides more value than unpublished code. Most academic research code is never cleaned up, because there is no time for it.

And that’s okay.

WebPKI certificate revocation doesn't work anyways. It fails in exactly the case where TLS is needed: MITM.

All certificate revocation-checking schemes "fail open" and proceed happily on their way if the MITM blocks their communications with the revocation lists.

If you somehow don't have to worry about MITM you don't need anything remotely close to the complexity of TLS.

Certificate revocation is mostly security theater.
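
To make the "fail open" point concrete, here is a toy model of the client-side decision. It is a sketch of the policy only, not any browser's actual code, and the names are hypothetical:

```python
from enum import Enum


class Status(Enum):
    GOOD = "good"
    REVOKED = "revoked"
    UNAVAILABLE = "unavailable"  # responder unreachable, e.g. blocked by a MITM


def accept_connection(status, fail_open=True):
    """Decide whether a client proceeds after a revocation lookup."""
    if status is Status.GOOD:
        return True
    if status is Status.REVOKED:
        return False
    # The lookup itself failed: this is the case that matters under attack.
    return fail_open


# An attacker holding a revoked cert simply blocks the lookup.
print(accept_connection(Status.UNAVAILABLE, fail_open=True))   # True
print(accept_connection(Status.UNAVAILABLE, fail_open=False))  # False
```

With `fail_open=True` the REVOKED branch is never reached by anyone who can drop packets, which is the whole complaint.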

> All certificate revocation-checking schemes "fail open" and proceed happily on their way if the MITM blocks their communications with the revocation lists.

Incorrect, firefox implements OCSP Must-Staple and treats a failure there as equivalent to a certificate validation failure. Now if only we could ever get chrome(ium) to implement it...

OCSP Must-Staple is just another name for "really short certificate expiration periods"

Must-Staple with a timeout of X seconds is functionally equivalent to a certificate with a validity time of X seconds. In either case you need to go fetch something from the CA every X seconds, or else get booted off the net. The only difference is what you call that thing.

   HN is doing its silly "slow down" nonsense, so I will reply to the reply below by editing here.
To @cmeacham98: I'm not moving the goalposts; all practical revocation schemes fail open. OCSP Must-Staple isn't actually a form of revocation; it is just expiry with a fancy name and lots of extra complexity.

The implementation quirks of one particular CA (LE) are not features of these protocols; they are features of that particular CA's policies.

To @mhils: CAs could easily stand up simple servers that let you ask for the most recent certificate (if any) issued to a given domain and public key.

They don't want to do this because it's a burden. The only reason they run OCSP servers is that they're under pressure from insurers, regulators, and auditors to participate in the whole revocation theater game. Saying "we don't do revocation" would cause people to freak out, and setting up an OCSP server takes effort, but less effort than explaining to the insurers/regulators/auditors that revocation doesn't actually work.

As for CT, the log could simply include an extra (issuer-signed) field which amounts to "I will auto-renew this certificate until $DATE" and omit all the additional certificates issued to that same public key and domain name between the log entry and $DATE. Of course the browser would need to understand this field in order to validate the CT log entry.

I'm not sure super-short expiration is really preferable; it makes the Web even more fragile than it already is. Frankly I think Web PKI is sort of a big mess at this point. Most of it only makes sense when viewed through the lens of "(a) governments will always control DNS for the two-letter TLDs, (b) Google cannot get all governments to supplicate to Google, therefore (c) we must conjure up an extra layer of entities (the CAs) that can be bullied around by manipulation of browser engine code". This is basically the only reason why DANE (or a modernized revision of it) isn't used.

I absolutely agree that shorter lifetimes are likely preferable over the additional complexity that comes with OCSP Must-Staple. That being said, one benefit of OCSP Must-Staple is that you don't need to authenticate yourself to get the OCSP response. Another benefit is that Certificate Transparency Logs remain smaller, which in turn also benefits CRLite filter sizes.
> I'm not moving the goalposts; all practical revocation schemes fail open. OCSP Must-Staple isn't actually a form of revocation; it is just expiry with a fancy name.

Of course it is revocation, it allows you to revoke a certificate before its normal expiration (90 days for LE).

I could make this argument in reverse: "very short lifetimes are functionally equivalent to OCSP Must-Staple, and thus is a form of revocation". Of course, this is ridiculous both ways: being similar or even 'functionally equivalent' does not make two things the same.

But that's just semantics. GP's point was that with Must-Staple, the "real" expiration period becomes pretty much irrelevant - instead, the lifetime of the OCSP response becomes the new effective lifetime of the certificate.

If you compare (1) a short-lived certificate and (2) a long-lived certificate with Must-Staple and short-lived OCSP responses, the benefits, security properties and failure modes of both are exactly the same*. You're just putting the timestamp into different fields.

(* Or almost: Some notable practical differences are described in the sibling comments - but those are mostly a property of LE's current policies, not the protocol itself)
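
The "timestamp in a different field" argument can be written down directly. A small sketch, assuming a strict client that refuses a Must-Staple certificate without an unexpired stapled response; `trusted_until` is a made-up helper, not a real API:

```python
from datetime import datetime, timedelta, timezone


def trusted_until(cert_not_after, ocsp_next_update=None, must_staple=False):
    """Latest time a strict client would still accept the certificate.

    With Must-Staple the client needs an unexpired stapled OCSP response,
    so the earlier of the two timestamps wins.
    """
    if must_staple and ocsp_next_update is not None:
        return min(cert_not_after, ocsp_next_update)
    return cert_not_after


now = datetime(2022, 1, 26, tzinfo=timezone.utc)
week = timedelta(days=7)

# (1) a 7-day certificate, no stapling
short_lived = trusted_until(now + week)
# (2) a 90-day certificate with Must-Staple and a 7-day OCSP response
stapled = trusted_until(now + timedelta(days=90), now + week, must_staple=True)

print(short_lived == stapled)  # True
```

Both cases stop being accepted at the same instant unless the server fetches something fresh from the CA, which is the equivalence being claimed.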

This is moving the goalposts. You claimed that all certificate revocation methods fail-open, I pointed out that OCSP Must-Staple is a fail-closed revocation method that is implemented in a popular browser.

Regardless, even if "functionally equivalent", there is a practical difference: LE does not allow me to issue a certificate every minute, but does allow the OCSP Must-Staple extension.

It limits the time in which a certificate compromise can be exploited. Clients cache CRLs.

Without revocation, a compromised cert remains useful to an attacker for the entire validity period of the cert, if they can MITM you.

With revocation, they must MITM you constantly to prevent you from acquiring the revocation list. This substantially adds to cost and complexity of such an attack, and means that many, if not all clients will be protected.

So it's not perfect, but ask what the world looks like without it.

That's an issue with buggy clients, which should not proceed if the CRL is not available. It's not an issue with PKI per se.
If they didn't fail open, every time a CA's website went down every single website that used their certificates would go offline as well.

You can imagine the DDOS-ransomers licking their lips at this possibility.

No, "fail open" has always been the only possible way to implement this. Which is why it's a broken idea from the start.

> If they didn't fail open, every time a CA's website went down every single website that used their certificates would go offline as well.

That's not correct. OCSP stapling exists to prevent that kind of problem.

OCSP always seemed a bit absurd to me: Instead of sending an OCSP staple, the CA could also issue a very short-lived certificate on demand. It would have the same effect of asserting that the CA currently considers the server to be verified and it doesn't need a separate format.
There's a plan to make this information available to clients in the future: https://datatracker.ietf.org/doc/draft-aaron-acme-ari/
Yeah, further, I doubt many certificates were issued from an account key belonging to an email address that people monitor often, if at all.
I hadn't considered this. Am I in the minority by having a legit, monitored email account for my ACME certs?
I thought having a proper email account was what most people do. Companies probably use a role-based address e.g. certmaster@myco.com so the email goes to whoever is responsible for it today.
Can anyone make sense of what they're trying to say there?

They found some issue ("irregularities") and made 2 changes, but the changes are merely restricting the TLS version to 1.2 and deprecating an old OID identifier. While TLS < 1.2 certainly is not ideal, I don't see how this would impact the ACME validation, and the old OID should be irrelevant as well.

(I have been somewhat concerned about the security properties of the acme/alpn validation for unrelated other reasons, but haven't been able to pin that down to a specific threat - notably the RFC implies that the security is improved due to strict ALPN validation, which in practice usually does not happen.)

Update: RFC 8737 (the ALPN validation method) says "ACME servers that implement "acme-tls/1" MUST only negotiate TLS 1.2". So maybe this is "just" a policy issue?
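
For illustration, enforcing that floor on the validating side is a one-line setting in most TLS stacks. A rough sketch with Python's `ssl` module of what a hypothetical validator's client context could look like (this is not Let's Encrypt's code, and `validator_context` is an invented name):

```python
import ssl


def validator_context():
    """Client-side context for a hypothetical tls-alpn-01 validator."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # The challenge certificate is self-signed by design, so normal chain
    # and hostname verification are switched off here and the validator
    # inspects the certificate contents itself.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    # Refuse anything older than TLS 1.2, per the RFC 8737 sentence quoted above.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.set_alpn_protocols(["acme-tls/1"])
    return ctx
```

With a context like this, a server that can only speak TLS 1.1 fails the handshake before any validation logic runs.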

Head of Let’s Encrypt here. This is a compliance issue, there is no security or validation integrity risk.
Damn, unfortunate that you will revoke all these certificates if there is truly no security risk. This is likely going to break a lot of our users, or require manual intervention within the next 42 hours.
I have absolutely no insider knowledge.

However, we have seen lots of incidents where the actual problem is that we'd relied upon something that seems like it should be true and actually isn't. So what about a situation where an HTTPS server accepts tls-alpn-01 validation attempts but actually has no idea what tls-alpn-01 validation even is? Maybe it takes some (or all?) TLS extensions and just pretends to accept them.

On a bulk host I think you could abuse that to get "validated" despite having only a very tangential relationship (sharing an IP address) to the validated name, similar to the problem with previous TLS based validation methods.

The draft RFC for tls-alpn-01 says TLS 1.2 is mandatory, so, while that might be unrelated, it also might be that there's a bunch of servers out there which get this wrong but don't speak TLS 1.2 and the expectation is that nobody will upgrade them to speak TLS 1.2 but still get this wrong (or maybe for other reasons their misbehaviour means they'll catch fire in TLS 1.2)

Just a random guess.

The security of TLS-ALPN-01 relies on TLS implementations rejecting a connection if an unknown ALPN is present.

It is possible that TLS 1.1 and earlier do not require this behavior leading to exactly the SNI confusion that this mechanism was meant to prevent.

I think tls-alpn-01 doesn't need you to reject the connection, my understanding is that successful validation requires three things:

The server agrees this SNI matches its name

The server agrees it offers this ALPN protocol

The server provides the tls-alpn-01 magic certificate agreed via ACME

Unfortunately none of these three steps requires affirmative work by the server to get it wrong, they can just passively nod along. "Yeah, I'm abandoned-server.bank.example, whatever you say", "Yeah, sure I can talk alpn/1 protocol, whatever that is", "Yeah, this certificate I was given by some bozo is definitely my certificate"

We know from previous incidents that just because something is obviously a bad idea, or even explicitly forbidden, doesn't mean it won't get done unless we also make it difficult so that it's easier not to.
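
Spelled out as code, the three checks look roughly like this. It is a sketch based on my reading of RFC 8737 (the challenge certificate carries the validated name plus an acmeIdentifier extension holding the SHA-256 digest of the key authorization); the `Handshake` record and its field names are invented for illustration:

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Handshake:
    """What a validator observed during the handshake (illustrative fields)."""
    selected_alpn: Optional[str]
    cert_dns_names: Tuple[str, ...]
    acme_identifier: bytes  # value of the acmeIdentifier extension


def validates(domain, key_authorization, hs):
    """Apply the three checks to an observed handshake."""
    expected = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return (
        hs.cert_dns_names == (domain,)                          # 1. name matches
        and hs.selected_alpn == "acme-tls/1"                    # 2. ALPN agreed
        and hmac.compare_digest(hs.acme_identifier, expected)   # 3. challenge cert
    )
```

Note that the function only sees what the server presented. Whether the server understood any of it is invisible to the validator, which is the worry above.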

> The security of TLS-ALPN-01 relies on TLS implementations rejecting a connection if an unknown ALPN is present.

I hope it does not, because the majority of servers don't reject unknown ALPNs. (See: ALPACA attack)

RFC 8737, Section 5: "The second assumption is that a server will not violate [RFC7301] by blindly agreeing to use the "acme-tls/1" protocol without actually understanding it."

RFC 7301, Section 3.2: "The server SHALL NOT respond with a selected protocol and subsequently use a different protocol for application data exchange."

I know what the RFCs say. RFC 7301 also says: "In the event that the server supports no protocols that the client advertises, then the server SHALL respond with a fatal "no_application_protocol" alert." It's not what happens in reality. You may try:

openssl s_client -connect google.com:443 -alpn foobar

Now if you come up with a way to exploit this behavior I'm interested to hear that. (I was at that point a few weeks ago, but I haven't gotten around to doing a thorough analysis of how relevant that property is for the ALPN method.)

(There's a subtlety here I should note: It seems many servers will accept garbage ALPN identifiers, but will not answer with those identifiers. Instead they will allow a connection with their default protocol and not answer ALPN at all. This likely makes this nonexploitable in the ACME case, but still feels problematic that they rely on such subtleties.)
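
If you'd rather script the test than eyeball openssl output, the same probe can be done with Python's `ssl` module. A rough sketch; the `probe` and `classify` names and the three labels are mine:

```python
import socket
import ssl


def classify(offered, selected):
    """Interpret the ALPN outcome of a completed handshake."""
    if selected is None:
        return "ignored"   # handshake went ahead with no ALPN answer
    if selected == offered:
        return "echoed"    # server claims to speak the offered protocol
    return "other"


def probe(host, alpn="foobar", port=443, timeout=10):
    """Offer a single ALPN identifier and report how the server reacts."""
    ctx = ssl.create_default_context()
    ctx.set_alpn_protocols([alpn])
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                return classify(alpn, tls.selected_alpn_protocol())
    except ssl.SSLError:
        # Handshake failed. A fatal no_application_protocol alert lands
        # here, but so would any other TLS error.
        return "rejected"
```

"ignored" is the behaviour described in the subtlety above, "rejected" is what RFC 7301 asks for, and "echoed" for a garbage identifier would be the worrying case.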

It seems the reasoning here is: the TLS handshake might have used an insecure TLS version, and so they cannot be sure that the handshake worked the way they thought, and so the certificates could have been issued to the wrong party.

I don't have a deep understanding of the TLS-ALPN-01 validation nor of the vulnerabilities they might be concerned about, but that would be the only reason for revoking certificates (unless it's more of a political statement, "we revoke certs when we screwed up!").

The population of browsers and things that don't understand TLS 1.2 is minuscule now, so there should be no impact in disabling TLS 1.1 and below anywhere possible.
Browsers aren't the point here; this refers to servers seeking to verify control of a domain by conducting an “acme-tls/1” handshake initiated by the Let's Encrypt issuance server.

If such a server only supports TLS 1.1, then TLS-ALPN-01 validation will fail after this change is implemented.

Nobody should be running tls1.1 only (and incapable of opportunistically negotiating for 1.2 or 1.3 instead) on their public facing httpd in 2022, I disabled everything below 1.2 on some rather high traffic websites several years ago with zero impact.
This is the second security issue with a TLS-based challenge [1]. This was a good reminder to switch to the HTTP challenge for the one remaining server I had that was affected.

[1] https://letsencrypt.org/docs/challenge-types/#tls-sni-01

That might work for you, but ALPN needs to exist because there's more to the Internet than just HTTP, and TLS can be used for those non-HTTP protocols. Some of those protocols are more fundamental than HTTP, and making them depend on HTTP would create a circular dependency.

  HN is choking again, so I must reply with edits *sigh*
@tialaramex, you're confusing policies of one CA (LE) with the ALPN protocol. Let's Encrypt isn't the only CA out there. Even so, you can do TLS-ALPN on any port. You can do TLS-ALPN on port 443 without using the HTTP protocol in any way. To ALPN, 443 is just an arbitrary number, like the IP address of Let's Encrypt's server.

> If you actually want certificate issuance unrelated to web servers you should either hook up a web server

Good heavens, no.

It's not about the ALPN protocol it's about the Baseline Requirements. Unless I'm gravely mistaken it certainly isn't the intention that you're allowed to accept tls-alpn-01 validation from some random service on say, port 8080 or 6697 as suitable for the purpose of validating control over a name for the Web PKI and I'd be grateful if you know of a public CA offering this that you'd say which ones and how you're aware of that.

That's why I listed three other ports, 80, 25 and 22. Those three are Authorized in the BRs for the purpose of validation because it does indeed seem unlikely that I can spin up a server on those ports if I do not control the machine they're answering for. Let's Encrypt does not use them for tls-alpn-01, and certainly doing so for ports 80 or 22 would seem really weird, but the rules aren't intended to prohibit it.

TLS doesn't have a fixed port number. Ergo, TLS-ALPN doesn't either.

It is the intention of the ALPN spec that you can do tls-alpn-01 on whatever TCP port the two parties (issuer and recipient) care to use.

Although "TLS can be used for those non-HTTP protocols" the tls-alpn-01 validation can only be used on the authorised ports, which for Let's Encrypt is port 443, aka HTTPS.

Now, Let's Encrypt would technically be allowed to enable this validation on a few other ports, 80 (HTTP), 25 (SMTP) and 22 (SSH) under current Baseline Requirements, but understandably they have no plan to do that.

If you actually want certificate issuance unrelated to web servers you should either hook up a web server explicitly for issuance or use DNS proof of control.

> Good heavens, no.

It's not exactly the most elegant solution, but I don't understand the aversion either. A "web server" that is only intended to serve the challenge file can be as simple as a thread that writes a static blob of bytes to a socket. That's nc -l stuff.

If you're already modifying your TLS backend to understand the ALPN challenge, I don't see why it would be that hard to add logic for one specific GET/200 OK pair.
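
To show how small that GET/200 OK pair is, here is a sketch of a single-purpose HTTP-01 responder using only the Python standard library. The token and key authorization are placeholders; a real client gets them from the ACME server and has to listen on port 80:

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

TOKEN = "example-token"                               # placeholder
KEY_AUTHORIZATION = "example-token.example-thumbprint"  # placeholder


class ChallengeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/.well-known/acme-challenge/" + TOKEN:
            body = KEY_AUTHORIZATION.encode("ascii")
            self.send_response(200)
            self.send_header("Content-Type", "application/octet-stream")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, *args):  # keep the sketch quiet
        pass


def serve(port=80):
    """Start the responder in a background thread and return the server."""
    server = HTTPServer(("", port), ChallengeHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


# Local try-out on a random free port instead of 80.
server = serve(port=0)
conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
conn.request("GET", "/.well-known/acme-challenge/" + TOKEN)
fetched = conn.getresponse().read()
conn.close()
server.shutdown()
server.server_close()
print(fetched)  # b'example-token.example-thumbprint'
```

Everything else gets a 404, so the responder exposes nothing beyond the one challenge file.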

> Good heavens, no.

You skipped the second part of the argument. And what "no" means in that context. What's your alternative?

Small feedback for the Let's Encrypt folks: I got the email saying that I have two ACME account IDs affected. It would have been nice to know which domains are affected (even if it's just the first ~10 or so per account).
The full list of affected certificates and domains is now available: https://community.letsencrypt.org/t/170449/
They made the same mistake with the SNI deprecation and got plenty of feedback about it...

It's disappointing that they haven't learned from this.

I understand your pain. Maybe keep in mind what operation they are running (complexity and scale) and that feedback may take time to implement. To put this into perspective, think about what replacing certificates (or, say, obtaining new certificates) felt like before Let's Encrypt and ACME were a thing. ;-)
What I don't quite get with all the certificate automation: Doesn't this all effectively just shift the "source of truth" to DNS?

Back when certificates were issued manually, a CA was also verifying that the requesting party was actually who they claimed to be IRL - hence EV certificates and all that.

What LE and friends verify on the other hand is simply that the entity that requests a certificate also controls the DNS entry at that point in time - or at least controls some of the servers that are listed in the A/AAAA records.

For one of the infamous Authoritarian Governments, it should be no problem at all to obtain an LE certificate for any domain under their ccTLD. Just use the DNS challenge, then instruct the country's registrar to change the DNS record for the domain of interest.

Isn't that a massive centralisation compared to the old system?

> Back when certificates were issued manually, a CA was also verifying that the requesting party was actually who they were claimed to be IRL

I see. For, say, the Bank of America, how did you imagine this working? The CA maybe has a guy fly to BoA headquarters, meet up with the CEO and chairman, and then sign off on the certificate? Wait, one guy isn't really much assurance, is it? So I guess they'd need a whole team of people to be sure, jetting around the world, meeting up with the senior leadership of companies and checking their bona fides.

Do I need to tell you it wasn't actually like that?

> hence EV certificates and all that.

EV comes into existence as part of a deal between the Certificate Authorities and the Browser vendors back when desktop PCs were very important. They each want different things, and the resulting compromise leads to the Baseline Requirements and the CA/B Forum.

What the browsers want, and get, is actual validation for all certificates. I know, that sounds like a low bar, but that's how bad things had gotten. The CA/B BRs don't initially even specify how the validation should work, that takes until the "Ten Blessed Methods" a few years ago.

What the CAs want, and get, is desktop browser UI dressing that makes their most expensive certificates look cool under the name "Extended Validation". The browsers can't and don't promise this is a good idea, but it keeps CA bottom lines healthy which is important to them as markets open up and prices fall.

Now it turns out that the CA/B and BRs are a useful ratchet on overall validation and security practices and, probably, overall this benefits the Browsers (who by now are effectively the OS vendors, with Mozilla standing in for the Free Unixes) more than the CAs. But arguably it also benefits the CAs because with weak validation the whole thing is useless, and they go out of business anyway.

> The CA maybe has a guy fly to BoA headquarters, meet up with the CEO and chairman, and then sign off on the certificate?

I assume the BoA doesn't require their CEO to personally meet with every subcontractor that BoA gets into a business relationship with. You have employees for that?

Why would working with a CA be any different than, say, hiring an attorney or opening a bank account?

Well this is a fun game though isn't it. If a firm of lawyers sues you "on behalf of Bank of America", at what point do you feel like they didn't check properly who their client was and so they are responsible for the bogus lawsuit and the resulting costs not this enormous corporation?

If only the manager of a local BoA branch told them they were hired?

How about if it's an assistant manager?

How about if rather than meeting them in the branch, the supposed assistant manager was in the area and so dropped in to the law office in person?

The attorneys weren't available, so, they did a Zoom call?

Just a phone call?

Actually it was an email.

At some point, you realise, wait, they didn't actually validate anything of value here did they, anybody could be this supposed "Bank of America". And the reality is that PKIX certificates began that slide essentially immediately, before even the PKIX working group was set up.

And this is only half of the problem. It's easy for Bank of America because we're both thinking of the same entity, but "Big Bob's" might be a burger restaurant in your city, a private security firm in mine, and an LA law firm, so a certificate for "Big Bob's" doesn't even "validate" a name we're agreed on. That's why DNS ends up mattering, the DNS offers a single global namespace.

Come on, that's not how it works. There are specific, well-defined circumstances that define what a particular legal entity (such as a company or a corporation) is and who may or may not act on its behalf.

Otherwise, any kind of company could escape responsibility by simply pretending it doesn't exist and every employee just acted on their own.

But which one is "not how it works" ?

As I said, for the DNS validation we actually have pretty specific technical rules today, the "Ten Blessed Methods" (well, their modern successors) which is why we're talking about one of those methods here (tls-alpn-01 is method 3.2.2.4.20)

Today there are rules for EV but they're understandably vague, because they're talking about the problem we addressed above, eventually they get down this idea of a "Principal Individual" which can include "an employee" who is merely "authorised to conduct business" on behalf of (in our example) Bank of America and of course you're back to square one. How can we know they're authorised ?

The trick in the DNS validation is that we're asking a question machines could potentially have an authoritative answer to. Does this applicant control this DNS name? Not "Should they?". Not "Are they authorised?" but specifically: do they control it?

The non-DNS validation can't do that.

You are right. The EV validation process (I am EV validated at 3 major authorities) does involve sending paystubs, identification with photos and an interview.

Let's Encrypt is not a solution for trust. It is an attempt at getting as many people to adopt HTTPS as possible. There used to be 3-4 free certificate authorities which all had their problems (processes, security, uptime etc.) and Let's Encrypt is outperforming them all.

We still need to understand issues around identity, which can only be solved with verification and trust. X509 is encryption keys + trust and LE has much weaker trust guarantee than an EV.

What is the trust guarantee that an EV cert provides me?
Absolutely none. It's great for issuers as they get to charge a bunch more money to provide you with exactly zero extra security, which is why some of them try to pretend there's a purpose. There is not. Even the old (ridiculous) argument about user trust doesn't work anymore as browsers have no meaningful display difference these days between normal and EV certs.

I can totally understand your frustration. Certificates are way too expensive and costs have gone off the rails.

Yes, browsers have removed the green trust bar.

Yes, ordinary users have to click on small buttons and manually check against different conventions used by CAs (naming, extensions, OID variants).

However, saying that EV provides no extra security is not entirely true, at least if we look beyond the end-users of a website.

It is also used for:
- High security applications that have to ensure their services are trustworthy
- As confidence/trust factors in cyber threat intelligence (if you don't want to get blocked on a false positive, EV is your friend)
- In domain name research when trying to establish ownership
- In machine learning models as an indicator of verifiable trust
- Protects against website copying used in phishing campaigns

I'm focusing on HTTPS here as EV is much more relevant in PKI systems.

EV should be affordable, relevant, have good UX and provide identity security for end-users of browsers, but it is not. Until that changes, most website owners should not buy it.

How would you otherwise know that xoom.com is really PayPal?

I believe it is PayPal because DigiCert say it is (with EV). That is much better than no validation - which is the default.

Domains are not identities. They are a reference to an organization. PayPal owns more than 100 domains. Not only did DigiCert validate the organisation through various procedures (e-mail, paystub, id), but the validation is also asserted through an X500 name, which is cryptographically signed in the X509 certificate. So there is no way for others to spoof the identity.

Attackers can easily copy the HTML/CSS/JS of the website to look and feel exactly like PayPal. Then they can go to Let's Encrypt and get a certificate, which offers no assertion on their identity, other than "Attackers own the domain paypal.xom.com" (a domain they purchased to spoof PayPal).

In that case, the EV certificate is the _only_ way you can check if it is really PayPal.

If you actually read the EV certificate, you might be able to reasonably tie what you learned to a DNS name, which is what really matters for the circumstances where you're using TLS, as we'll see in a minute:

The certificate can tell you the identity of a legal entity which, as far as the CA could reasonably verify, wanted to set in stone the association between its identity and the DNS names listed (and also a key, but don't worry about that). You should examine not only the name of that entity, but also the country or locality in which it claims to exist (this may be a tax haven) and its ID# in that country or locality's records, such as tax records, which may enable you to distinguish it from other entities with similar (or in some cases the same) names. The latter is in a certificate element labelled "serial number", but it is the serial number of the subject entity, not the certificate's serial number, which today is mostly a large random number of no importance (it is serving as a cryptographic nonce but you don't need to care about that).
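As a rough illustration of the two different "serial numbers": the sketch below works on a dict shaped like the one Python's `ssl.SSLSocket.getpeercert()` returns, where the subject is a tuple of RDNs and the certificate's own serial is a top-level key. The sample certificate, the helper names, and every value in it are invented for the example.

```python
def subject_fields(cert: dict) -> dict:
    # Flatten the nested subject tuples into a simple name -> value mapping.
    fields = {}
    for rdn in cert.get("subject", ()):
        for name, value in rdn:
            fields[name] = value
    return fields


def describe_entity(cert: dict) -> dict:
    # Hypothetical helper: pull out the details a human would compare.
    subject = subject_fields(cert)
    return {
        "organization": subject.get("organizationName"),
        "country": subject.get("countryName"),
        "locality": subject.get("localityName"),
        # Subject serialNumber: the entity's ID in some official register.
        "registration_id": subject.get("serialNumber"),
        # Top-level serialNumber: the certificate's own (random) serial.
        "certificate_serial": cert.get("serialNumber"),
    }


# Made-up example data, not a real certificate.
sample_cert = {
    "subject": (
        (("countryName", "US"),),
        (("localityName", "Springfield"),),
        (("organizationName", "My Bank Inc."),),
        (("serialNumber", "1234567"),),
        (("commonName", "mybank.example"),),
    ),
    "serialNumber": "0A1B2C3D4E5F60718293A4B5C6D7E8F9",
    "subjectAltName": (("DNS", "mybank.example"),),
}

print(describe_entity(sample_cert))
```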

Anyway, once you've carefully examined these details, and determined which entity you've got a certificate for, like I said the main value is that it tells you the DNS names associated.

Almost all the software tools you use, such as a web browser, don't care about any of that stuff, but they do care about DNS names. So transactions such as following an HTTP redirect, which are done silently and automatically by the browser, won't care that this is (or is not) an EV certificate at all, but they do check the DNS names on a certificate.

So if you're able to determine from the certificate that mybank.example really is run by My Bank Inc., the bank you've got money in, that's a valid use for EV, but your browser doesn't care whether the HTTPS server it talks to shows it that EV certificate (or any other EV certificate) during HTTPS transactions, only whether the DNS names are right. It would not, for example, stop during a 30x redirect and say "Oh! This is a 30x redirect from mybank.example but it didn't present that EV certificate you checked, maybe it's an imposter?" That 30x redirect is fine, the certificate was fine, and you'll never see it at all. It will never be shown to you, and your data was transmitted long before you had a chance to have an opinion anyway, so who cares.
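Here is a simplified sketch of the name check described above, to show that the organization fields never enter into it. It assumes certificate dicts shaped like Python's `ssl.SSLSocket.getpeercert()` output, uses invented sample data and helper names, and skips chain validation, expiry, and the finer wildcard rules real clients apply.

```python
def name_matches(hostname: str, pattern: str) -> bool:
    hostname = hostname.lower().rstrip(".")
    pattern = pattern.lower().rstrip(".")
    if pattern.startswith("*."):
        # A wildcard stands in for exactly one leftmost label.
        head, _, rest = hostname.partition(".")
        return bool(head) and rest == pattern[2:]
    return hostname == pattern


def browser_accepts(hostname: str, cert: dict) -> bool:
    # Only the DNS entries in subjectAltName are consulted;
    # the subject's organization (EV or not) is ignored.
    dns_names = [v for k, v in cert.get("subjectAltName", ()) if k == "DNS"]
    return any(name_matches(hostname, p) for p in dns_names)


# Made-up certificates: one with EV-style subject details, one without.
ev_cert = {
    "subject": ((("organizationName", "My Bank Inc."),),),
    "subjectAltName": (("DNS", "mybank.example"), ("DNS", "*.mybank.example")),
}
dv_cert = {
    "subject": ((("commonName", "mybank.example"),),),
    "subjectAltName": (("DNS", "mybank.example"),),
}

print(browser_accepts("mybank.example", ev_cert))
print(browser_accepts("mybank.example", dv_cert))
```

Both certificates pass for mybank.example in this sketch, which is the point: swapping the EV certificate for a plain one mid-session changes nothing the name check can see.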

That "manual verification" back in the day was an email verification link, so again, back to relying on DNS. I don't think there's a way around it, but at least with certificate transparency we'll know if it happens.

It was always DNS. Unless you are getting EV the CAs usually verify ownership via email. Email can go anywhere the current MX record in DNS says it goes.

This is why for EVs most CAs also do phone validation.

For stuff like Verified Mark Certificates (which is used for BIMI), it goes much further than that. VMCs are like EVs on steroids.

The HN crowd can sometimes be very hostile towards having to pay anything at all for certificates, but there are real costs in such validations.

As my Traefik setup is affected, I cleared the `acme.json` and let Traefik get new certificates for all services.

Seems LE is pretty busy right now, got timeouts flying around everywhere.
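Before clearing `acme.json` it can help to see which names it currently holds. The sketch below assumes the layout I believe Traefik v2 uses (a top-level key per certificate resolver, each with a "Certificates" list whose entries carry a "domain" object with "main" and "sans"); treat that layout, the resolver name, and the sample content as assumptions and check them against your own file.

```python
import json


def domains_in_acme_store(store: dict) -> dict:
    # Map each resolver name to the DNS names it holds certificates for.
    result = {}
    for resolver, data in store.items():
        names = []
        for entry in (data or {}).get("Certificates") or []:
            domain = entry.get("domain", {})
            names.append(domain.get("main"))
            names.extend(domain.get("sans") or [])
        result[resolver] = [n for n in names if n]
    return result


# Invented sample mimicking the assumed acme.json layout.
sample = json.loads("""
{
  "myresolver": {
    "Account": {"Email": "admin@example.com"},
    "Certificates": [
      {"domain": {"main": "example.com", "sans": ["www.example.com"]},
       "certificate": "...", "key": "...", "Store": "default"},
      {"domain": {"main": "git.example.com"},
       "certificate": "...", "key": "...", "Store": "default"}
    ]
  }
}
""")

print(domains_in_acme_store(sample))
```

For a real file you would load it with `json.load(open("acme.json"))` instead of the inline sample, adjusting the path to wherever your setup stores it.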

My Traefik setup is affected, should not be too difficult to refresh. It's automated anyway.

Same here. Fixed by deleting the Traefik pods. The underlying Deployment regenerated them and a new cert was obtained.

It may not refresh automatically before they revoke. I had luck manually renewing, which is largely automated but not entirely.