Hacker News
New TLS certificate for *.herokuapp.com hostnames (devcenter.heroku.com)
89 points by paulfurley 2197 days ago
We run a backend API app on Heroku and for simplicity our frontend calls it via the herokuapp.com subdomain `<our-app-name>.herokuapp.com`.

We haven't bothered with a custom domain SSL certificate as the herokuapp.com subdomain has been just fine.

Fortunately I was monitoring the endpoint as I started getting SSL expiry warnings a few weeks ago.

It seems Heroku is serving an old certificate for `<our-app-name>.herokuapp.com`, issued April 2019 and expiring 22nd June:

```
$ curl -v --head https://<our-app-name>.herokuapp.com/
* Connected to <our-app-name>.herokuapp.com (52.19.225.66) port 443 (#0)
[snip]
* Server certificate:
*  expire date: Jun 22 12:00:00 2020 GMT
*  subjectAltName: host "<our-app-name>.herokuapp.com" matched cert's "*.herokuapp.com"
*  issuer: C=US; O=DigiCert Inc; OU=www.digicert.com; CN=DigiCert SHA2 High Assurance Server CA
```

It's a wildcard cert for `*.herokuapp.com`, but it's different from the current one I see if I curl the root domain:

```
$ curl -v --head https://herokuapp.com/
* Connected to herokuapp.com (34.194.84.166) port 443 (#0)
* Server certificate:
*  expire date: Aug 2 02:13:11 2020 GMT
*  issuer: C=US; O=Let's Encrypt; CN=Let's Encrypt Authority X3
```

It seems they've transitioned to Let's Encrypt for the wildcard domain, but it isn't being served for app subdomains. I've checked a few other subdomains and see the same thing:

* govuk-prototype-kit.herokuapp.com
* heroku-status.herokuapp.com
* juice-shop.herokuapp.com

I've been raising this with support since T-30 and they just keep saying things like:

> Our concerned team is aware of it and they are actively working on the renewal process. We'll get the new cert in there well before the expiration, and there will be no disruption of service.

Now that we're at 7 days, I've lost confidence that support has even forwarded my ticket to the right team.

I suspect in 7 days we're gonna see a lot of things break...
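For anyone wanting the same kind of early warning, the expiry monitoring described in the post could be sketched roughly as below. This is a minimal sketch using only Python's standard library, not the poster's actual monitoring setup; the function names and the 30-day threshold are assumptions for illustration.

```python
import socket
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` (timezone-aware) and a certificate's notAfter string.

    `not_after` uses the format reported by Python's ssl module and by curl,
    e.g. "Jun 22 12:00:00 2020 GMT".
    """
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - now).total_seconds() / 86400


def should_warn(not_after: str, now: datetime, threshold_days: float = 30.0) -> bool:
    """True when the certificate expires within `threshold_days` (an assumed threshold)."""
    return days_until_expiry(not_after, now) <= threshold_days


def fetch_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Connect to host:port and return the served leaf certificate's notAfter field."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

A scheduled job could call `should_warn(fetch_not_after("<our-app-name>.herokuapp.com"), datetime.now(timezone.utc))` and alert when it returns true. Note that this checks the certificate actually served for the app hostname, which is what matters here, since the root domain was serving a different one.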

6 comments

Something that's nice about Let's Encrypt is that it forces you to change something every few months. After the first couple months, you'll probably get your issues worked out. If you just change certs every few years, then every few years you have some sort of disaster because of the "well we fixed it, we don't have to worry for two years" effect.

A broader lesson is the importance of "trying out" rare events, even before that rare event actually happens. If you depend on a service with a certain SLA, it's pretty dangerous when that service has 100% uptime. You never get to see what happens when it does go down, which the SLA tells you it eventually will. Some people track their error budget, and at the end of the accounting period, break their service in accordance with the SLA. Then you get to see what happens when it does go down. (Ref: https://queue.acm.org/detail.cfm?id=2371516)
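To put numbers on the error-budget idea, here is a small arithmetic sketch. The 99.9% SLA and 30-day accounting period are made-up figures for illustration, and the function names are hypothetical.

```python
def allowed_downtime_minutes(sla: float, period_days: int) -> float:
    """Downtime the SLA permits over the accounting period, in minutes."""
    return (1.0 - sla) * period_days * 24 * 60


def remaining_budget_minutes(sla: float, period_days: int, observed_downtime_minutes: float) -> float:
    """Unused error budget; a positive value is downtime that could still be 'spent' on purpose."""
    return allowed_downtime_minutes(sla, period_days) - observed_downtime_minutes
```

With a 99.9% SLA over 30 days, `allowed_downtime_minutes(0.999, 30)` comes to roughly 43 minutes. A service that had no outages all period would have that much budget left for a deliberate, announced break.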

Although, speaking of Let's Encrypt, there will be a series of disruptive events over the next 18 months or so.

* Soon (although when exactly I'm not sure because it has been delayed at least once) the Let's Encrypt systems will tell compliant ACME clients that the "correct" intermediate is Let's Encrypt's ISRG-signed X3 intermediate. This is a different certificate for the same X3 private key you're used to but not signed by the same trust root. If you use a correct client and have done things properly, this may cut off TLS clients for your systems that don't trust ISRG (the charity which runs Let's Encrypt). Six year old Android phones, the Windows XP system you know should have been retired, a VoIP desk phone running out-of-date firmware, stuff like that.

* In March 2021 the X3 Intermediate expires. If your certificate software was not compliant with ACME, or you manually overrode it to use the old certificates to avoid the problem in the previous item, things break now. More things, and worse. Although...

* Maybe before March 2021 the Let's Encrypt systems stop issuing from those soon-to-be-obsolete Intermediates and use newer ones instead perhaps named Y3 and Y4. In this case if you've jury rigged things (in an ACME non-compliant way) to keep using the old X3 intermediate that'll break suddenly after your renewal. Common web browsers may not trust the nonsense you're emitting, exactly which browsers break may vary depending on exactly what stupid things you did, but chances are you haven't tested and don't know. If you are using a compliant client then modern browsers are all fine, but archaic stuff breaks suddenly.

* In September 2021 the DST Root X3 root expires. If you have somehow clung on to trust via this root, whether through your own effort or via trust path discovery code inside client systems, that goes away instantly. Any systems that don't trust ISRG will refuse to trust your certificates, no matter how often you re-issue them and reconfigure things, those clients themselves need updating urgently and you probably have no way to do that. Oops.

That sounds like just one maybe-disruptive event that manifests itself differently if you keep working around it instead of dealing with it properly. If you need to deal with it at all - I suspect most systems that still need to connect to the internet trust the ISRG root nowadays.

> I suspect most systems that still need to connect to the internet trust the ISRG root nowadays.

There are tons of systems that do not -- particularly in the enterprise. I manage web servers for a mission-critical healthcare-related SaaS. We occasionally encounter TLS issues even with Globalsign root certificates -- far more distributed than ISRG.

We ended up switching to DigiCert last year and it helped reduce the number of TLS-related failures reported to us.

We could never switch to Let's Encrypt / ISRG for that reason. Even if ISRG has 95% distribution of their root certificate, that's not good enough for mission-critical enterprise.

I'm not at all surprised that Heroku had to roll their TLS certificate back to DigiCert -- DigiCert is what you want if you need compatibility with the highest number of clients.

This inspired me to look into my system's trusted roots. Here are the root CA expirations coming up in the next 18 months. The last one on this list really hits home, as anyone who did TLS back in the early 00's may remember.

2020-09-12 - DST Root CA X4

2021-03-17 - QuoVadis Root Certification Authority

2021-04-06 - Sonera Class X2

2021-09-30 - DST Root CA X3

2021-11-09 - Admin-Root-CA

2021-12-15 - Belgium Root CA2

2021-12-15 - GlobalSign
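A list like the one above can be produced in several ways; the commenter doesn't say how theirs was generated. One possible sketch in Python follows. It assumes the certificate dict format returned by `ssl.SSLContext.get_ca_certs()`; the helper names and the 548-day (about 18 months) horizon are illustrative choices.

```python
import ssl
from datetime import datetime, timedelta, timezone


def display_name(cert: dict) -> str:
    """Best-effort name from the nested `subject` tuples used by get_ca_certs()."""
    fields = {key: value for rdn in cert.get("subject", ()) for key, value in rdn}
    return (fields.get("commonName")
            or fields.get("organizationalUnitName")
            or fields.get("organizationName", "(unnamed)"))


def expiring_roots(certs, now, horizon_days):
    """Return (expiry_date, name) pairs for certs expiring within the horizon, soonest first."""
    cutoff = now + timedelta(days=horizon_days)
    found = []
    for cert in certs:
        expires = datetime.fromtimestamp(
            ssl.cert_time_to_seconds(cert["notAfter"]), tz=timezone.utc)
        if now <= expires <= cutoff:
            found.append((expires.date().isoformat(), display_name(cert)))
    return sorted(found)


if __name__ == "__main__":
    # Only roots already loaded into the context are listed. On systems where
    # OpenSSL loads CAs lazily from a directory, this list may be empty.
    loaded = ssl.create_default_context().get_ca_certs()
    for expiry, name in expiring_roots(loaded, datetime.now(timezone.utc), 548):
        print(f"{expiry} - {name}")
```

The output depends entirely on which trust store the local Python/OpenSSL build uses, so it won't necessarily match a browser's or another operating system's list.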

Admin-Root-CA shows us how far we've come. I think that today, even if Mozilla's root programme didn't forbid them, people would guess that ultra-vague names aren't a good idea.

For reference that is the Swiss government's root and it isn't trusted by Mozilla so as a consequence it's unlikely that any systems you have facing ordinary web browsers depend on this root to be trusted.

It's also funny to go back and look at Mozilla's trust decision (it's before I was engaged in looking at this on a day-to-day basis) and see that the terrible naming was decisive while the practice of just basically trusting a Swiss government employee to issue whatever they want was considered only "problematic" and not necessarily a showstopper.

Of course because Mozilla doesn't trust this root, it does not see itself as having any oversight role for the root. So if you use MacOS, or Windows, to do anything other than run Firefox, you're reliant on their teams to verify that this root is well run. Maybe they're doing a great job? I guess you'd only ever find out the hard way because they operate entirely behind closed doors.

Certificate Transparency logs would provide the answer to this.

I think "Six year old Android phones" is optimistic. As I understand it the ISRG root was added in Android 7.1, and 7.0 was released in August 2016.

So it might be more like "Three year old Android phones", given the lag between upstream releases and adoption.

https://news.ycombinator.com/item?id=23496332

They postponed it, but they had planned to drop old Android support in 2020. I doubt that's possible in the near future.

Security means you don’t support

> Six year old Android phones, the Windows XP system you know should have been retired, a VoIP desk phone running out-of-date firmware, stuff like that.

If these can’t connect that is a feature, not a bug.

The downside is that due to a lack of serious competition, Let's Encrypt seems like an obvious choice, and thus it can be tempting to hardcode it.

I have a homebrew Internet-of-shit device that I know has LE hardcoded. I'll have to take it off the wall and reflash if I switch to a new CA (or potentially when some of the changes described by tialaramex happen - I think I hardcoded the new root but I'm not 100% sure).

The ACME protocol is well defined, and the code is open source, so you could always implement your own service.

Let’s Encrypt's only real hold is that their root certificate is now in many trust stores. If you control both sides, self-signed certificates are perfectly fine; you don’t need a CA at all.

I think he's talking about the temptation to set up a pin to their root. That can break just as easily as any other pin, and of course you won't be prepared.

Why would you pin a certificate that you did not generate for a domain that's not yours?

Most people probably didn't pin the certificate. I think that the problem was caused by developers configuring their application to trust only the DigiCert Root CA.

This usually happens because some applications don't use the root CA bundle of the underlying OS by default to authenticate TLS connections, but require you to put each root CA certificate in a trust store (e.g. Java).

Some devs probably added just the DigiCert root CA and forgot about it.

These kinds of certificate changes are always tricky, because they usually work very reliably until they don't.
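The difference between "use the OS bundle" and "trust only the CAs I added" can be illustrated with Python's `ssl` module. This is a sketch of the general failure mode the comment describes, not of any specific affected app; the function names and the `digicert_root.pem` path mentioned afterwards are hypothetical.

```python
import ssl


def system_trust_context() -> ssl.SSLContext:
    """Client context that trusts the root CAs provided by the OS/OpenSSL defaults."""
    return ssl.create_default_context()


def restricted_trust_context(ca_files: list[str]) -> ssl.SSLContext:
    """Client context that trusts ONLY the CA certificates in `ca_files` (PEM paths)."""
    # PROTOCOL_TLS_CLIENT enables hostname checking and requires a valid chain,
    # but starts with an empty trust store: nothing is trusted until loaded.
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    for path in ca_files:
        context.load_verify_locations(cafile=path)
    return context
```

A client built with `restricted_trust_context(["digicert_root.pem"])` keeps working for years, then fails every handshake with a certificate verification error on the day the server moves to a chain under a different root. Nothing is wrong with the new certificate; the client simply never learned about its issuer.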

My guess is that it is some larger corporate client with a middleware app they pinned the cert into. An app they built on Heroku hurriedly because it was fast and cheap to get started and they didn’t expect to need to scale. Then as can happen it became important and they scaled anyway. They probably lost the talent that built it so they don’t even remember how it works.

All of this is assumption based on how they seem to buy enough compute at Heroku to have sway over rolling back this change of cert provider.
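To make "pinned the cert" concrete: a common form of pinning compares a hash of the certificate the server presents against a hardcoded value. The sketch below shows that general pattern, not any particular customer's code; the function names are made up, and the certificate bytes in the usage note are placeholders.

```python
import hashlib
import hmac


def fingerprint(der_cert: bytes) -> str:
    """SHA-256 fingerprint (hex) of a DER-encoded certificate."""
    return hashlib.sha256(der_cert).hexdigest()


def matches_pin(der_cert: bytes, pinned_fingerprints: set[str]) -> bool:
    """True if the presented certificate's fingerprint is one of the pinned values."""
    actual = fingerprint(der_cert)
    return any(hmac.compare_digest(actual, pin) for pin in pinned_fingerprints)
```

In a real client the bytes would come from something like `tls_socket.getpeercert(binary_form=True)`. If the pin set was built from the old leaf certificate, `matches_pin` returns false for any replacement certificate, including a routine renewal from the same CA. That is why a pin with no backup value and no update path breaks on every rotation.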

Ugh, yes, I assure you that some corporate clients will even try to pin the actual leaf certificate; pinning an intermediate or root is almost good behavior for them. (Honestly, the number of times I had to tell our support people that no, we would not support customers trying to pin our AWS-issued certificates, and no, I couldn't promise to notify them even if I wanted to since AWS could just rotate them at will...)
Once upon a time a UK model train manufacturer had a warning inside the box of your new loco or rolling stock that said something like "Do not dismantle out of idle curiosity".

Some people despite the warnings and consequences can't help themselves :)

What a clever way to make more sales.

1) people are dismantling goods, then ringing for advice but being embarrassed they broke it, then rebuying when we refuse advice,

2) tell people not to dismantle it, they'll get curious and do the opposite,

3) ...

4) pro-fit!

Heroku also doesn't enforce any verification that you own a domain name. Another user can simply add any domain they like to their app, regardless of ownership, if you haven't claimed it by adding it to your app first. You will then no longer be able to add your own domain to your app, getting a generic "domain is already in use" error. This happened to me a few years ago, and I had to reach out to support and prove I owned the domain. They made me verify I owned it and fixed it, but said there's nothing they can do going forward. Granted, it's a total edge case, but it was still an unexpected experience. Maybe it's fixed now, who knows.

Edit: Looks like this is fairly common on PaaS so my original comment isn't that relevant.

As long as support can fix this, it isn't really a problem, is it? If you point your domain to heroku having not set it up first, that's on you...

You would think so, but it turns out that this ("subdomain takeovers") is a very common misconfiguration for a lot of *aaS services. Enough so that some bug bounty programs won't pay out much for it or at all because it happens so much and they don't consider the shared-suffix issue very important.

On the provider side though, requiring ownership verification (TXT records, etc.) introduces friction in the onboarding process. It's likely any reasonably competent provider that doesn't implement verification has had a serious internal discussion about it and decided it's not worth thinning their funnel for.
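For reference, the TXT-record style of ownership verification mentioned above usually amounts to a challenge token that the domain owner publishes in DNS. A minimal sketch of the provider-side check follows. The record prefix and function names are invented for illustration, and the DNS lookup itself is left out; the function takes the TXT values as already fetched by whatever resolver library is in use.

```python
import secrets

# Hypothetical record format; real providers each define their own.
TXT_PREFIX = "example-paas-verification="


def new_challenge() -> str:
    """Random token the provider shows to the user claiming a domain."""
    return secrets.token_hex(16)


def is_domain_verified(txt_records: list[str], token: str) -> bool:
    """True if one of the domain's TXT records carries the expected challenge value."""
    expected = TXT_PREFIX + token
    # Resolvers often return TXT values wrapped in double quotes; strip them.
    return any(record.strip().strip('"') == expected for record in txt_records)
```

Only someone who controls the domain's DNS can publish the token, so a stranger can't claim the name first. The cost is the extra onboarding step the comment describes.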

Yeah, like I said in another comment, it seems like it's normal across these PaaS platforms.

> If you point your domain to heroku having not set it up first, that's on you...

How is it on you? It's possible a previous owner had a heroku app with that domain attached to it, or someone just added it to their app before you set up a heroku app.

How does GitHub’s gh pages handle this? I don’t remember them doing anything either.

They don't. I've lost the ability to deploy GitHub pages to domains I own because of this, when a repo went out of my control with CNAME set... Now I cannot change the CNAME there, and cannot verify a new repo with that name.

So far GitLab seems to be the best one I've run into, they do validation, and as long as you keep control of the domain, you keep control of your pages.

Pretty sure you have to set some DNS records for gh pages

Yes, you have to point a record at gh pages if you want to use DNS with it, but I don’t think their server checks for that.

Is it possible to subscribe to a firehose of .COM NS record updates through one of those fancy things like dnsdb? If so, perhaps there's an opportunity here for exploiting that race condition en masse for services that support direct NS-style delegation, like netlify.

I actually don't know as I haven't used it in a while. But I did just test netlify and the same issue exists there: domains need to be unique. Perhaps there isn't a nice way of dealing with that edge case.

It already broke a few days ago: https://status.heroku.com/incidents/2045

Yep, we were down during that time. I don't know what Heroku is doing with this certificate thing, but seems like a mess.

If you're pinning the SSL cert then it's not something they can fix, it's something you need to change on your end.

We went down. We're not pinning the SSL cert. We're behind CloudFront, and CloudFront didn't trust the new cert and stopped forwarding traffic.

Wow. I really hope they’ll get it done before the expiration date but I always thought they’d be renewing months in advance at minimum. Are they trying to negotiate something?

Get what done?

Heroku replaced their wildcard certificate in plenty of time but many customers do not anticipate anything changing ever and will fiercely resist this simple fact, so for those customers stuff blew up.

Remember the Y2K problem is the result of software written not in 1901 or even just 1981 but even well into the 1990s with the calm certainty that all years begin 19xx.

Heroku can try brown out policies, but this sort of customer intransigence is very difficult to defeat. The customer is quite certain they're right, who ever heard of "change" anyway? Everyone knows that the world is a flat plane, fixed in space, eternal and unchanging, this pinning rule I wrote in 2019 worked then, therefore it is still correct now.

This is called ‘enterprise’ and everyone in it will tell you a million excuses why it’s the only way.

Most likely.

I'm pretty sure the money they want is something ridiculous because in their business wisdom they know Heroku has no power if they don't want to rock the boat.

The question is: Is Heroku willing to rock the boat?

Let's Encrypt issues wildcard certificates, so Heroku could easily pay $0 for one. I don't think they are negotiating anything.

The issue is that some of their customers have pinned DigiCert so they have two choices:

* pay whatever DigiCert demands for a new certificate
* accept that some of their customers will break

Doing this two weeks before the old certificate expires puts them in a difficult situation for negotiating, especially now they've committed to getting a new DigiCert certificate.

It seems like the solution to this is to implement both but charge customers to use the DigiCert chain. "Oh, you went and pinned something that you shouldn't have? That's fine; you can either fix it yourself or pay us to support your mistake."

There's more context in this Heroku Changelog: https://devcenter.heroku.com/changelog-items/1813

"On Tuesday June 9th 2020 Heroku changed the certificate used for terminating TLS for built-in <appname>.herokuapp.com hostnames from a certificate issued by DigiCert to one issued by Starfield/AWS. This change was rolled back on June 10th because a small subset of Heroku customers had pinned apps to the DigiCert certificate or had apps that could not establish a chain of trust with the new certificate for other reasons.

A new DigiCert-signed certificate will replace the current one before June 22nd (when it expires)."