by dextercd 168 days ago
You need external monitoring of certificate validity. Your ACME client might not be sending failure notifications properly (as happened to Bazel here). The client could also think everything is OK because it acquired a new cert, while the certificate isn't actually installed properly (e.g., a service was never reloaded, so it keeps using the old cert).

I have a simple Python script that runs every day and checks the certificates of multiple sites.
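A minimal sketch of what a daily check like that could look like. This is not the actual script; the site list and the 14-day threshold are placeholders, and only the standard library is used.

```python
import socket
import ssl
import time

# Hypothetical list of sites to check; replace with your own.
SITES = ["example.com", "example.org"]
WARN_DAYS = 14  # assumed threshold, pick whatever suits your renewal schedule


def fetch_not_after(host, port=443, timeout=10.0):
    """Return the notAfter string of the certificate served at host:port."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_left(not_after, now=None):
    """Days from now (epoch seconds) until a notAfter string such as 'Jan  5 09:34:43 2018 GMT'."""
    now = time.time() if now is None else now
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def check_sites(sites, warn_days=WARN_DAYS):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    for host in sites:
        try:
            remaining = days_left(fetch_not_after(host))
        except OSError as exc:
            # Expired or otherwise invalid certs fail the handshake and land here
            # (ssl.SSLError is a subclass of OSError), as do DNS and connection errors.
            problems.append(f"{host}: TLS check failed: {exc}")
            continue
        if remaining < warn_days:
            problems.append(f"{host}: certificate expires in {remaining:.1f} days")
    return problems
```

Run `check_sites(SITES)` from cron or a systemd timer and send whatever it returns to your alerting channel of choice. Because it connects from outside, it sees what clients see rather than what the ACME client believes.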

One time this script signaled that a cert was close to expiring even though I saw a newer cert in my browser. It turned out that I had accidentally launched another reverse proxy instance which was stuck on the old cert. Requests were randomly passed to either instance. The script helped me correct this mistake before it caused issues.


100%, I've run into this too. I wrote some minimal scripts in Bash, Python, Ruby, Node.js (JavaScript), Go, and PowerShell to send a request and alert if the expiration is less than 14 days away: https://heyoncall.com/blog/barebone-scripts-to-check-ssl-cer... Anyone who's operating a TLS-secured website (which is... basically anyone with a website) should have at least that level of automated sanity check. We're talking about ~10 lines of Python!
There is a Prometheus exporter called ssl_exporter that lets Grafana display a dashboard of all of your certs and their expirations. The trick, though, is that you need to know where all your certs are located. We were using Venafi to do auto discovery, but a simple script that basically nmaps your network provides the same functionality.
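A rough sketch of the "simple script" idea, in plain Python instead of nmap. The port list and function names are made up for illustration; it walks a CIDR block and records which address/port pairs complete a TLS handshake.

```python
import ipaddress
import socket
import ssl

# Hypothetical ports where TLS services commonly listen; adjust for your network.
TLS_PORTS = (443, 8443)


def targets(cidr, ports=TLS_PORTS):
    """Yield (ip, port) pairs for every host address in a CIDR block."""
    for ip in ipaddress.ip_network(cidr).hosts():
        for port in ports:
            yield str(ip), port


def speaks_tls(ip, port, timeout=1.0):
    """Return True if a TLS handshake completes at ip:port.

    Verification is disabled on purpose: this only discovers listeners,
    including ones with expired or self-signed certificates.
    """
    context = ssl.create_default_context()
    context.check_hostname = False  # must be turned off before verify_mode
    context.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((ip, port), timeout=timeout) as sock:
            with context.wrap_socket(sock):
                return True
    except OSError:
        return False


def discover(cidr):
    """Return the (ip, port) pairs in cidr that accept a TLS handshake."""
    return [(ip, port) for ip, port in targets(cidr) if speaks_tls(ip, port)]
```

This scans sequentially, so it is slow on anything bigger than a /24; nmap or a thread pool would be the obvious upgrade. The resulting list is what you would hand to your exporter as probe targets.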
Blackbox exporter will do the same thing while also probing HTTP and other protocols.
Relevant certificates could be located by scanning the certificate transparency logs.
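One way to try this without running your own log scanner is to query a CT search service. The sketch below assumes crt.sh's JSON output (`output=json`, with a `name_value` field holding newline-separated names); treat those details as assumptions and check them against the service before relying on this.

```python
import json
import urllib.parse
import urllib.request


def ct_query_url(domain):
    """Build a crt.sh search URL for certificates issued to subdomains of domain."""
    # "%.example.com" uses % as a wildcard meaning "any subdomain".
    query = urllib.parse.urlencode({"q": f"%.{domain}", "output": "json"})
    return f"https://crt.sh/?{query}"


def hostnames_from_entries(entries):
    """Collect the unique names listed in a list of CT search results.

    Assumes each entry is a dict whose "name_value" holds newline-separated names.
    """
    names = set()
    for entry in entries:
        for name in entry.get("name_value", "").splitlines():
            names.add(name.strip().lower())
    names.discard("")
    return sorted(names)


def fetch_ct_hostnames(domain, timeout=30.0):
    """Query the CT search service and return the names found in logged certificates."""
    with urllib.request.urlopen(ct_query_url(domain), timeout=timeout) as resp:
        return hostnames_from_entries(json.load(resp))
```

Wildcard entries such as `*.example.com` come back as-is, so they tell you a cert exists but not which hosts serve it.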
What you're monitoring is "Did my system request a renewed cert?" but what most people's customers care about is instead, "Did our HTTPS endpoint use an in-date certificate?"

For example, say you've got an internal test endpoint, two US endpoints and a rest-of-world endpoint, physically located in four places. Maybe your renewal process works with a month left, but the code to replace working certificates in a running instance is bugged. So maybe on Monday that renewal happens and your "CT log monitor" approach is green, but nobody gets new certs.

On Wednesday engineers ship a new test release to the test endpoint, restarting it and thus grabbing the renewed cert, so for them everything seems great. Then on Friday afternoon a weird glitch happens for some US customers; restarting both US servers seems to fix the glitch, and now US customers also see a renewed cert. But a month later the Asian customers complain everything is broken, because their endpoint is still using the old certificate.

> Did our HTTPS endpoint use an in-date certificate?

For any non-trivial organization, you want to know when client certificates expire too.

In my experience, the easiest way is to export anything that remotely looks like a certificate to the monitoring system, and let people exclude the false positives. Of course, that requires you to have a monitoring system in the first place. That is no longer a given.

So, I've worked for both startups and large entities, including both an international corporation and a major university, and in all that time I've worked with exactly one system that used client TLS certificates. They mostly weren't from the Web PKI (and so none of these technologies are relevant, Let's Encrypt for example has announced and maybe even implemented choices to explicitly not issue client certs) and they were handled by a handful of people who I'd say were... not experts.

It's true that you could use client certs with say, Entra ID, and one day I will work somewhere that does that. Or maybe I won't, I'm an old man and "We should use client certs" is an ambition I've heard from management several times but never seen enacted, so the renaming of Azure AD to Entra ID doesn't seem likely to change that.

Once you're not using the Web PKI, cert lifetimes are much more purpose-specific. It might well make sense for your Entra ID apps to have 10-year certs because, eh, if you need to kill a cert you can explicitly do that; it's not a vast global system where only expiry is realistically useful. If you're minting your own ten-year certs, expiry alerting is a very small part of your risk profile.

Client certificates aren't as esoteric as you think. They're not always used for web authentication, but many enterprises use them for WiFi/LAN authentication (EAP-TLS) and securing confidential APIs. Shops that run Kubernetes use mTLS for securing pod to pod traffic, etc. I've also seen them used for VPN authentication.
Sure, I was just giving the parent another way of finding all the certificates besides scanning the network.
I am airgapped and the certs are usually wildcard with multiple SANs. You would think that the SANs alone would tell you which host has a cert. But, it can be difficult to find all the hosts or even internal hosts that use TLS.
> You need external monitoring of certificate validity.

Plug for Uptime Kuma, they support notifications ahead of expiry: https://github.com/louislam/uptime-kuma

Kind of cool to have an uptime monitoring tool that also has an option like that, two birds one stone and all that. Not affiliated with them, FOSS project.

The scalable way (up to thousands of certificates) is https://sslboard.com. Give it one apex domain, it will find all your in-use certificates, then set alerts (email or webhook). Fully external monitoring and inventory.
Looks like it relies on certificate transparency logs. That means that it won’t be able to monitor endpoints using wildcard certs. The best it could do would be to alert when a wildcard cert is expiring without a renewed cert having been issued.
Is that enough though? You may have wildcards on domains that are not even in public DNS, and you may forget to replace the cert "somewhere". For that reason it is better to either dump a list of domains from your local DNS or have e.g. Zabbix or another agent on every host machine checking the cert file for you.
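For the agent-on-every-host variant, a minimal sketch that shells out to the `openssl` CLI to read the expiry date from a PEM file. The function names are made up for illustration, and it assumes `openssl` is on the PATH.

```python
import ssl
import subprocess
import time


def parse_enddate(openssl_output):
    """Turn a line like 'notAfter=Jan  5 09:34:43 2018 GMT' into epoch seconds."""
    key, _, value = openssl_output.strip().partition("=")
    if key != "notAfter" or not value:
        raise ValueError(f"unexpected openssl output: {openssl_output!r}")
    return ssl.cert_time_to_seconds(value)


def file_cert_days_left(path, now=None):
    """Days until the first certificate in a PEM file expires, via the openssl CLI."""
    result = subprocess.run(
        ["openssl", "x509", "-noout", "-enddate", "-in", path],
        capture_output=True, text=True, check=True,
    )
    now = time.time() if now is None else now
    return (parse_enddate(result.stdout) - now) / 86400
```

Keep in mind this checks the file on disk, not what the running service has loaded, so it complements an external check rather than replacing it.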
That's exactly my point: while this service sounds quite useful for many common cases, it's going to fail in cases where there's not a 1-to-1 certificate-to-server mapping. Even outside of wildcards, you have to account for cases where the cert might be installed on N load balancers.
If you're using a cert on multiple IPs, or IPv4+v6, SSLBoard will monitor all IPs. It's not foolproof, but it covers most common practices. Btw, wildcard certs don't have a good reputation (blast radius)...
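To illustrate the per-IP idea (this is a generic sketch, not how SSLBoard does it): resolve every A/AAAA record for a hostname, connect to each address with the hostname as SNI, and flag addresses whose cert expires earlier than the newest one seen. All function names here are made up.

```python
import socket
import ssl


def resolve_all(host, port=443):
    """Return the unique IPv4 and IPv6 addresses that host resolves to."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def not_after_at(ip, host, port=443, timeout=10.0):
    """Connect to one specific IP, send host as SNI, and return the cert's notAfter."""
    context = ssl.create_default_context()
    with socket.create_connection((ip, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def stale_addresses(expiry_by_ip):
    """Given {ip: expiry in epoch seconds}, return IPs whose cert expires before the newest one."""
    if not expiry_by_ip:
        return []
    newest = max(expiry_by_ip.values())
    return sorted(ip for ip, expiry in expiry_by_ip.items() if expiry < newest)


def check_host(host, port=443):
    """Return the IPs for host that serve an older certificate than their peers."""
    expiry_by_ip = {
        ip: ssl.cert_time_to_seconds(not_after_at(ip, host, port))
        for ip in resolve_all(host, port)
    }
    return stale_addresses(expiry_by_ip)
```

This only distinguishes addresses. It cannot see individual backends behind a single load-balanced address, where the best you can do from outside is connect repeatedly and hope to sample each backend.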
I'd say that load balancers (one-address-to-N-servers) count as a common practice, but I otherwise agree in that regard.

Regarding wildcard certs, eh. I wouldn't say they have a bad reputation. Sure, greater blast radius. But sometimes it can certainly simplify things to use one. Your ACME client configuration is easier and your TLS terminator configuration often becomes easier when the terminator would otherwise need to switch based on SNI.

Indeed, SSLBoard is scanning CT logs. You can add/import host names though, to allow monitoring of wildcard certs. Same if you're using ports that are not 443, you have to add these to the list of hostnames that are checked.

It's not as convenient, but it's the best SSLBoard can do...