by belthesar 776 days ago
I had many of the same issues early on in my Traefik experience. Using TLS-ALPN-01 validation without having DNS records in place before the config was applied caused a lot of frustration. Like you, I was frustrated with the amount of logging I was getting. I eventually learned that without DNS configured correctly, validation would fail, and after N unsuccessful attempts LE would refuse to do another TLS-ALPN-01 validation for a while, which sounds like the kind of issue you were having.
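
For what it's worth, a pre-flight check along these lines would have saved me a few rounds of that. This is only a rough sketch: `unresolved` is a name I made up, and it asks your local resolver, which is not necessarily what LE's validators will see, so treat a clean result as a hint rather than a guarantee.

```python
import socket


def unresolved(hostnames, resolve=socket.getaddrinfo):
    """Return the hostnames that do not resolve via `resolve`.

    `resolve` defaults to the local system resolver; it is a parameter
    only so the check can be exercised without touching the network.
    """
    missing = []
    for name in hostnames:
        try:
            # Port 443 because that is where a TLS-based challenge is answered.
            resolve(name, 443)
        except socket.gaierror:
            missing.append(name)
    return missing
```

Run it with the hostnames you are about to put in your router rules, and hold off applying the config until it returns an empty list.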

After moving to DNS-01 validation, my experience was suddenly much better. It also comes with the added benefit of letting me cut certs for services that aren't publicly exposed, with way less orchestration than TLS-ALPN-01 style validation requires. Assuming the DNS provider is working (and if it's not, you're hopefully getting an API error from them before LE attempts to validate the record), any failure happens well before check-failure backoffs kick in at LE. At this point, regardless of whether I'm using Traefik, Caddy, Nginx, or any other reverse proxy, I'm pretty committed to only using DNS-01 validation from Let's Encrypt from now on, or, if I have to do TLS-ALPN-01 validation, to making darn sure things are right with the Staging API first.

Speaking of which, if you cut a Staging cert with LE via Traefik, there's no good way to invalidate the staging cert. You have to munge the ACME JSON to remove the cert and restart Traefik (could maybe do a SIGHUP? didn't try) to get it to pick up the changes.
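
The munging itself can be scripted. The sketch below assumes the layout I've seen in Traefik v2 storage files: a top-level key per certificate resolver, each holding a `Certificates` list whose entries carry a `domain` object with `main` and optional `sans`. Check your own file before trusting that, and keep a backup. The function names and the resolver name in the usage note are mine, not anything Traefik provides.

```python
import json


def drop_certificate(acme: dict, resolver: str, domain: str) -> int:
    """Remove stored certs whose main domain or SANs include `domain`.

    Mutates `acme` in place and returns how many entries were removed.
    Assumes the {resolver: {"Certificates": [...]}} layout described above.
    """
    section = acme.get(resolver) or {}
    certs = section.get("Certificates") or []
    kept = [
        c for c in certs
        if domain != c.get("domain", {}).get("main")
        and domain not in (c.get("domain", {}).get("sans") or [])
    ]
    removed = len(certs) - len(kept)
    if removed:
        section["Certificates"] = kept
    return removed


def prune_file(path: str, resolver: str, domain: str) -> int:
    """Load the storage file, drop matching certs, write it back if changed."""
    with open(path) as fh:
        acme = json.load(fh)
    removed = drop_certificate(acme, resolver, domain)
    if removed:
        with open(path, "w") as fh:
            json.dump(acme, fh, indent=2)
    return removed
```

Something like `prune_file("acme.json", "myresolver", "app.example.com")` followed by a Traefik restart is the whole dance; the path, resolver name, and domain there are placeholders.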

All said, there are lots of weird silent failures and behaviors, but the biggest pain is how opaque it makes errors from dependent services.