Short-Lived Certificates at Netflix (medium.facilelogin.com)
102 points by prabaths 3167 days ago
9 comments

This is also implemented in OpenStack / for HPCloud in Anchor: https://wiki.openstack.org/wiki/Security/Projects/Anchor

You can use it as a standalone project in your environment as well. There's a talk about it at https://youtu.be/Q_ZhrQq-_YM (ideas very similar to Netflix)

dumb question, but why was OCSP stapling invented when it's the same as short-lived certificates? have the certificate's expiration date set to a short period, have the CA renew it regularly, and place it on some HTTP server. then you can have some cron job that downloads the certificate and reloads your server. and since the certificates are short-lived, the CA/browser vendors can mark them as being excluded from OCSP checks. all the benefits of OCSP stapling, without the extra implementation complexity.
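A minimal sketch of the "cron job that downloads the certificate and reloads your server" step, assuming the new PEM bytes have already been fetched from wherever the CA publishes them. The function name `install_if_changed` and the file layout are hypothetical, not anything from the article. It only shows the part that is easy to get wrong: swap the file atomically and reload only when the certificate actually changed.

```python
import os
import tempfile
from pathlib import Path


def install_if_changed(new_pem: bytes, cert_path: Path) -> bool:
    """Atomically replace cert_path with new_pem. Return True if it changed."""
    if cert_path.exists() and cert_path.read_bytes() == new_pem:
        return False  # same certificate as before, so no reload is needed

    # Write to a temp file in the same directory so os.replace stays on one
    # filesystem and the server never observes a half-written file.
    fd, tmp_name = tempfile.mkstemp(dir=str(cert_path.parent), prefix=".cert-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(new_pem)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_name, str(cert_path))
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return True
```

The caller would trigger the server reload (for example the `service nginx reload` command mentioned elsewhere in this thread) only when the function returns `True`.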
certificates cost money, and the world was mostly manual in the past (and in a lot of places still is); you can still order a 3 year cert.
Also, many CAs don’t provide a programmatic way to order and download certs. That's another annoying manual step, especially when you have multiple domains not covered under a single wildcard.

Sadly Let’s Encrypt does not support wildcards... and support for intranets is controversial to those who want to keep their intranet “totally private”. I love the autobot. One command to generate new cert, another command to reload my Nginx.

> One command to generate new cert, another command to reload my Nginx.

Maybe this is what you meant but you can do it in a single command. For example:

    certbot renew --post-hook "service nginx reload"
That works fine until it doesn't, for example once you have multiple servers.
Let's Encrypt will support wildcards in January 2018.
The issue we've run into with Let's Encrypt is that they have a limit on the number of new (non-renewal) certs in a time period, grouped by high-level domain. So, for example, when you have lots of separate groups running sites under a common domain (groupa.example.com, groupb.example.com), you often hit the new-issuance limit and have to wait.

So, I expect existing CAs (and groups like InCommon) will continue to be around to serve large entities.

I don't know how helpful we can be, but we (and other companies like ours) have much higher Let's Encrypt limits. If it'd make your life better send me an email and we'll help you out.
Wildcard certs should help with that. Also, the limit is only on certs, not on domains. If your process allows for issuing a single cert for 100 domains, that also solves the problem. (There is a limit on SANs per cert AFAIK, but I don't have the exact number.)
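A rough sketch of the "single cert for many domains" idea: group hostnames into batches so each batch becomes one certificate request with several SANs. The cap of 100 names per certificate is an assumption here (the comment above does not give the exact number), and `batch_domains` is a hypothetical helper, not part of any ACME client.

```python
from typing import Iterable, List

# Assumption: the CA allows at most this many names (SANs) per certificate.
# Check the CA's current documentation before relying on the value.
MAX_NAMES_PER_CERT = 100


def batch_domains(domains: Iterable[str],
                  max_names: int = MAX_NAMES_PER_CERT) -> List[List[str]]:
    """Split hostnames into sorted, de-duplicated batches, one per cert."""
    if max_names < 1:
        raise ValueError("max_names must be at least 1")
    # Normalize case and drop blanks so duplicates do not waste SAN slots.
    unique = sorted({d.strip().lower() for d in domains if d.strip()})
    return [unique[i:i + max_names] for i in range(0, len(unique), max_names)]
```

Each returned batch counts as one issuance against a per-cert limit, so 250 hostnames would need three certificates instead of 250 under the assumed cap.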
Finally! Now they just need to add support for specifying cert length from 10-360 days.
They won’t ever offer > 90 days, but certificate lengths of about a day would certainly be interesting. I’d certainly switch to 24h valid certificates as soon as possible.

Ideally even separate certs for every subdomain, but as Let’s Encrypt has cert limits, and I want to avoid SNI in the future, I’ll probably have to use wildcard certs.

>certificates cost money

you still pay for an x-year certificate, but you only get a valid certificate for the next y days. if the CA can sign OCSP responses for millions of visitors, surely they can re-sign a certificate every y days.

>the world was mostly manual in the past (and in a lot of places still is)

that makes sense when talking about OCSP, but to get OCSP stapling working, you need to configure your web server to do so. instead of standardizing OCSP stapling, why couldn't they have standardized a protocol for a server to get updated certificates from the CA?

> why couldn't they have standardized a protocol for a server to get updated certificates from the CA?

That's essentially OCSP :-)

In general there's not really a concept of an "updated certificate". A certificate is good if the signature matches, the subject on the cert matches the server, and the current time is within the certificate's validity period; otherwise, the cert is bad. If someone steals a certificate, the website fixes this by telling the C.A. to revoke the certificate and by serving a different valid cert – my old employer kept a second valid certificate lying around in order to minimize downtime in the event of having to kill the main cert. OCSP is a way to effect the killing of the bad certificate. But in the absence of OCSP or some other revocation mechanism both certs are still good.
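To make the "good cert / bad cert" rule above concrete, here is a toy model in Python. It is only a sketch: `Cert` is a hypothetical stand-in for a parsed certificate, `signature_ok` stands in for real chain verification, names are assumed to be stored lowercase, and wildcard matching is ignored.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import FrozenSet


@dataclass(frozen=True)
class Cert:
    serial: int
    subject_names: FrozenSet[str]  # assumed lowercase, no wildcards
    not_before: datetime
    not_after: datetime
    signature_ok: bool  # stand-in for verifying the chain to a trusted CA


def is_acceptable(cert: Cert, hostname: str, now: datetime,
                  revoked_serials: FrozenSet[int] = frozenset()) -> bool:
    """Signature, subject and validity period decide; revocation is extra."""
    return (cert.signature_ok
            and hostname.lower() in cert.subject_names
            and cert.not_before <= now <= cert.not_after
            and cert.serial not in revoked_serials)
```

With an empty `revoked_serials` set, a stolen main cert and the spare cert both pass, which is the situation described above when no revocation mechanism is in play.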

> if the CA can sign OCSP responses for millions of visitors, surely they can re-sign a certificate every y days.

I haven't worked on this stuff in a few years, but historically OCSP resolvers were notoriously unreliable and often down. That's a huge issue for a security-critical path, because you're forced to "fail open" (which defeats the whole point of having OCSP in the first place) or render a wretched experience for your user. One big reason OCSP stapling exists is to work around C.A. OCSP resolvers' unreliability.
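The "fail open" trade-off can be sketched as a small decision function. This is an illustration under simplifying assumptions, not how any particular browser behaves: `fetched=None` models an unreachable OCSP responder, and a stapled response (if present) is used instead of a network lookup. All names here are hypothetical.

```python
from enum import Enum
from typing import Optional


class OcspStatus(Enum):
    GOOD = "good"
    REVOKED = "revoked"


def accept_connection(stapled: Optional[OcspStatus],
                      fetched: Optional[OcspStatus],
                      fail_open: bool) -> bool:
    """Decide whether to proceed with a TLS connection.

    stapled:   status the server attached to the handshake, if any
    fetched:   status from asking the CA's responder, None if it was down
    fail_open: accept when no revocation information is available at all
    """
    # A stapled response avoids depending on the CA responder being up.
    status = stapled if stapled is not None else fetched
    if status is None:
        return fail_open
    return status is OcspStatus.GOOD
```

With `fail_open=True` and the responder down, a revoked certificate is still accepted unless a response is stapled, which is the weakness described above.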

Does anyone know if ephemeral/automated cert issuing and renewal exists as an open source project yet? Most of this is Netflix internal, but I feel like Let's Encrypt has made short-lived certs an inevitability.
Somewhat related: Vault has a PKI backend that can help facilitate this. You'll need to create some tooling around it, but we've had great success rolling it out at my company.

https://vaultproject.io

Let's Encrypt provides two reference implementations of an ACME server, in Pebble[0] (not production ready) and Boulder[1]

[0]: https://github.com/letsencrypt/pebble

[1]: https://github.com/letsencrypt/boulder

It's part of the Credhub vision to do so[0] (supporting "The 3 Rs", being rotate, repair, repave[1]). Pivotal has been sponsoring development.

I was on the Credhub team for a while. When you begin to assume that you have (1) an always-on credentials service and (2) that it can serve multiple sides of credentialling (eg, service broker adds a credential, application fetches it), you get to do more aggressive cred management.

I was on the Credhub team for about 6 months, while it was being worked on both US coasts. It's now based in NYC.

[0] https://github.com/cloudfoundry-incubator/credhub/tree/maste...

[1] https://builttoadapt.io/the-three-r-s-of-enterprise-security...

I am looking forward to ACME 2.0 becoming an RFC! Once that happens, I can ping our ACS team to start bugging InCommon to spin up the appropriate server components. My guess is the other CAs are waiting for ACME 2.0 before they really spin up support for it.

For anyone interested in tracking the progress of ACME 2.0, take a look here: https://datatracker.ietf.org/doc/draft-ietf-acme-acme/

I don't know much about ACME 2.0 (except that it is apparently necessary for LE to be able to start offering wildcard certs). Can you expand on why CAs are waiting for 2.0?
Probably because it's the first IETF-approved spec for the protocol; right now it's all in draft status. LE itself will be adding ACMEv2 next year.
Why is certificate management not integrated with DNS? You already have to consult DNS to get an address to connect to, so why not piggyback certificate validity information on top of that? I'd suggest allowing both revocation lists and a way to say that only a specific list of certificates is allowed.
There are some things like that (e.g. DANE), but in the general case you cannot trust DNS, since it isn't authenticated. (DNSSEC is far from deployed everywhere, and even if the resolver does DNSSEC, the connection to the resolver might be unprotected.)
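For reference, the core of a DANE check is small: a TLSA record carries either the raw bytes or a hash of the certificate (or of its public key), and the client compares. The sketch below covers only the matching-type step from RFC 6698 (0 = exact, 1 = SHA-256, 2 = SHA-512). It assumes the caller has already picked the bytes to compare and has validated the record via DNSSEC, which is exactly the part that cannot be taken for granted. `tlsa_matches` is a hypothetical helper name.

```python
import hashlib
import hmac


def tlsa_matches(selected: bytes, matching_type: int,
                 association_data: bytes) -> bool:
    """Compare certificate-derived bytes against TLSA association data.

    selected: DER bytes of the full cert or of its SubjectPublicKeyInfo,
              depending on the record's selector (chosen by the caller).
    """
    if matching_type == 0:
        candidate = selected  # exact match on the raw bytes
    elif matching_type == 1:
        candidate = hashlib.sha256(selected).digest()
    elif matching_type == 2:
        candidate = hashlib.sha512(selected).digest()
    else:
        raise ValueError("unknown TLSA matching type: %d" % matching_type)
    # Constant-time comparison; not strictly required here, but harmless.
    return hmac.compare_digest(candidate, association_data)
```

Publishing several such records for one name gives the "only this specific list of certificates is allowed" behaviour asked about above.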
How does Netflix handle applications that don't have certificate/key hot reload capability? MySQL is especially guilty of this. It's a PITA to force a restart just to reload certs or keys even once a year. I can't imagine having to do this every few days.
Companies should start using tools like ManageEngine Key Manager Plus https://www.manageengine.com/key-manager/ or other similar products for secure SSH key and SSL certificate management. Automation is the only way to avoid security issues.

Disclaimer: I work for ManageEngine.

Symantec has developed an Open Source system for short-term SSH and SSL certificate management with 2FA (VIP and U2F). We encourage people to adopt this to improve their security. Code: https://github.com/Symantec/keymaster Design document: https://docs.google.com/document/d/1AW3UROCJqTc3R4MLJXxmPUNS...
The article mentions that HAProxy cannot do a reload without downtime. Is that correct? Is there no way around that?
Was true for a long time, and still is for the current stable version. This article goes into more than enough detail on the history and the various workarounds: https://www.haproxy.com/blog/truly-seamless-reloads-with-hap...
and short-lived tenures ;)