Hacker News
by jamiesonbecker 71 days ago
But then you can't log in if your box goes offline for any reason.

Hmm. For user certs you can have the service sign them for, say, an hour; as long as you can ssh to your server within that window, there’s no need for any other interaction.
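A minimal sketch of that signing step, assuming the signing service simply shells out to OpenSSH's `ssh-keygen`. The `-s` (CA key), `-I` (key identity), `-n` (principals) and `-V` (validity interval) flags are real `ssh-keygen` options; the function names, paths and principals are hypothetical.

```python
import subprocess
from typing import List, Sequence


def build_sign_command(ca_key: str, user_pubkey: str, identity: str,
                       principals: Sequence[str], validity: str = "+1h") -> List[str]:
    """Build an ssh-keygen argv that signs a user public key with a CA key.

    ssh-keygen writes the certificate next to the input key as <name>-cert.pub.
    """
    return [
        "ssh-keygen",
        "-s", ca_key,                # CA private key used for signing
        "-I", identity,              # key identity, shows up in sshd logs
        "-n", ",".join(principals),  # account names the cert is valid for
        "-V", validity,              # "+1h": valid from now until one hour from now
        user_pubkey,                 # public key being certified
    ]


def sign_user_key(*args, **kwargs) -> None:
    # Hypothetical wrapper: runs ssh-keygen and raises if it exits non-zero.
    subprocess.run(build_sign_command(*args, **kwargs), check=True)
```

On the server side, sshd only needs the CA's public key listed in `TrustedUserCAKeys`; no per-user key distribution is involved, which is the appeal.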

Sure you need your signing service to be reasonably available, but that’s easily accomplished.

Maybe I misunderstand?

That works for authn in the happy path: short-lived cert, grab it, connect, done.

Except for everything around that:

* user lifecycle (create/remove/rename accounts)

* authz (who gets sudo, what groups, per-host differences)

* cleanup (what happens when someone leaves)

* visibility (what state is this box actually in right now?)

SSH certs don’t really touch any of that. They answer "can this key log in right now?", not "what should exist on this machine?"

So in practice, something else ends up managing users, groups, sudoers, home dirs, etc. Now there are two systems that both have to be correct.
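To make that second system concrete, here is a hedged sketch of what it has to compute on each box: the difference between the accounts that should exist and the accounts that do. The data model (username mapped to a sudo flag), the `plan_changes` name and the action verbs are assumptions; an executor would map them to things like `useradd`, `userdel -r` and files in `/etc/sudoers.d/`.

```python
from typing import Dict, List, Tuple

Action = Tuple[str, str]  # (verb, username); verbs are hypothetical


def plan_changes(desired: Dict[str, bool], present: Dict[str, bool]) -> List[Action]:
    """Plan the steps that move a host from `present` to `desired`.

    Both dicts map username -> whether that user should have sudo.
    Nothing is executed here; the caller decides whether to apply the plan.
    """
    plan: List[Action] = []
    for user in sorted(desired.keys() - present.keys()):
        plan.append(("create", user))          # e.g. useradd --create-home <user>
        if desired[user]:
            plan.append(("grant_sudo", user))  # e.g. write /etc/sudoers.d/<user>
    for user in sorted(desired.keys() & present.keys()):
        if desired[user] and not present[user]:
            plan.append(("grant_sudo", user))
        elif present[user] and not desired[user]:
            plan.append(("revoke_sudo", user))
    for user in sorted(present.keys() - desired.keys()):
        # Departures: a cert expiring does none of this on its own.
        if present[user]:
            plan.append(("revoke_sudo", user))
        plan.append(("remove", user))          # e.g. userdel -r <user>
    return plan
```

Running the same function against every host is also what gives you the "visibility" bullet: an empty plan means the box is in the expected state.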

On the availability point: "reasonably available" is doing a lot of work ;)

Even with 1-hour certs:

* new sessions depend on the signer

* fleet-wide issues hit everything at once

* incident response gets awkward if the signer is part of the blast radius

The failure mode shifts from "a few boxes don't work" to "nobody can get in anywhere".
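A small sketch of why the signer ends up on the critical path, assuming a client that renews its cert from the signer whenever the current one has lapsed. The `Cert` class and both function names are hypothetical; the interval check roughly mirrors how sshd treats a certificate's valid-after/valid-before window.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cert:
    valid_after: float   # Unix seconds
    valid_before: float  # Unix seconds


def cert_is_valid(cert: Cert, now: float) -> bool:
    # Valid from valid_after (inclusive) until valid_before (exclusive).
    return cert.valid_after <= now < cert.valid_before


def can_open_session(cert: Optional[Cert], now: float, signer_up: bool) -> bool:
    """A new SSH session needs either an unexpired cert or a reachable signer."""
    if cert is not None and cert_is_valid(cert, now):
        return True
    return signer_up  # otherwise the client must fetch a fresh cert first
```

Note that the target host never appears in the check: once the signer is down and the certs have lapsed, the same `False` comes back for every box in the fleet.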

The pull model just leans the other way:

* nodes converge to desired state

* access continues even if control plane hiccups

* authn and authz live together on the box
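A hedged sketch of the pull side, assuming each node periodically fetches its desired account list from a control plane and caches the last good copy on local disk. The `fetch` callable, cache path and JSON format are hypothetical; the point is only that a failed fetch leaves the node on its last known state instead of locking anyone out.

```python
import json
import os
import tempfile
from typing import Callable, Dict

State = Dict[str, bool]  # username -> has sudo


def pull_once(fetch: Callable[[], State], cache_path: str) -> State:
    """Fetch desired state; if the control plane fails, use the cached copy.

    If the very first fetch fails there is no cache yet, and open() raises.
    """
    try:
        state = fetch()
    except Exception:
        # Control plane hiccup: keep converging to the last known state.
        with open(cache_path) as f:
            return json.load(f)
    # Write to a temp file and rename, so a crash never leaves a half-written cache.
    directory = os.path.dirname(cache_path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, cache_path)
    return state
```

Because the accounts themselves are local, even a long control-plane outage only freezes changes; existing access keeps working.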

Both models can work; it’s more about which failure mode you can tolerate.

Well, yes, pick your poison.

But for just getting access to role accounts then I find it a lot nicer than distributing public keys around.

And for everything else, a periodic Ansible :-)

Public keys (for OpenSSH) can be in DNS (SSHFP records checked via VerifyHostKeyDNS, for host keys) or in, say, LDAP via KnownHostsCommand and AuthorizedKeysCommand.
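For the AuthorizedKeysCommand route, sshd runs a program (configured with something like `AuthorizedKeysCommand /usr/local/bin/ssh-keys-lookup %u` plus an unprivileged `AuthorizedKeysCommandUser`) and reads authorized_keys-format lines from its stdout. A minimal sketch of such a program follows; the in-memory `KEY_DIRECTORY` is a hypothetical stand-in for the real LDAP query, and the key string is a placeholder, not a real key.

```python
import sys
from typing import Dict, List

# Hypothetical stand-in for a directory lookup (LDAP, HTTP API, ...).
KEY_DIRECTORY: Dict[str, List[str]] = {
    "alice": ["ssh-ed25519 AAAA-placeholder-not-a-real-key alice@laptop"],
}


def keys_for(user: str) -> List[str]:
    """Return authorized_keys lines for `user`; an empty list means no access."""
    return KEY_DIRECTORY.get(user, [])


def main(argv: List[str]) -> int:
    if len(argv) != 2:
        return 1  # expect exactly one argument: the %u token from sshd
    for line in keys_for(argv[1]):
        print(line)  # sshd parses these like lines in authorized_keys
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

This is the same availability trade-off as pam_ldap: if the lookup backend is unreachable, the command prints nothing and key logins fail.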

That sounds like a lot of extra steps. How do I validate the authenticity of a signing request? Should my signing machine be able to challenge the requester? (This means that the CA key is on a machine with network access!!)

Replacing the distribution of a revocation list with short-lived certificates just creates other problems that are not easier to solve. (Also, 1h is bonkers; even Let's Encrypt doesn't do it.)

1h is bonkers for certs in https, but it's not unreasonable for authorized user certs, if your issuance path is available enough.

IMHO, if you're pushing revocation lists at low latency, you could also push authorized keys updates at low latency.

Honestly, we used to replace a lot of pam_ldap and similar sorts of awful solutions. With those, if your LDAP went down even for a heartbeat, you couldn't log in at all.

So I totally agree: if I had to do certificates and didn't have something like Userify, a 1 hour (or even shorter if possible) expiration seems quite worth chasing, especially with suitable highly available configuration. (Of course, TFA doesn't even bother mentioning revocation and expiration, which should give you a clue as to how much fun those are lol)

And for more normal, lower-security requirements or non-HA, 6 or 8 hours or so would probably work and give you plenty of time for even serious system outages before the certs expired.

Not to hard shill or anything (apologies in advance, just skip if you're not interested), but there are two significant security and reliability differences between standard SSH (with or without certificates) and Userify:

1. Userify Cloud updates by default every three minutes, and on-premise Userify Express/Enterprise updates every ten seconds, but it doesn't have to update at all; even if your Userify server goes offline forever, you can still log in because the accounts are standard UNIX accounts (literally created with `useradd`)

2. When accounts are removed, Userify also completely nukes the user account, removes its sudo perms, and `kill -9`s any tmux/screen/etc. sessions (all processes owned by the user are terminated across the entire enterprise within seconds), which is also not something a certificate expiration would ever do.
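For the curious, here is one way to do the "kill everything the user owns" step on Linux. This is not Userify's implementation, just a hedged sketch that scans `/proc/<pid>/status` for the real UID and sends SIGKILL; the function names and the `dry_run` default are assumptions. The shell one-liner equivalent is roughly `pkill -KILL -U <user>`.

```python
import os
import signal
from typing import List


def real_uid_from_status(text: str) -> int:
    """Parse the real UID from the contents of /proc/<pid>/status."""
    for line in text.splitlines():
        if line.startswith("Uid:"):
            return int(line.split()[1])  # fields: real, effective, saved, fs
    raise ValueError("no Uid line")


def pids_owned_by(uid: int, proc_root: str = "/proc") -> List[int]:
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries like /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                if real_uid_from_status(f.read()) == uid:
                    pids.append(int(entry))
        except (OSError, ValueError):
            continue  # process exited meanwhile, or status was unreadable
    return sorted(pids)


def kill_user_processes(uid: int, dry_run: bool = True) -> List[int]:
    """Return the matching PIDs; send SIGKILL (kill -9) unless dry_run."""
    pids = pids_owned_by(uid)
    if not dry_run:
        for pid in pids:
            try:
                os.kill(pid, signal.SIGKILL)
            except ProcessLookupError:
                pass  # already gone
    return pids
```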