| HN Mirror

My understanding of the best practices is that shelling into prod is a breakglass only. Developers need approval to get escalated permissions to shell into prod in the first place. Further, containers shouldn’t run as root (security) so I don’t know how you would install software anyway. Logs and metrics should similarly be queryable via some central log explorer service like CloudWatch, Splunk, Prometheus, or even kubectl+grep. You shouldn’t have to manually page through GBs of logs.

Our images are often pretty stripped down (coreutils at most, often just a Go binary and some certs), so there aren’t many debugging tools available.

This might make our time to resolution slightly higher, but it keeps our incident count quite a lot lower because we very rarely need to break glass in the first place (this means you have to establish norms for logging, instrumentation, and tests).