|
|
|
|
|
by throwaway894345
1847 days ago
|
|
My understanding of the best practices is that shelling into prod is a breakglass only. Developers need approval to get escalated permissions to shell into prod in the first place. Further, containers shouldn’t run as root (security) so I don’t know how you would install software anyway. Logs and metrics should similarly be queryable via some central log explorer service like CloudWatch, Splunk, Prometheus, or even kubectl+grep. You shouldn’t have to manually page through GBs of logs. Our images are often pretty stripped down (coreutils at most, often just a Go binary and some certs), so there aren’t many debugging tools available. This might make our time to resolution slightly higher, but it keeps our incident count quite a lot lower because we very rarely need to break glass in the first place (this means you have to establish norms for logging, instrumentation, and tests). |
|