|
|
|
|
|
by cyberpunk
1847 days ago
|
|
Did we? I mean in prod I still frequently pop a shell in a pod, apk install some tool and reproduce issues outside of the app, sure I could trawl through gbs of istio logs or whatever but it would probably at least double incident resolution if I had no userland available on the machine with the problem… |
|
Our images are often pretty stripped down (coreutils at most, often just a Go binary and some certs), so there aren’t many debugging tools available.
This might make our time to resolution slightly higher, but it keeps our incident count quite a lot lower because we very rarely need to break glass in the first place (this means you have to establish norms for logging, instrumentation, and tests).