Hacker News
by throwaway894345 1847 days ago
We already solved those problems with containers (I know, half of HN doesn't like containers and thinks everyone should manage heavy VMs or bare-metal machines just like our ancestors did). Notably, logs and metrics are exfiltrated, and detailed logging and monitoring have largely made debuggers obsolete in production (indeed, debuggers aren't even baked into the final image, nor are they installable, since the app user oughtn't be root). These practices seem pretty portable to the unikernel model, and they don't require any hand-rolled workarounds or a scale of thousands of machines.

Did we? I mean, in prod I still frequently pop a shell in a pod, apk add some tool, and reproduce issues outside of the app. Sure, I could trawl through GBs of Istio logs or whatever, but having no userland available on the machine with the problem would probably at least double our time to resolve an incident…
My understanding of the best practices is that shelling into prod is break-glass only. Developers need approval to get escalated permissions to shell into prod in the first place. Further, containers shouldn't run as root (for security), so I don't know how you would install software anyway. Logs and metrics should similarly be queryable via some central service like CloudWatch, Splunk, Prometheus, or even kubectl+grep. You shouldn't have to manually page through GBs of logs.
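As a rough illustration of querying logs by field instead of paging through raw text, here is a hypothetical Go filter that reads JSON log lines on stdin and keeps only server errors. The field names msg, path, and status are assumptions about the log schema; a central log service does the same job at scale, with indexes.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
)

// logEvent holds the subset of fields this sketch filters on (assumed schema).
type logEvent struct {
	Msg    string `json:"msg"`
	Path   string `json:"path"`
	Status int    `json:"status"`
}

// matchLine decodes one JSON log line and reports whether its status is at
// least minStatus. Lines that are not valid JSON never match.
func matchLine(line []byte, minStatus int) (logEvent, bool) {
	var ev logEvent
	if err := json.Unmarshal(line, &ev); err != nil {
		return logEvent{}, false
	}
	return ev, ev.Status >= minStatus
}

func main() {
	sc := bufio.NewScanner(os.Stdin)
	// Allow lines up to 1 MiB; bufio.Scanner's default limit is 64 KiB.
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024)
	for sc.Scan() {
		if ev, ok := matchLine(sc.Bytes(), 500); ok {
			fmt.Printf("%d %s %s\n", ev.Status, ev.Path, ev.Msg)
		}
	}
	if err := sc.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "read error:", err)
		os.Exit(1)
	}
}
```

Usage would be something like kubectl logs deployment/my-app | go run filter.go, where my-app and filter.go are placeholders.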

Our images are often pretty stripped down (coreutils at most, often just a Go binary and some certs), so there aren’t many debugging tools available.

This might make our time to resolution slightly higher, but it keeps our incident count much lower, because we very rarely need to break glass in the first place (though this means you have to establish norms for logging, instrumentation, and tests).

Curl to /dev/shm/ ;)