by agentgt 3757 days ago
The issue I have with that is that the tooling you mention, while stable and mature, is actively being replaced by cloud tools, because you really can't just debug a single machine in production when you have a cluster. Not to mention it is production, so debug symbols might not even be available.

I understand your point about the maturity of the tooling, but I see it as a serious failure if you have to log into a machine in production and run gdb or, IMO, any tool. Your app can and should provide healthchecks/monitoring so that you can see if there is a problem (this includes even a thread stack dump).
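
To make the "app provides its own thread stack dump" idea concrete, here is a minimal sketch in Python (not the JVM setup I actually run). The names `thread_stack_dump` and `health_report` and the payload fields are made up for illustration; a real app would expose something like this behind whatever monitoring endpoint it already has.

```python
import sys
import threading
import traceback


def thread_stack_dump() -> str:
    """Return the current stack of every live thread as plain text."""
    # Map thread ids to names so the dump is readable.
    names = {t.ident: t.name for t in threading.enumerate()}
    sections = []
    # sys._current_frames() is a CPython-specific helper that maps
    # each thread id to that thread's topmost stack frame.
    for ident, frame in sys._current_frames().items():
        header = f"Thread {names.get(ident, 'unknown')} (id={ident})"
        stack = "".join(traceback.format_stack(frame))
        sections.append(f"{header}\n{stack}")
    return "\n".join(sections)


def health_report() -> dict:
    """Hypothetical health-check payload; the field names are illustrative."""
    return {
        "status": "ok",
        "thread_count": threading.active_count(),
        "threads": thread_stack_dump(),
    }
```

The point is only that the dump is produced by the process itself and shipped out with the rest of the monitoring data, so nobody has to SSH in and attach a debugger to get it.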

I'm probably just biased and jaded, as I have lost a serious amount of time to technical debt with Docker. It just feels like a VM on top of a VM on top of a VM, a continuous stream of things to break/learn... I want bare metal :)


> you really can't just debug a single machine in production when you have a cluster

Somehow I ended up debugging, tracing, monitoring and even hotpatching individual machines in the cluster. Yeah, the easy problems will show up in the monitoring and logs. The harder ones won't.

That must have been a pain in the butt :). And for sure you're right, there are always exceptions.

I guess I haven't run into those issues, probably because I run JVMs, but I suppose if you have native code or an interpreter using native code, I can see how it would be helpful to just SSH in and figure out what the issue is.

Now that I recall, I have actually had to SSH in a bunch of times because of Rackspace network interfaces randomly failing, so I am a big hypocrite :)

It wasn't a huge pain. It was Erlang, so I could do those things very easily. But it still had to be done by logging into a few machines and poking around. I can't imagine doing that if it were a bunch of C code combined with a kernel running in the same memory space.