by thethimble 1362 days ago
Perhaps I’m misunderstanding, but what happens if you’ve had a one-off production issue (job failed, etc.) and you hadn’t dynamically logged the corresponding code? You can’t go back in time and enable logging for that failure, right?
2 comments

An alternative approach, IMHO, is to log all the things and just be judicious about expunging old stuff -- I believe the metrics community buys into this approach, too, storing high-granularity captures for a week or whatever, and then rolling them up into larger aggregates for longer-term storage.
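For what it's worth, here is a rough Python sketch of that "keep raw for a week, then roll up" idea. Everything in it is an assumption for illustration: the `roll_up` name, the one-week raw window, the hourly buckets, and the choice of count/min/max/mean as the aggregate. Real metrics stores do this in their own ways.

```python
from collections import defaultdict
from statistics import mean

WEEK_SECONDS = 7 * 24 * 3600  # assumed retention for raw samples
HOUR_SECONDS = 3600           # assumed bucket size for rolled-up data


def roll_up(samples, now, raw_window=WEEK_SECONDS, bucket=HOUR_SECONDS):
    """Split (timestamp, value) samples into recent raw data and older aggregates.

    Samples newer than `raw_window` are kept untouched; older ones are grouped
    into fixed-size buckets and reduced to summary statistics.
    """
    recent = []
    old_buckets = defaultdict(list)
    for timestamp, value in samples:
        if now - timestamp <= raw_window:
            recent.append((timestamp, value))
        else:
            # Key each old sample by the start of the bucket it falls in.
            old_buckets[timestamp - timestamp % bucket].append(value)

    aggregates = {
        start: {
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
        }
        for start, values in sorted(old_buckets.items())
    }
    return recent, aggregates
```

The trade-off is the usual one: you lose the individual old samples, but long-term storage grows with the number of buckets rather than the number of events.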

I would also at least try a cluster-local log buffering system that forwards INFO and above as received, but buffers DEBUG and below, optionally allowing someone to uncork them if required, getting the "time traveling logging" you were describing. The risk, of course, is that the more links in that transmission chain, the more opportunities for something to go sideways and take out all logs, which would be :-(
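To make the buffering idea concrete, here is a minimal in-process sketch using Python's standard `logging` module. It is not a cluster-local forwarder, just the same policy in one process: INFO and above pass straight through, DEBUG and below sit in a bounded buffer until someone uncorks them. `UncorkableHandler`, `uncork`, `ListHandler`, and the capacity of 1000 are hypothetical names and values chosen for illustration.

```python
import logging
from collections import deque


class ListHandler(logging.Handler):
    """Stand-in for the real downstream sink; just collects messages."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record.getMessage())


class UncorkableHandler(logging.Handler):
    """Forward INFO and above immediately; hold DEBUG and below in a buffer."""

    def __init__(self, target, capacity=1000):
        super().__init__(level=logging.NOTSET)
        self.target = target                   # downstream handler
        self.buffer = deque(maxlen=capacity)   # oldest records drop when full

    def emit(self, record):
        if record.levelno >= logging.INFO:
            self.target.handle(record)
        else:
            self.buffer.append(record)

    def uncork(self):
        """Send buffered low-level records downstream, oldest first."""
        flushed = 0
        self.acquire()  # same lock that guards emit()
        try:
            while self.buffer:
                self.target.handle(self.buffer.popleft())
                flushed += 1
        finally:
            self.release()
        return flushed
```

Note the bounded buffer means the "time travel" only reaches back as far as the last `capacity` low-level records, and uncorked records arrive after the INFO lines they originally preceded, so the sink needs to sort by timestamp.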

That would entail time-travelling and capturing that exact spot in the code, which is usually done by exception monitoring/handling products (plenty exist on the market).

We're more after ongoing situations, where the issue is either hard to reproduce locally or requires very specific state - APIs returning wrong data, vague API 500 errors, application transaction issues, misbehaving caches, third-party library errors - that kind of stuff.

If you're looking at the app and your approach would normally be to add another hotfix with logging because some specific piece of information is missing, this approach works beautifully.

> which is usually done by exception monitoring/handling products (plenty exist on the market).

Only if one considers the bug/unexpected condition to be an exception; the only thing worse than nothing being an exception is everything being an exception.