by tomgs 1360 days ago
Disclaimer: I run Developer Relations for Lightrun.

There is another way to tackle the problem for most normal, back-end applications: Dynamic Logging[0].

Instead of adding a large amount of logs during development (and then having to deal with compressing and transforming them later), one can choose to add only the logs that are actually required, at runtime.

This is a workflow shift, and as such should be handled with care. But for the majority of logs used for troubleshooting, it's actually a saner approach: don't make a priori assumptions about what you might need in production and then try to "massage" the right parts out of them when a problem rears its head.

Instead, when facing an issue, add logs where and when you need them, so you get only the bits you want, almost "surgically". This way, logging cost reduction happens naturally - because most of those logs never get written in the first place.
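To make the idea concrete, here is a rough, hypothetical Python sketch. It is not Lightrun's implementation or API: functions carry a cheap hook, and the actual log message is registered while the process is running. The names `ACTIVE_LOG_POINTS`, `log_point` and `charge` are made up for illustration. In practice the registry would be updated from outside the process (say, by an agent or an admin endpoint) rather than from the same script.

```python
import functools
import logging

logging.basicConfig(format="%(levelname)s %(message)s")
log = logging.getLogger("dynamic")
log.setLevel(logging.INFO)

# Hypothetical registry of live log points: function qualname -> message template.
# It starts empty, so nothing is logged until someone adds an entry at runtime.
ACTIVE_LOG_POINTS: dict[str, str] = {}


def log_point(func):
    """Emit a log line for a call only while a template is registered for func."""
    key = func.__qualname__

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        template = ACTIVE_LOG_POINTS.get(key)
        if template is not None:
            # Fill the template from the call's keyword arguments.
            log.info(template.format(**kwargs))
        return func(*args, **kwargs)

    return wrapper


@log_point
def charge(*, user_id: str, amount: int) -> bool:
    return amount > 0


if __name__ == "__main__":
    charge(user_id="u1", amount=10)  # no log point registered: nothing logged
    ACTIVE_LOG_POINTS["charge"] = "charge user={user_id} amount={amount}"
    charge(user_id="u2", amount=5)  # logs "charge user=u2 amount=5"
    del ACTIVE_LOG_POINTS["charge"]  # remove the log point when done
```

The cost argument shows up directly: until an entry is added, the only per-call overhead is a dict lookup, and no log bytes are produced at all.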

Note: we're not talking about removing logs needed for compliance, forensics or other regulatory reasons here, of course. We're talking about the logs developers use to better understand what's going on inside the application: the "print this variable", "show this user's state" or "show me which path the execution took" type of logs, the ones you look at once and then forget about (while their costs keep piling up).

We call this workflow "Dynamic Logging", and a fully featured version of the product is available on our website for use with up to 3 live instances.

On a personal - albeit obviously biased - note, I was an SRE before I joined the company, and saw an early demo of the product. I remember letting out a very audible f-word during the demonstration, and thinking that I want me one of these nice little IDE thingies this company makes. It's a different way to think about logging - I'll give you that - but it makes a world of sense to me.

[0] https://docs.lightrun.com/logs/


Perhaps I’m misunderstanding, but what happens if you’ve had a one-off production issue (a failed job, etc.) and you hadn’t dynamically logged the corresponding code? You can’t go back in time and enable logging for that failure, right?
An alternative approach, IMHO, is to log all the things and just be judicious about expunging old stuff -- I believe the metrics community buys into this approach, too, storing high-granularity captures for a week or so, and then rolling them up into larger aggregates for longer-term storage.
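The "keep raw for a while, then roll up" idea can be sketched roughly like this in Python. The one-week raw window and hourly buckets are illustrative defaults, not numbers anyone in the thread committed to. `roll_up` and `Rollup` are hypothetical names, and samples are assumed to be `(epoch_seconds, value)` pairs.

```python
from dataclasses import dataclass


@dataclass
class Rollup:
    """Aggregate kept for long-term storage in place of the raw samples."""
    bucket_start: int  # epoch seconds, aligned to the bucket size
    count: int
    total: float
    minimum: float
    maximum: float


def roll_up(samples, now, keep_raw_for=7 * 24 * 3600, bucket=3600):
    """Split (timestamp, value) samples into raw ones still inside the
    retention window and per-bucket aggregates for everything older."""
    raw, buckets = [], {}
    for ts, value in samples:
        if now - ts < keep_raw_for:
            raw.append((ts, value))
            continue
        start = ts - ts % bucket
        agg = buckets.get(start)
        if agg is None:
            buckets[start] = Rollup(start, 1, value, value, value)
        else:
            agg.count += 1
            agg.total += value
            agg.minimum = min(agg.minimum, value)
            agg.maximum = max(agg.maximum, value)
    return raw, sorted(buckets.values(), key=lambda r: r.bucket_start)


if __name__ == "__main__":
    now = 10 * 24 * 3600
    raw, rolled = roll_up([(0, 1.0), (10, 3.0), (3600, 5.0), (now - 60, 7.0)], now)
    print(raw)     # only the recent sample survives at full granularity
    print(rolled)  # two hourly aggregates for the old samples
```

Keeping count/total/min/max (rather than just an average) means the rollups can later be merged into even coarser buckets without losing accuracy.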

I would also at least try a cluster-local log buffering system that forwards INFO and above as received, but buffers DEBUG and below, optionally allowing someone to uncork them if required, getting the "time-traveling logging" you were describing. The risk, of course, is that the more links in that transmission chain, the more opportunities for something to go sideways and take out all the logs, which would be :-(
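As an in-process stand-in for that cluster-local buffer, here is a rough sketch using Python's standard `logging` module. `UncorkableHandler`, `uncork` and the capacity are hypothetical choices, not an existing product. INFO and above pass straight through to a target handler, while DEBUG and below sit in a bounded ring buffer until someone asks for them.

```python
import collections
import logging


class UncorkableHandler(logging.Handler):
    """Forward INFO and above immediately; hold DEBUG and below in a bounded
    ring buffer until uncork() is called."""

    def __init__(self, target: logging.Handler, capacity: int = 10_000):
        super().__init__()
        self.target = target
        # Once full, the oldest buffered records are silently dropped.
        self.buffer = collections.deque(maxlen=capacity)

    def emit(self, record: logging.LogRecord) -> None:
        # handle() already holds self.lock while calling emit().
        if record.levelno >= logging.INFO:
            self.target.handle(record)
        else:
            self.buffer.append(record)

    def uncork(self) -> None:
        """Forward everything buffered so far, in arrival order."""
        self.acquire()
        try:
            pending = list(self.buffer)
            self.buffer.clear()
        finally:
            self.release()
        for record in pending:
            self.target.handle(record)


if __name__ == "__main__":
    buffered = UncorkableHandler(logging.StreamHandler())
    logger = logging.getLogger("svc")
    logger.setLevel(logging.DEBUG)
    logger.addHandler(buffered)
    logger.debug("cache miss for key=42")  # held in the buffer
    logger.warning("upstream slow")        # forwarded right away
    buffered.uncork()                      # the debug line goes out now
```

Uncorked records arrive after later INFO lines, but each `LogRecord` keeps its original `created` timestamp, so a downstream store can still order them correctly. The bounded deque is what keeps the "buffer everything" part from turning into its own cost problem.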

That would entail time-travelling and capturing that exact spot in the code, which is usually done by exception monitoring/handling products (plenty exist on the market).

We're more after ongoing situations, where the issue is either hard to reproduce locally or requires very specific state - APIs returning wrong data, vague API 500 errors, application transactions issues, misbehaving caches, 3rd party library errors - that kind of stuff.

If you're looking at the app and your approach would normally be to add another hotfix with logging because some specific piece of information is missing, this approach works beautifully.

> which is usually done by exception monitoring/handling products (plenty exist on the market).

Only if one considers the bug/unexpected condition to be an exception; the only thing worse than nothing being an exception is everything being an exception.

I'm glad someone put a name on the concept I've been advocating for a decade. Thank you! It's something we added at Netflix when we realized our logging costs were out of control.

We had a dashboard where you could flip on certain logging only as needed.
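The comment doesn't describe how that dashboard worked, so the following is only a hypothetical sketch of the receiving end in a Python service, using the standard `logging` module. The `{logger_name: level_name}` payload shape and the `apply_overrides` name are assumptions, and how the dashboard reaches each instance (polling, an admin endpoint, a config service) is left out.

```python
import logging

logging.basicConfig(level=logging.WARNING, format="%(name)s %(levelname)s %(message)s")


def apply_overrides(overrides: dict[str, str]) -> None:
    """Apply {logger_name: level_name} pairs to the running process.

    Hypothetical payload shape, e.g. what an internal dashboard might push.
    "NOTSET" hands the logger back to its ancestors' effective level.
    """
    for name, level in overrides.items():
        # Logger.setLevel accepts level names such as "DEBUG" as strings.
        logging.getLogger(name).setLevel(level.upper())


if __name__ == "__main__":
    log = logging.getLogger("payments.retry")
    log.debug("hidden: logger inherits WARNING from root")
    apply_overrides({"payments.retry": "debug"})
    log.debug("visible while the override is on")
    apply_overrides({"payments.retry": "notset"})
```

Because levels are per logger name, flipping on one noisy subsystem doesn't turn on DEBUG for the whole service.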

I know Mykyta, who does dev productivity at Netflix now, and he said something to that effect ;)

I tried finding you on Twitter, but no go since your DMs are closed.

Would be happy to pick your brain about the topic - tom@granot.dev is where I’m at if you have the time!

Sounds pretty cool. How much?
I don’t consider Free for one agent and “contact us” for everything else to be pricing.
Logging is one of those things that gets complex in a hurry. If I were starting a log processing company, I'd do the same, since raw bytes aren't a good metric. A really long error chain (like the ones Java produces) is "one error", but is it stored as a single "error", or as multiple lines that get parsed out later?
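One common way to answer the "one error or many lines" question is to group lines into events before storing them: a new event starts only where a line begins with a timestamp, and everything else (the exception message, `at ...` frames, `Caused by:`, `... N more`) is folded into the previous event. This is roughly what log shippers' multiline options are for. The sketch below is hypothetical: it assumes every entry starts with an ISO-8601-style timestamp, and the sample lines are made up.

```python
import re

# Assumption: every new log entry begins with an ISO-8601-style timestamp.
# Any other line is treated as a continuation of the previous entry.
ENTRY_START = re.compile(r"^\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}")


def group_events(lines):
    """Merge continuation lines into the preceding entry, so a whole Java
    exception chain becomes one stored event instead of many lines."""
    events = []
    for line in lines:
        if events and not ENTRY_START.match(line):
            events[-1] += "\n" + line
        else:
            events.append(line)
    return events


if __name__ == "__main__":
    sample = [  # made-up example lines
        "2024-05-01 12:00:00 ERROR order 17 failed",
        "java.lang.IllegalStateException: bad state",
        "\tat com.example.Orders.place(Orders.java:42)",
        "Caused by: java.io.IOException: timeout",
        "\t... 3 more",
        "2024-05-01 12:00:01 INFO retry scheduled",
    ]
    for event in group_events(sample):
        print(repr(event))
```

Grouping up front is also what makes per-event (rather than per-byte or per-line) counting possible, which is the metric question raised above.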

I really stopped paying attention to logging around the Logstash era for enterprise solutions, so my knowledge is woefully out of date, but based on how crazy my Python services' logfiles get for a single instance, I don't think that much has really changed.

Me neither. Immediately makes me disinterested in something that would otherwise be pretty cool.
And poof there goes my interest. No prices.