| HN Mirror


by dlisboa 796 days ago

> Keeping all of your application logs and telemetry forever is expensive, and I can't recall a single time when having more than a day's worth of history was ever useful in tracking down an operational issue.

A day is a pretty small window; I'd say a week or a bit more is enough for most orgs. That way you can compare specific endpoints/code between deploys, answering questions like "was this endpoint this slow last week too, or did I break it?". Some issues take a few days to brew, and having historical data is important when debugging them. Many orgs don't do any load testing or real performance analysis before things crash.

Log retention is also directly tied to how quickly and easily you can detect and recover from issues.


jedberg 796 days ago

> Log retention is also directly tied to how quickly and easily you can detect and recover from issues.

I disagree. For every issue I've ever debugged, I ran tail -f on the logs. I can't recall ever searching the old logs.

Even if it takes a few days for an issue to brew, the logs right now will usually show it. If they don't, you can turn on logging and have what you need in a few days' time. Needing old logs is so rare that it's almost never worth keeping them around for the one case where an old log might lead to a resolution, and during an active incident you rarely have time to look at old logs anyway.