by jeffbee 2149 days ago
Ad-hoc production troubleshooting is a reason to keep, at most, 7 days of logs. Usually you want the most recent minute or hour. Troubleshooting usually does not need collection, aggregation, and indexing because either the problem is isolated to a host or the logs of a single host, pod, or process are representative of what is happening in the rest of the fleet. Even if you want to access all logs, it's still better to leave them where they were produced and push a predicate out to every host; your log-producing fleet has far, far more compute resources than your poor little central database, no matter how big that DB is.
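The "push a predicate out to every host" idea can be sketched roughly as follows. This is only an illustration, not anyone's actual tooling: the fleet is simulated with an in-memory dict, and the names `FLEET_LOGS`, `run_on_host`, and `fleet_grep` are hypothetical. In a real deployment `run_on_host` would be a remote call to an agent on each machine, so that only matching lines cross the network.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for logs that stay on the hosts that produced them.
FLEET_LOGS = {
    "host-a": ["GET /cart 200", "GET /checkout 500", "GET /home 200"],
    "host-b": ["GET /home 200", "POST /checkout 500"],
    "host-c": ["GET /home 200"],
}

def run_on_host(host, pattern):
    """Simulate work done on `host`: filter its local logs, return only matches."""
    regex = re.compile(pattern)
    return [(host, line) for line in FLEET_LOGS[host] if regex.search(line)]

def fleet_grep(pattern):
    """Send the predicate to every host in parallel and merge the small result sets."""
    with ThreadPoolExecutor() as pool:
        per_host = pool.map(lambda h: run_on_host(h, pattern), sorted(FLEET_LOGS))
    return [match for matches in per_host for match in matches]

matches = fleet_grep(r" 500$")
```

The filtering cost is spread across the fleet, and the central side only concatenates whatever each host sends back.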
3 comments

What a bunch of odd and arbitrary statements. Examples: I often use logs older than 7 days for troubleshooting. I rarely troubleshoot using only the last minute or hour of data. I need aggregation most of the time when troubleshooting. I also treat most runtime environments as cattle, so relying on them to keep logs locally would be wrong.
All but trivially reproducible bug reports require, or benefit immensely from, logs about the transaction in question. The pipeline from support to product to engineering to an actual investigation is usually much more than 7 days.
May I ask what kind of production environments you have in mind? Are these large-scale FAANG-style deployments or something else?
Well, only to the extent that the management of small amounts of logs is not very interesting. There is not an ACM SIG for very small databases.

Anyway, GDPR requires you to have a purpose for any log that contains any IP address. Keeping logs for undefined purposes and unlimited time frames is not ok any more.

It could be an e-commerce site, which depending on the case can produce a shitload of logs. Imagine having a few hundred thousand users daily and you record every page they view, along with heatmaps and whatnot. In most cases those logs should never be touched by a human in raw format. You feed them into your analytics engine and start making decisions about your conversion. And then you delete them, because the goalpost is constantly moving.