Hacker News
by j88439h84 2146 days ago
What is the use case for logs?

There isn't a universal one. If you don't have a concrete one in mind, you shouldn't produce the log at all.
I appreciate the zen-like nature of this advice, but I think you also know how unreasonable it is most of the time, unless by 'concrete' you allow something as vague as 'troubleshoot production issues'.
Ad-hoc production troubleshooting is a reason to keep, at most, 7 days of logs. Usually you want the most recent minute or hour. Troubleshooting usually does not need collection, aggregation, and indexing because either the problem is isolated to a host or the logs of a single host, pod, or process are representative of what is happening in the rest of the fleet. Even if you want to access all logs, it's still better to leave them where they were produced and push a predicate out to every host; your log-producing fleet has far, far more compute resources than your poor little central database, no matter how big that DB is.
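The "push a predicate out to every host" idea can be sketched roughly like this. It's a minimal illustration under loud assumptions: `HOST_LOGS` is a hypothetical in-memory stand-in for per-host log files, and `filter_on_host`/`fan_out` are made-up names. In a real fleet the filter would run on each host (for example over SSH) and only the matching lines would travel back.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for per-host log storage. In a real fleet each list
# would be a file on that host, and the filter would execute there.
HOST_LOGS: Dict[str, List[str]] = {
    "web-1": ["GET /cart 200", "POST /checkout 500", "GET /home 200"],
    "web-2": ["GET /home 200", "POST /checkout 200"],
    "web-3": ["POST /checkout 500", "GET /search 200"],
}

def filter_on_host(host: str, predicate: Callable[[str], bool]) -> List[Tuple[str, str]]:
    """Simulates running on the host: scan local lines, return only matches."""
    return [(host, line) for line in HOST_LOGS[host] if predicate(line)]

def fan_out(predicate: Callable[[str], bool]) -> List[Tuple[str, str]]:
    """Send the predicate to every host in parallel and merge the small results."""
    with ThreadPoolExecutor(max_workers=len(HOST_LOGS)) as pool:
        futures = [pool.submit(filter_on_host, host, predicate) for host in HOST_LOGS]
        results: List[Tuple[str, str]] = []
        for future in futures:  # collected in submission order
            results.extend(future.result())
    return results

if __name__ == "__main__":
    for host, line in fan_out(lambda line: line.endswith(" 500")):
        print(host, line)
```

The point of the design is that the expensive scan happens where the data already lives, so the central side only merges the (usually tiny) set of matching lines.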
What a bunch of odd and arbitrary statements. Some examples: I often use logs older than 7 days for troubleshooting. I rarely troubleshoot using only the last minute or hour of data. I need aggregation most of the time when troubleshooting. I also treat most runtime environments as cattle, so relying on them to keep logs locally would be wrong.
All but trivially reproducible bug reports require, or benefit immensely from, logs about the transaction in question. The pipeline from support to product to engineering to an actual investigation usually takes much longer than 7 days.
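Finding "logs about the transaction in question" weeks later is much easier if every line carries a transaction ID. A rough sketch with the standard `logging` module, assuming a per-transaction ID attached through `logging.LoggerAdapter`. `ListHandler`, `handle_checkout` and `lines_for` are hypothetical names, and the in-memory handler merely stands in for whatever store the logs actually end up in.

```python
import logging
import uuid
from typing import List

class ListHandler(logging.Handler):
    """Keeps formatted records in memory so they can be searched later."""
    def __init__(self) -> None:
        super().__init__()
        self.lines: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.lines.append(self.format(record))

handler = ListHandler()
handler.setFormatter(logging.Formatter("txn=%(txn_id)s %(levelname)s %(message)s"))
logger = logging.getLogger("shop")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

def handle_checkout(order_total: float) -> str:
    """Hypothetical request handler: one ID per customer transaction."""
    txn_id = uuid.uuid4().hex
    log = logging.LoggerAdapter(logger, {"txn_id": txn_id})  # adds txn_id to each record
    log.info("checkout started, total=%.2f", order_total)
    if order_total <= 0:
        log.error("rejected: non-positive total")
    else:
        log.info("payment authorized")
    return txn_id

def lines_for(txn_id: str) -> List[str]:
    """What the investigating engineer does later: pull every line for one transaction."""
    return [line for line in handler.lines if line.startswith(f"txn={txn_id} ")]

if __name__ == "__main__":
    bad = handle_checkout(0.0)
    print(lines_for(bad))
```

The ID is what lets a support ticket that arrives weeks later be joined back to the exact lines involved, regardless of which host produced them.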
May I ask what kind of production environments you have in mind? Are these large-scale FAANG-style deployments or something else?
Well, only to the extent that the management of small amounts of logs is not very interesting. There is not an ACM SIG for very small databases.

Anyway, GDPR requires you to have a purpose for any log that contains an IP address. Keeping logs for undefined purposes and for unlimited time frames is no longer OK.
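One way to make "a purpose for every log" concrete is to tie each entry to a declared purpose with its own retention window, and drop anything without one. This is only a sketch: the purposes and windows in `RETENTION` are hypothetical (the 7 days echoes the troubleshooting figure mentioned earlier in the thread), and `LogEntry`/`prune` are made-up names, not any particular tool's API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List

# Hypothetical purposes and retention windows; real ones would come from
# the purposes actually documented for these logs.
RETENTION: Dict[str, timedelta] = {
    "troubleshooting": timedelta(days=7),
    "fraud_investigation": timedelta(days=90),
}

@dataclass
class LogEntry:
    timestamp: datetime
    client_ip: str
    purpose: str
    message: str

def prune(entries: List[LogEntry], now: datetime) -> List[LogEntry]:
    """Keep an entry only if its purpose is known and it is inside that purpose's window."""
    kept: List[LogEntry] = []
    for entry in entries:
        window = RETENTION.get(entry.purpose)
        if window is not None and now - entry.timestamp <= window:
            kept.append(entry)
    return kept

if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    logs = [
        LogEntry(now - timedelta(days=3), "203.0.113.5", "troubleshooting", "500 on /checkout"),
        LogEntry(now - timedelta(days=1), "198.51.100.8", "just_in_case", "GET /home"),
    ]
    print([e.message for e in prune(logs, now)])
```

Entries tagged with an undeclared purpose (here `"just_in_case"`) never survive a pruning run, which mirrors the "no undefined purposes, no unlimited time frames" point.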

It could be an e-commerce site, which, depending on the case, can produce a shitload of logs. Imagine having a few hundred thousand users daily and recording every page they view, along with heatmaps and whatnot. In most cases those logs should never be touched by a human in raw form. You feed them into your analytics engine and start making decisions about your conversion. And then you delete them, because the goalposts are constantly moving.
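The "feed them into analytics, then delete them" flow can be illustrated with a toy aggregation: reduce raw page-view events to a single conversion figure, keep the summary, and throw away the raw events. The event shape `(session_id, page)`, the goal page `/thank-you` and the function name are all hypothetical; a real analytics engine would do far more.

```python
from typing import Iterable, List, Tuple

def conversion_rate(page_views: Iterable[Tuple[str, str]], goal_page: str) -> float:
    """Fraction of sessions that reached goal_page at least once."""
    sessions = set()
    converted = set()
    for session_id, page in page_views:
        sessions.add(session_id)
        if page == goal_page:
            converted.add(session_id)
    return len(converted) / len(sessions) if sessions else 0.0

if __name__ == "__main__":
    raw: List[Tuple[str, str]] = [
        ("s1", "/home"), ("s1", "/product/42"), ("s1", "/thank-you"),
        ("s2", "/home"),
        ("s3", "/product/7"), ("s3", "/thank-you"),
        ("s4", "/search"),
    ]
    summary = {"conversion_rate": conversion_rate(raw, "/thank-you")}
    raw.clear()  # keep the aggregate, discard the raw per-user events
    print(summary)
```

Only the aggregate outlives the run, which is the "never touched by a human in raw form" pattern described above.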
To be fair, a lot of problems have come, and still come, from 'let's store this stuff just in case'. It also helps in some unforeseen cases, but in both scenarios you essentially end up in an unstructured, unknown situation, which is generally not what you want in IT, in business, or as a person.