by robinhouston 4055 days ago
> For example: find all logs between 2013-12-24 and 2015-04-11, valid dates only.

That’s a straw man. If you’re grepping logs, you don’t need a regular expression that matches only valid dates because you can assume that the timestamps on the log records are valid dates. But I suppose

    2013-12-(2[4-9]|3.)|2014-..-..|2015-0([123]-..|4-(0.|1[01]))

doesn’t look so bad.
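For anyone who wants to check that pattern rather than eyeball it, here is a rough Python sketch. It walks every real calendar day from 2013 through 2015 and compares the regex with a plain date comparison. The function name and the sweep window are my own choices, and the pattern is wrapped in a group and matched against the whole date string, which is not exactly how `grep` would apply it to a full log line.

```python
import re
from datetime import date, timedelta

# The pattern from the comment above, wrapped in a group so that
# fullmatch() applies to the whole alternation.
PATTERN = re.compile(
    r"(?:2013-12-(2[4-9]|3.)|2014-..-..|2015-0([123]-..|4-(0.|1[01])))"
)

START, END = date(2013, 12, 24), date(2015, 4, 11)


def regex_mismatches(first=date(2013, 1, 1), last=date(2015, 12, 31)):
    """Return the valid dates on which the regex and the real range disagree."""
    mismatches = []
    day = first
    while day <= last:
        text = day.isoformat()  # e.g. "2014-02-28"
        matched = PATTERN.fullmatch(text) is not None
        in_range = START <= day <= END
        if matched != in_range:
            mismatches.append(text)
        day += timedelta(days=1)
    return mismatches


if __name__ == "__main__":
    # An empty list means the regex agrees with the calendar on valid dates.
    print(regex_mismatches())
    # The regex also accepts strings that are not dates at all, which is
    # harmless only if the log timestamps are assumed to be valid.
    print(PATTERN.fullmatch("2014-02-30") is not None)
```

The sweep only covers valid dates, so it tests the claim as stated: correct on well-formed timestamps, with no promise about rejecting malformed ones.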

The whole thing is similarly exaggerated.

6 comments

Not to mention that 99.9% of the searches one does of a log file aren't really that complex. Heck, I'm willing to wager that 90%+ of my searches over the last 20 years have been in log files from a particular day.

That's the thing about having simple text log files: the cognitive load required to pull data out of them, often into a format that another tool can then manipulate (awk being one of the better-known ones), is so low that you can do it without a context switch.

If you have a problem, you can reach into the log files, pull out the data you need, possibly massage/sum/count particular records with awk, all without missing a beat.
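To make "massage/sum/count" concrete, here is a minimal sketch of the counting step, written in Python to match the other examples in this thread. It assumes a made-up log layout of `date time level message` separated by whitespace; the sample lines and the function name are invented for illustration. The awk equivalent would be along the lines of `awk '{n[$3]++} END {for (k in n) print k, n[k]}'`.

```python
from collections import Counter


def count_by_field(lines, index):
    """Count occurrences of the whitespace-separated field at position `index`."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) > index:  # skip blank or short lines
            counts[fields[index]] += 1
    return counts


# Hypothetical sample records; real log layouts will differ.
SAMPLE = [
    "2015-04-11 12:00:01 ERROR disk full",
    "2015-04-11 12:00:02 INFO retrying",
    "2015-04-11 12:00:03 ERROR disk full",
]

if __name__ == "__main__":
    # Field 2 is the level column in the assumed layout.
    for level, n in count_by_field(SAMPLE, 2).items():
        print(level, n)
```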

This is particularly important for sysadmins who may be managing dozens of different applications and subsystems. Text files pull all of them together.

But, and here is the most important thing that people need to realize - for scenarios in which complex searching is required, by all means move it into a binary format - that just makes sense if you really need to do so.

The argument isn't "all text instead of binary"; it is "at least text, and then binary where it makes sense."

More to the point: Text logs are just as structured as binary logs, but they have the additional property of not being as opaque and, therefore, being immediately usable with more preexisting, well-tested, well-known tooling.

> If you’re grepping logs, you don’t need a regular expression that matches only valid dates because you can assume that the timestamps on the log records are valid dates.

Even _if_ I agreed with your assumption[1], are you actually suggesting that

    2013-12-(2[4-9]|3.)|2014-..-..|2015-0([123]-..|4-(0.|1[01]))

is a serious solution? I admit that it is shorter than the author's solution, _but it still proves his point_.

And then what about multi-line log records? `grep` can't tell where the next record begins; sure, I can use `-A`, but there's no number I can plug in that's going to just work: I need to guess, and if I get a truncated result or too much output, adjust. Worse, I can get too much output _and_ a truncated record where I need it…

    log-cat --from 2013-12-24 --to 2015-04-11 | grep <further processing>
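The multi-line problem described above can be handled in text too, if (and this is a big if, given the footnote below) every record starts with an ISO date at the beginning of the line. A rough Python sketch under that assumption follows; `iter_records` and the start-of-record rule are mine, not something any real tool guarantees.

```python
import re
from datetime import date

# Assumption: a new record starts with YYYY-MM-DD at column 0.
RECORD_START = re.compile(r"^(\d{4}-\d{2}-\d{2})")


def parse_start(line):
    """Return the record's date if `line` starts a record, else None."""
    m = RECORD_START.match(line)
    if not m:
        return None
    try:
        return date.fromisoformat(m.group(1))
    except ValueError:  # looks like a date but is not a valid one
        return None


def iter_records(lines):
    """Yield (date, [lines]) per record; undated lines are continuations."""
    current_date, current_lines = None, []
    for line in lines:
        started = parse_start(line)
        if started is not None:
            if current_lines:
                yield current_date, current_lines
            current_date, current_lines = started, [line]
        elif current_lines:
            current_lines.append(line)
        # Lines before the first dated line are dropped.
    if current_lines:
        yield current_date, current_lines
```

Filtering whole records by date then becomes a comparison on the first element of each pair, with no `-A` guessing involved. It breaks down as soon as a continuation line happens to begin with something date-shaped.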

[1] Most log file formats I've run across do not guarantee the date to appear in a given location.

Using regexes for time is like using regexes for HTML: it's possible-ish, but most people are probably doing it wrong, and storing things in their correct data structures is a much simpler solution.

Or you could just write about five lines of Python that splits up each line and uses datetime for comparisons.

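Something like the following is presumably what that suggestion has in mind. It is a sketch, not anyone's actual script: it assumes the date is the first whitespace-separated field and is formatted as `YYYY-MM-DD`, and the function name is made up.

```python
from datetime import date, datetime


def lines_between(lines, start, end, fmt="%Y-%m-%d"):
    """Yield lines whose first field parses as a date within [start, end]."""
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        try:
            stamp = datetime.strptime(fields[0], fmt).date()
        except ValueError:
            continue  # first field is not a valid date in this format
        if start <= stamp <= end:
            yield line


if __name__ == "__main__":
    import sys

    # Usage: python filter_dates.py < app.log
    wanted = lines_between(sys.stdin, date(2013, 12, 24), date(2015, 4, 11))
    sys.stdout.writelines(wanted)
```

Because `strptime` rejects impossible dates such as `2014-02-30`, this also covers the "valid dates only" requirement from the original example, which the regex does not attempt.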
Agreed. It makes one wonder just how much administration this person has actually done in their life.