|
Oddly enough, even for large (>=1e5 physical machines) systems, grep works fine. Better yet, if the logs are important, you're shunting them off to some sort of longer-term storage for post-processing and indexing _anyway_, irrespective of the underlying disk format. Some folks continue to use plain text even then, just with some distributed systems magic wrapped around the traditional Unix tools. (If you're shunting _all_ of your log data off at that scale, you're crazy, and you'll melt your switches if you aren't careful.)

The name of the game is to think about the problems that you're solving and how they relate to the business bottom line. No sooner, no later.

What's most troubling is that we've turned this exercise into an emotional one rather than a scientific one. I'd like to sit down and actually collect data on, e.g., how many instructions it takes to store logs to disk in plain text versus a binary format, how many it takes to retrieve logs from disk in both cases, and how much search latency I incur when trying to retrieve said logs from disk in both cases. At scale, which is where most of my attention lies these days, that's the kind of thing that matters, because those effects get multiplied automatically (often to the horror of operators and capacity planners) by the number of machines you have.

If you're dealing with smaller systems, it won't matter as much, but at that point you're probably dealing with the other side of this: knowing how many requests you get for historical log data and what sort of criteria are used in those searches. If you're getting requests less frequently than, say, once per quarter, it likely wouldn't be worth your time to invest in what Mr. Nagy is evangelizing.

tl;dr: Continue using your ad hoc grep-fu, but be mindful of how much time it takes you to get the data you're looking for. That alone will be your decision criterion for adopting something like this. |
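For what it's worth, here is a minimal sketch of the kind of measurement described above. It is only an illustration under stated assumptions: the record layout, the text line format, and every function name are made up for the example, and it measures wall-clock time rather than instruction counts (for those you would need a hardware-counter tool such as perf). Treat any numbers it prints as specific to your machine and filesystem cache, not as evidence for either side.

```python
import os
import struct
import tempfile
import time

# Hypothetical record layout: (unix timestamp, device id, status code).
# "<" disables padding, so each record is 8 + 4 + 2 = 14 bytes.
RECORD = struct.Struct("<dIH")


def make_records(n):
    """Generate n synthetic records spread over 100 fake device ids."""
    return [(1_700_000_000.0 + i, i % 100, i % 7) for i in range(n)]


def write_text(path, records):
    """Write one human-readable line per record."""
    with open(path, "w") as f:
        for ts, dev, status in records:
            f.write(f"{ts:.3f} dev={dev} status={status}\n")


def write_binary(path, records):
    """Write fixed-width packed records."""
    with open(path, "wb") as f:
        for rec in records:
            f.write(RECORD.pack(*rec))


def search_text(path, dev):
    """Count lines for one device, roughly what grep would do."""
    needle = f" dev={dev} "
    with open(path) as f:
        return sum(1 for line in f if needle in line)


def search_binary(path, dev):
    """Count records for one device by unpacking every record."""
    with open(path, "rb") as f:
        data = f.read()
    return sum(1 for _, d, _ in RECORD.iter_unpack(data) if d == dev)


def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    records = make_records(100_000)
    with tempfile.TemporaryDirectory() as tmp:
        cases = [
            ("text", write_text, search_text, os.path.join(tmp, "log.txt")),
            ("binary", write_binary, search_binary, os.path.join(tmp, "log.bin")),
        ]
        for name, writer, searcher, path in cases:
            _, write_s = timed(writer, path, records)
            hits, search_s = timed(searcher, path, 42)
            size = os.path.getsize(path)
            print(f"{name}: write {write_s:.3f}s, search {search_s:.3f}s, "
                  f"{hits} hits, {size} bytes")
```

A single run like this says little on its own; repeating it, varying the record count, and dropping the page cache between runs would be the next steps before drawing any conclusion.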
But even so, I like to have the text files as journals of original entry, so I can occasionally do a tail -f incoming.log | egrep -i "somedevice".
And having the original files in text format is zero impediment to getting them into handy binary database form.
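To illustrate that last point, here is a rough sketch of loading a plain-text log into a queryable database while leaving the text file untouched. Everything specific in it is an assumption for the example: the "timestamp device message" line format, the table and column names, and the load_log helper are hypothetical, and SQLite is just a convenient stand-in for whatever binary store you prefer.

```python
import re
import sqlite3

# Assumed line format (hypothetical): "<timestamp> <device> <free-form message>"
LINE_RE = re.compile(r"^(\S+)\s+(\S+)\s+(.*)$")


def load_log(lines, conn):
    """Parse text log lines into a SQLite table; return the number of rows loaded."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS log (ts TEXT, device TEXT, message TEXT)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS log_device ON log (device)")
    rows = []
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match:  # lines that do not fit the assumed format are skipped
            rows.append(match.groups())
    conn.executemany("INSERT INTO log VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    # Made-up sample lines; in practice you would pass open("incoming.log").
    sample = [
        "2015-02-17T10:00:00Z somedevice link up\n",
        "2015-02-17T10:00:05Z otherdevice fan fault\n",
    ]
    conn = sqlite3.connect(":memory:")
    print(load_log(sample, conn), "rows loaded")
    for row in conn.execute(
        "SELECT ts, message FROM log WHERE device = ?", ("somedevice",)
    ):
        print(row)
```

The text file stays the journal of original entry, and the database can be rebuilt from it at any time.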