| One solution that Steve didn't discuss is JSON. To be fair, JSON wasn't that popular in 2005, but it's still a great solution. The way it works is that there are no mandatory newline characters in JSON. Whitespace between lexical elements is ignored, and any embedded newlines in strings can be escaped (i.e. as \n). So a log format that a few people are using today is like this: {"kind": "foo", "id": 1, "msg": "hi"}
{"kind": "bar", "id": 2, "msg": "there"} Each log message takes up a single line in the file. You can trivially deserialize any line to a real data structure in your language of choice. You can (mostly) grep the lines, and they're human-readable. I do this at work, and frequently have scripts like this: scribereader foo | grep 'some expression' | python -c 'code here' In this case we're storing logs in the format described above (a single JSON message per line), and scribereader is something that groks how scribe stores log files and writes them to stdout. The grep expression doesn't really understand JSON, but it catches all of the lines that I actually want to examine, and the false positive rate is very low (<0.1% typically). The final part of the pipe is a more complex Python expression that actually inspects the data it receives to do more filtering. You can of course substitute Ruby, Perl, etc. in place of the Python expression. I feel like this is a pretty good compromise between greppability, human readability, and the ability to programmatically manipulate log data. |
http://simonwillison.net/2008/Jun/15/steveys/
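As a rough illustration of the one-JSON-message-per-line idea in the quote, here is a minimal Python sketch. The helper names `encode_record` and `filter_lines` are hypothetical, not anything from the quoted comment or from scribe, and the sample records simply mirror the ones above. Note that strict JSON needs double-quoted strings, which `json.dumps` produces, and that it escapes embedded newlines so each record stays on one line.

```python
import json


def encode_record(record):
    """Serialize one record as a single line of JSON.

    json.dumps escapes embedded newlines as \\n, so the result
    never contains a raw newline character.
    """
    return json.dumps(record)


def filter_lines(lines, kind):
    """Yield parsed records whose "kind" field equals `kind`.

    Lines that are blank or not valid JSON are skipped, much like
    false positives that slip through a coarse grep.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            continue
        if isinstance(record, dict) and record.get("kind") == kind:
            yield record


# Hypothetical sample data: the second message contains an embedded newline.
log_lines = [
    encode_record({"kind": "foo", "id": 1, "msg": "hi"}),
    encode_record({"kind": "bar", "id": 2, "msg": "line one\nline two"}),
]

matches = list(filter_lines(log_lines, "bar"))
```

Something like `filter_lines(sys.stdin, "foo")` could presumably play the role of the last stage in the pipeline described above, with grep doing the cheap first pass and the JSON-aware filter discarding the few false positives.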
That's the problem with discussing old articles. Information gets updated