Hacker News
by eklitzke 5862 days ago
One solution that Steve didn't discuss is JSON. To be fair, JSON wasn't that popular in 2005, but it's still a great solution.

The way it works is that there are no mandatory newline characters in JSON. Whitespace between lexical elements is ignored, and newlines inside strings have to be escaped anyway (as \n), since JSON doesn't allow raw control characters in strings. So a log format that a few people are using today looks like this:

{"kind": "foo", "id": 1, "msg": "hi"}
{"kind": "bar", "id": 2, "msg": "there"}
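A minimal Python sketch of the write and read sides, using only the standard `json` module. The helper names `to_log_line` and `parse_log_line` are made up for illustration. With its default settings `json.dumps` escapes an embedded newline as `\n` and never emits a raw newline of its own, so each record lands on exactly one line. It also always emits double-quoted strings, which standard JSON requires (single-quoted keys are rejected by `json.loads`).

```python
import json


def to_log_line(record):
    """Serialize one record as a single line of JSON (hypothetical helper).

    With the default indent=None, json.dumps puts no raw newlines in its
    output and escapes newlines inside strings as \\n, so the trailing
    newline added here is the only one on the line.
    """
    return json.dumps(record) + "\n"


def parse_log_line(line):
    """Turn one log line back into a Python dict (hypothetical helper)."""
    return json.loads(line)  # the trailing newline is ignored as whitespace


line = to_log_line({"kind": "foo", "id": 1, "msg": "hi\nthere"})
print(line, end="")  # {"kind": "foo", "id": 1, "msg": "hi\nthere"}
print(parse_log_line(line)["msg"] == "hi\nthere")  # True
```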

Each log message takes up a single line in the file. You can trivially deserialize any line to a real data structure in your language of choice. You can (mostly) grep the lines, and they're human readable. I do this at work, and frequently have scripts like this:

scribereader foo | grep 'some expression' | python -c 'code here'

In this case we're storing logs in the format described above (a single JSON message per line), and scribereader is something that groks how Scribe stores log files and writes them to stdout. The grep expression doesn't really understand JSON, but it catches all of the lines that I actually want to examine, and the false-positive rate is very low (typically <0.1%). The final part of the pipe is a more complex Python expression that actually introspects the data it's getting to do finer filtering. You can of course substitute Ruby, Perl, etc. for Python.
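The `python -c 'code here'` stage is elided above, so here is a hypothetical stand-in for what such a stage might look like; `matching_records` and the `kind` check are assumptions for illustration, not the actual script. It re-parses every line grep let through and applies the exact test, which is what removes grep's occasional false positives.

```python
import json


def matching_records(lines, kind):
    """Yield parsed records whose "kind" field equals `kind`.

    grep upstream is only a coarse prefilter; this stage parses each line
    as JSON and applies the exact check, which drops grep's false positives.
    Lines that are blank or fail to parse are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict) and record.get("kind") == kind:
            yield record


# Hypothetical input: the second line matches `grep foo` only because "foo"
# appears inside its message, so it is a false positive.
sample = [
    '{"kind": "foo", "id": 1, "msg": "hi"}\n',
    '{"kind": "bar", "id": 2, "msg": "mentions foo"}\n',
]
for record in matching_records(sample, "foo"):
    print(record["id"], record["msg"])  # prints: 1 hi
```

In a real pipe you would pass `sys.stdin` instead of the `sample` list, since iterating over it yields one line at a time.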

I feel like this is a pretty good compromise between greppability, human readability, and the ability to programmatically manipulate log data.


"XML is better if you have more text and fewer tags. And JSON is better if you have more tags and less text. Argh! I mean, come on, it’s that easy. But you know, there’s a big debate about it." — Steve Yegge

http://simonwillison.net/2008/Jun/15/steveys/

That's the problem with discussing old articles: information gets updated.

Not really. The arguments still hold.
JSON is great, but the thing that bugs me about this usage is that it is essentially a bloated version of a "normal log". You don't need the field names, braces, colons, quotes or in fact most of the characters there, just single-character-delimited columns (traditionally comma, space or tab) with rows delimited by some other character (traditionally newline), some rules for escaping (or not), and the first row as the field names (if they are not obvious). It's more human-readable, more machine-readable and shorter than JSON, and it's already the unofficial standard, so you don't need to convince anyone of anything.
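For comparison, a rough sketch of the delimited-columns approach using Python's standard `csv` module, with tabs as the delimiter and a header row naming the fields. The field list and helper names are hypothetical, and the `csv` module's quoting stands in for the escaping rules.

```python
import csv
import io

FIELDS = ["kind", "id", "msg"]  # hypothetical column names, written as the header row


def write_log(rows, out):
    """Write dict rows as tab-separated columns, header row first.

    The csv module quotes any field containing the delimiter, a quote
    character or a newline, which serves as the escaping rule here.
    """
    writer = csv.DictWriter(out, fieldnames=FIELDS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


def read_log(inp):
    """Read rows back as dicts; the header row supplies the field names."""
    return list(csv.DictReader(inp, delimiter="\t"))


buf = io.StringIO()
write_log([{"kind": "foo", "id": 1, "msg": "hi"},
           {"kind": "bar", "id": 2, "msg": "there"}], buf)
print(buf.getvalue(), end="")  # header line, then one tab-separated line per row
buf.seek(0)
print(repr(read_log(buf)[0]["id"]))  # '1'
```

One trade-off: every value comes back as a string (`id` is read as `'1'`), and a field containing a newline is quoted but still spans more than one physical line, so one record per line only holds if such fields are escaped some other way.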
Sure, if my data is tabular I always end up with a CSV-like arrangement, usually space-separated. The original article however talks about how data always ends up hierarchical and tree-like. JSON represents tree-like data very succinctly and very readably.

Still, the real XML-killer for me is YAML. It's even more readable than JSON, and allows many documents in a single file. This makes it excellent for logs, or for any application where your files get big and you want to stream records off them without having to parse the whole file into memory at once. Sure, you can do this with XML and parser hooks, but it's so much more of a pain than just iterating over top-level YAML documents.
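PyYAML's `yaml.safe_load_all` already yields documents one at a time from a multi-document stream. To show the idea without a third-party dependency, here is a simplified stdlib-only sketch that splits a stream into raw document chunks, each of which could then be handed to a YAML parser. `iter_yaml_documents` is a hypothetical helper, and it assumes documents are separated by lines containing only `---`.

```python
import io


def iter_yaml_documents(lines):
    """Yield the raw text of each top-level YAML document in a stream.

    Simplifying assumption: documents are separated by lines consisting
    solely of "---". Directives, "..." end markers and content on the
    "---" line itself are not handled. Only the current document's lines
    are held in memory at any time.
    """
    current = []
    for line in lines:
        if line.rstrip("\r\n") == "---":
            if any(part.strip() for part in current):
                yield "".join(current)
            current = []
        else:
            current.append(line)
    if any(part.strip() for part in current):
        yield "".join(current)


stream = io.StringIO("---\nkind: foo\nid: 1\n---\nkind: bar\nid: 2\n")
for doc in iter_yaml_documents(stream):
    print(repr(doc))  # 'kind: foo\nid: 1\n', then 'kind: bar\nid: 2\n'
```

An open file object can be passed in place of the `StringIO`, since iterating over a file yields one line at a time.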

Another killer feature is that it's simple enough that I've been able to ask clients to provide me with information in YAML format just by giving them an example record to follow. They're non-technical, but they can read it as easily as me. That's a pretty big win.

S-Expressions

    (:kind foo :id 1 :msg "hi") (:kind bar :id 2 :msg "there")
Err, my formatting got messed up. Pretend like there's a newline between the two log entries I described.
Indent the code two spaces. http://news.ycombinator.com/formatdoc
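A rough sketch of reading the S-expression entries above back into Python dicts. `parse_entries` is a hypothetical helper that assumes flat entries of `:keyword value` pairs with no nested lists; in Common Lisp, `read` would parse each entry directly and `getf` would look up its fields.

```python
import re

# One token: "(", ")", a double-quoted string (backslash escapes the next
# character), or a bare atom such as :kind, foo or 1.
TOKEN = re.compile(r'\s*(\(|\)|"(?:[^"\\]|\\.)*"|[^\s()"]+)', re.S)


def _atom(tok):
    """Convert one token to a Python value: string, int, or bare symbol."""
    if tok.startswith('"'):
        # Drop the quotes and make each backslash-escaped character literal.
        return re.sub(r'\\(.)', r'\1', tok[1:-1], flags=re.S)
    if re.fullmatch(r"-?[0-9]+", tok):
        return int(tok)
    return tok  # symbols like foo and keywords like :kind stay strings


def parse_entries(text):
    """Parse flat entries like (:kind foo :id 1 :msg "hi") into dicts."""
    entries, current, pos = [], None, 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError(f"unexpected input at offset {pos}")
        tok, pos = m.group(1), m.end()
        if tok == "(":
            if current is not None:
                raise ValueError("nested lists are not supported")
            current = []
        elif tok == ")":
            if current is None or len(current) % 2:
                raise ValueError("unbalanced ')' or odd number of items")
            keys, values = current[0::2], current[1::2]
            # Strip the leading ':' so :kind becomes the key "kind".
            entries.append({str(k).lstrip(":"): v for k, v in zip(keys, values)})
            current = None
        else:
            if current is None:
                raise ValueError(f"atom {tok!r} outside of an entry")
            current.append(_atom(tok))
    if current is not None:
        raise ValueError("missing closing ')'")
    return entries


log = '(:kind foo :id 1 :msg "hi") (:kind bar :id 2 :msg "there")'
for entry in parse_entries(log):
    print(entry)
# {'kind': 'foo', 'id': 1, 'msg': 'hi'}
# {'kind': 'bar', 'id': 2, 'msg': 'there'}
```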