by sriram_malhar
559 days ago
I once created a library (now bit-rotted) that did all the things you suggested plus some: a schema, a binary representation, storing date-times as offsets from the first record's date-time, abbreviating common strings such as hostnames, and so on. There were a bunch of problems and irritants, mostly stemming from the fact that the format became stateful. Every log needed to have a schema (or a repository) available. Abbreviations and date offsets meant that the log contained meta information: for example, the assignment of a compact abbreviation to a string in anticipation of using that abbreviation from that point on. This meant that the log could not be arbitrarily lopped off. And to my chagrin, I found that simply gzipping a JSON stream made it almost as compact! That's when I figured it wasn't worth it. I'd probably have investigated more if there had been CPU or memory bandwidth pressure in that situation (due to creating more data just to compress it).
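To make the statefulness problem concrete, here is a minimal sketch in Python. It is not the original library: the line-based `B`/`D`/`R` format, the function names, and the synthetic records are all hypothetical, invented only for illustration. It encodes timestamps as offsets from the first record and abbreviates hostnames on first use, then shows that a log with its head cut off can no longer be decoded, while plain JSON lines can. It also prints raw and gzipped sizes so the two approaches can be compared on your own data.

```python
import gzip
import json


def encode_stateful(records):
    """Encode records using time offsets and hostname abbreviations.

    Hypothetical format: 'B' sets the base timestamp, 'D' defines an
    abbreviation, 'R' is a record that relies on both.
    """
    lines = []
    base_ts = None
    abbrev = {}  # hostname -> small integer id
    for rec in records:
        if base_ts is None:
            base_ts = rec["ts"]
            lines.append(f"B {base_ts}")
        host = rec["host"]
        if host not in abbrev:
            abbrev[host] = len(abbrev)
            lines.append(f"D {abbrev[host]} {host}")
        lines.append(f"R {rec['ts'] - base_ts} {abbrev[host]} {rec['msg']}")
    return lines


def decode_stateful(lines):
    """Decode the format above; fails if the defining lines are missing."""
    base_ts = None
    hosts = {}
    out = []
    for line in lines:
        kind, rest = line.split(" ", 1)
        if kind == "B":
            base_ts = int(rest)
        elif kind == "D":
            idx, host = rest.split(" ", 1)
            hosts[int(idx)] = host
        else:
            offset, idx, msg = rest.split(" ", 2)
            if base_ts is None or int(idx) not in hosts:
                raise ValueError("record depends on state that was cut off")
            out.append(
                {"ts": base_ts + int(offset), "host": hosts[int(idx)], "msg": msg}
            )
    return out


# Synthetic data, made up for this sketch.
records = [
    {"ts": 1700000000 + i, "host": f"web-{i % 3}.example.com", "msg": f"request {i} ok"}
    for i in range(1000)
]

stateful = encode_stateful(records)
json_lines = [json.dumps(r) for r in records]

# The full stateful log round-trips.
assert decode_stateful(stateful) == records

# JSON lines are self-contained: any tail of the log still parses.
assert [json.loads(line) for line in json_lines[500:]] == records[500:]

# The stateful log is not: dropping the head loses the base timestamp
# and the abbreviation definitions.
try:
    decode_stateful(stateful[500:])
    truncation_ok = True
except ValueError:
    truncation_ok = False

raw_json = "\n".join(json_lines).encode()
raw_stateful = "\n".join(stateful).encode()
sizes = {
    "json": len(raw_json),
    "stateful": len(raw_stateful),
    "json+gzip": len(gzip.compress(raw_json)),
    "stateful+gzip": len(gzip.compress(raw_stateful)),
}
print("truncated stateful log decodable:", truncation_ok)
print(sizes)
```

The size figures depend heavily on the data, so treat whatever this prints as a prompt to measure real logs rather than as a general result.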