| > why don't we just use base64-encoded JSON or something? Base64 would be counterproductive, increasing both space and parsing time... presumably out of fear that JSON would contain a naughty byte for a greenfield file format. It would be much better to just design the format to not have any naughty bytes. Parsing time for executables and libraries is definitely on the critical startup path. You really want a length-delimited format, or better yet, one where offsets to various structures are stored at fixed offsets so you can find everything in O(1) time with a tiny constant factor. Compiler writers and tool authors are perfectly comfortable working with binary file formats. There's nothing more inherently future-compatible about JSON than a forward-compatible binary format like flatbuffers. Having to escape and then unescape naughty bytes is a huge downside for text-based formats that are hardly ever read by humans. On a side note, Zlib DEFLATE / gzip / LZMA etc. aren't magic for getting rid of space overheads. Try gzip -9'ing your system's wordlist, then convert it to UTF-16 and gzip -9 that. You'll see an increase in size of several percent, despite an entropy change of at most a small, constant number of bits (-log2(P(UTF-16)/P(UTF-8))). I've frequently seen big JSON proponents use hand-wavy arguments that gzip will reduce any size differences to zero. It's also nice if the file format is very close to being able to just be mmap()ed into the process's address space, requiring only minimal patching of as few pages as possible in order to be an optimized in-memory representation. Also, there's a huge amount of momentum behind executable formats. Incremental improvements, such as adding new features in new segment types or appending new fields to old data structures (where there's no ambiguity), are much preferred to wholly new formats. So, creating a new debugging symbol section that's just flatbuffers is workable.
Replacing the whole ecosystem with base64'd JSON would have way more downsides than upsides. On another side note, you need to be very careful with JSONifying floating-point values. Many libraries don't give you bit-perfect round-tripping of IEEE-754 double-precision values. |
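To make the "offsets stored at fixed offsets" point concrete, here is a minimal sketch in Python. The layout (the `TOY1` magic, the header, the section table) is a made-up toy format for illustration only, not ELF, Mach-O, PE, or flatbuffers, and all function names are hypothetical. It shows that finding section `i` takes two fixed-size reads, that payloads can hold any byte without escaping, and that the same lookup works directly on an `mmap()`ed file.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical toy layout (NOT any real executable format):
#   header:        4-byte magic, uint32 section count
#   section table: one (uint64 offset, uint64 length) entry per section
#   payloads:      raw section bytes, no escaping of any kind
MAGIC = b"TOY1"
HEADER = struct.Struct("<4sI")
ENTRY = struct.Struct("<QQ")


def build_image(sections: list[bytes]) -> bytes:
    """Serialize sections behind an offset table at a fixed position."""
    offset = HEADER.size + ENTRY.size * len(sections)  # first payload byte
    table = bytearray()
    for payload in sections:
        table += ENTRY.pack(offset, len(payload))
        offset += len(payload)
    return HEADER.pack(MAGIC, len(sections)) + bytes(table) + b"".join(sections)


def read_section(buf, index: int) -> bytes:
    """Fetch one section with two fixed-size reads; no scanning, no unescaping."""
    magic, count = HEADER.unpack_from(buf, 0)
    if magic != MAGIC:
        raise ValueError("bad magic")
    if not 0 <= index < count:
        raise IndexError(index)
    # The table entry lives at a position computed from the index alone.
    offset, length = ENTRY.unpack_from(buf, HEADER.size + ENTRY.size * index)
    return bytes(buf[offset:offset + length])


def read_section_from_file(path: str, index: int) -> bytes:
    """Map the file read-only; only the pages actually touched get paged in."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        return read_section(m, index)


if __name__ == "__main__":
    image = build_image([b"code", b'sym"bols\x00\n', b"debug"])
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(image)
    try:
        print(read_section_from_file(f.name, 1))
    finally:
        os.remove(f.name)
```

A real format would add versioning, alignment, and bounds checks on the offsets themselves; the sketch only illustrates the constant-time lookup.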
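The gzip experiment above is easy to reproduce. The following is a rough sketch, assuming Python's `gzip` module at `compresslevel=9` as a stand-in for the `gzip -9` command line. The synthetic random-letter wordlist is an assumption made so the script is self-contained; it compresses differently from real English, so to get numbers comparable to the comment, feed it your system's own wordlist (often `/usr/share/dict/words`, where one is installed).

```python
import gzip
import random
import string


def gzip9_size(data: bytes) -> int:
    """Size in bytes of data after gzip at the maximum compression level."""
    return len(gzip.compress(data, compresslevel=9))


def compare_encodings(text: str) -> dict:
    """Raw and gzip-9 sizes of the same text encoded as UTF-8 and UTF-16."""
    utf8 = text.encode("utf-8")
    utf16 = text.encode("utf-16-le")  # little-endian, no byte-order mark
    return {
        "utf8_raw": len(utf8),
        "utf16_raw": len(utf16),
        "utf8_gz": gzip9_size(utf8),
        "utf16_gz": gzip9_size(utf16),
    }


def synthetic_wordlist(n_words: int = 5000, seed: int = 0) -> str:
    """Stand-in for a system wordlist: random lowercase words, one per line."""
    rng = random.Random(seed)
    words = (
        "".join(rng.choice(string.ascii_lowercase) for _ in range(rng.randint(3, 10)))
        for _ in range(n_words)
    )
    return "\n".join(words) + "\n"


if __name__ == "__main__":
    sizes = compare_encodings(synthetic_wordlist())
    growth = sizes["utf16_gz"] / sizes["utf8_gz"] - 1
    print(sizes)
    print(f"gzip -9 output grows by {growth:.1%} after converting to UTF-16")
```

The information content is the same in both encodings, yet the compressed sizes differ, which is the point: a general-purpose compressor does not erase the overhead of a wasteful encoding.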
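The floating-point caveat can be checked mechanically by comparing bit patterns before and after a trip through JSON. This is a small sketch; `encode_15_digits` is a hypothetical stand-in for a serializer that prints too few significant digits, not the behavior of any particular library. Python's own `json` module emits the shortest representation that round-trips, so it passes for finite doubles.

```python
import json
import struct


def bits(x: float) -> int:
    """Raw IEEE-754 binary64 bit pattern of x as an integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def roundtrips(x: float, encode, decode=json.loads) -> bool:
    """True if encode followed by decode reproduces x bit for bit."""
    return bits(decode(encode(x))) == bits(x)


def encode_15_digits(x: float) -> str:
    """Hypothetical lossy serializer: only 15 significant decimal digits."""
    return "%.15g" % x


def encode_17_digits(x: float) -> str:
    """17 significant digits are always enough to identify a binary64 value."""
    return "%.17g" % x


if __name__ == "__main__":
    x = 0.1 + 0.2  # a value that does not fit in 15 significant digits
    for name, enc in [
        ("json.dumps", json.dumps),
        ("17 digits", encode_17_digits),
        ("15 digits", encode_15_digits),
    ]:
        print(f"{name:>10}: {enc(x)!r:<24} bit-exact={roundtrips(x, enc)}")
```

If a toolchain has to pass doubles through a JSON library you don't control, running a check like `roundtrips` over representative values is a cheap way to find out whether it is safe.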