JSON is a good format for representing the results of aggregation queries (GROUP BY in SQL): it uses nesting and keeps all the data in a single file. Without that, you would need to either:
1. store multiple non-nested (tabular, e.g. CSV) files and join them at the time of use;
2. denormalize all these CSVs into a single big CSV, duplicating the same values over and over. Compression should handle this at storage time, but you still pay the cost when reading;
3. store values by columns instead of rows, adding various RLE and dictionary encodings to compress repeated values within a column, which makes the files unfriendly to humans;
4. once you store it in columns and make it unreadable anyway, just store it as binary instead of text. You get Parquet.
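To make the trade-off concrete, here is a rough Python sketch of the nested form next to options 2 and 3. The customers/orders dataset is made up, and the helpers `rle` and `dict_encode` are hypothetical toy versions of run-length and dictionary encoding, not any library's API. Real columnar formats such as Parquet apply these encodings per column chunk, in binary, with many more details.

```python
import csv
import io
import json
from itertools import groupby

# Hypothetical GROUP BY customer result, kept nested: customer-level
# fields appear once per group, orders are nested underneath.
nested = [
    {"customer": "alice", "country": "US",
     "orders": [{"id": 1, "total": 30}, {"id": 2, "total": 12}]},
    {"customer": "bob", "country": "DE",
     "orders": [{"id": 3, "total": 7}]},
]
nested_json = json.dumps(nested)

# Option 2: denormalize into one flat CSV, repeating customer/country per order.
fields = ["customer", "country", "id", "total"]
rows = [
    {"customer": g["customer"], "country": g["country"], **order}
    for g in nested
    for order in g["orders"]
]
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=fields)
writer.writeheader()
writer.writerows(rows)
flat_csv = buf.getvalue()

# Option 3: pivot the rows into columns, then encode repeated values.
columns = {name: [row[name] for row in rows] for name in fields}

def rle(values):
    """Run-length encode a column as [value, run_length] pairs."""
    return [[value, len(list(run))] for value, run in groupby(values)]

def dict_encode(values):
    """Dictionary-encode a column: distinct values plus integer codes."""
    dictionary = list(dict.fromkeys(values))  # preserves first-seen order
    index = {value: i for i, value in enumerate(dictionary)}
    return dictionary, [index[value] for value in values]

print(flat_csv)
print(rle(columns["customer"]))         # [['alice', 2], ['bob', 1]]
print(dict_encode(columns["country"]))  # (['US', 'DE'], [0, 0, 1])
```

The flat CSV repeats `alice,US` on every order row, while the columnar encodings collapse those repeats, at the cost of no longer being something you would want to read by hand.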
JSON and CSV are simple, and for that reason they won and will stay with us no matter how hard you try to add features to them. That said, I think adding trailing commas and comments to JSON wouldn't be a big stretch. The battle will be over the best columnar binary format. Parquet is the closest to a standard, but it seems to be used only as a storage standard: big data systems still decompress it and work with their own representation. The holy grail is a columnar format good enough that big data systems use it as their underlying data representation instead of coming up with their own. I suspect such a format will come from something like an open-sourced Snowflake, ClickHouse, ChaosSearch or similar, which have battle-tested, performant algorithms behind them, rather than from a format designed by committee, such as Parquet.
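For reference, strict JSON as specified today allows neither trailing commas nor comments, so a conforming parser rejects both. A small Python check using the standard `json` module (the sample strings and the `parses` helper are just illustrations):

```python
import json

# Strict JSON has no comments and no trailing commas,
# so Python's standard json module rejects both.
samples = {
    "trailing comma": '{"a": 1, "b": 2,}',
    "line comment": '{"a": 1 // why a is 1\n}',
    "plain": '{"a": 1, "b": 2}',
}

def parses(text):
    """Return True if text is valid strict JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

for name, text in samples.items():
    print(f"{name}: {parses(text)}")
# trailing comma: False
# line comment: False
# plain: True
```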
Sadly, JSON's designers suffered from the same hubris as the designers of Markdown and Gemini when they decided not to include a version number in the file format. So you are kind of hosed if you want to make a change like that.
Before JSON there was XML (ugh), but before XML there were Lisp S-expressions, which seem to have handled all these issues perfectly well 50 years ago. Yet we keep reinventing them. Greenspun's tenth law is still with us.