But they're not two different formats—they're two different jobs being done by the same format.
JSON as currently spec'd is honestly quite bad at both jobs, but the most rational defense of its use as a data format is that it's (mostly) human readable. Given that that's its main value proposition, what exactly is the reason for saying that JSON-as-data-format should not have comments? What do we lose if we allow them?
> Given that that's its main value proposition, what exactly is the reason for saying that JSON-as-data-format should not have comments? What do we lose if we allow them?
Because JSON originally did have comments, and people were putting pragmas into them, so different parsers would act differently depending on whether they understood them or not. Comments ended up being an anti-feature in JSON because people were abusing them.
Source:
> I removed comments from JSON because I saw people were using them to hold parsing directives, a practice which would have destroyed interoperability. I know that the lack of comments makes some people sad, but it shouldn't. […]
I don't buy it. What's stopping people from putting pragmas in key:value pairs? There's a chance of collision, but you're already deciding to sacrifice interoperability, so just accept that the myJson spec says '___declare___' is a reserved key.
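To make that concrete, here is a minimal sketch of a pragma smuggled in through a reserved key rather than a comment. The `load_with_directives` helper and the `"dates": "iso8601"` directive are made up for illustration; only the `___declare___` key comes from the comment above.

```python
import json

# Reserved key from the comment above; everything else here is hypothetical.
RESERVED_KEY = "___declare___"


def load_with_directives(text: str):
    """Parse JSON and split the reserved directive key off from the data."""
    doc = json.loads(text)
    directives = {}
    if isinstance(doc, dict):
        directives = doc.pop(RESERVED_KEY, {})
    return doc, directives


payload = '{"___declare___": {"dates": "iso8601"}, "created": "2024-01-01"}'

# A parser that knows the convention separates directives from data...
data, directives = load_with_directives(payload)

# ...while a plain parser treats the directive as ordinary data.
plain = json.loads(payload)
```

The two parsers now disagree about what the document contains, which is the same interoperability problem the comment pragmas caused, just without any comments involved.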
If I parse JSON, I don't want to lose data. Having the parser read the comments (whatever form they take, as long as they are in spec and therefore read by the parser) is a good thing. Having to parse the file again with a fuzzy out-of-spec system (looking for comments) is clearly worse. The whole point of JSON is to serialize stuff; breaking that to insert non-machine-readable comments makes the spec less reliable.
> Suppose you are using JSON to keep configuration files, which you would like to annotate. Go ahead and insert all the comments you like. Then pipe it through JSMin before handing it to your JSON parser.
But there are dangers there - look at how horribly comments get abused in code:
* doctests are nonsense, just write tests. (doctests like Rust's, which just validate example snippets, are the closest thing to good I've seen so far, but still make me nervous).
* load bearing comments that code mangling/generation tools rely on (see a whole bunch of generated scripts in your Linux system - DO NOT EDIT BELOW THIS LINE)
* things like modelines in editors that affect how programs interact with the code
* things like HTML or XML comments that on parsing affect end user program logic.
Comments can be abused, and in something like JSON on the wire I can see systems which take additional info from the comments as part of the primary data input. Often a completely different format... and you end up with something like the front-matter on your markdown files as found in static site generators.
Point being, comments are not a purely benign addition.
> see a whole bunch of generated scripts in your Linux system - DO NOT EDIT BELOW THIS LINE
These are mostly a warning sign for humans, to be read as "if you need to modify the script below this line, a) you need to know what you're doing, and we are not liable for support if you change stuff around there, b) please contact us to make sure we didn't miss a legitimate need, or c) you're trying to do something in a bad way and there are better ways to do so".
I feel like most of your examples of "abuse" are just getting things done.
I don't see anything intrinsically wrong with doctests. I also can't see a better way to do "load bearing comments," and I'm not eager to go back to "Step 2: Edit your .bashrc to include foo."
At least a top-level metadata property can be explicitly defined in a .json.schema[0] and formalized, rather than being some kind of ad-hoc pre-processor step you have to evaluate before actually using the JSON data. I didn't even know about that approach before I read your comment, but it instantly makes more sense to me in terms of maintainability and interoperability.
The problem is using JSON as a file format in the first place. It’s not designed for humans to edit. (Then again, it’s better than the Norway-sceptic YAML.)
I disagree, at least in an ought vs. is sense: it's entirely the kind of format that I would create as an editable format. As witnessed by the fact that my workmates and I created something very nearly like JSON as a file format in the 90s (but for C programs).
TOML, extensions of JSON like JSON5 and Hjson, a bunch of lesser known formats for nested structures like NestedText, UCL, KDL, Eno, SDLang, eldf, etc.
Also languages with some programmatic capabilities like CUE, Dhall, Jsonnet, Nickel etc.
None of them are perfect, and some are less suitable for certain use cases than others. But IMO pretty much all of them are better for human editing than JSON, and in many cases YAML.
JSON has a very minimal set of types and I regularly use all of them. I guess you could argue that integers and numbers could be combined, but I think that's it.
Can I confirm that the reason comments aren't wanted in data formats is that the data is meant to be machine-read only, and as such should be as efficient as possible and not contain information that won't be used?
Seeing as I can only see the use case as a file format to be read/written by humans in the loop, then maybe the conversation should be about compiling the file format to a data format for compatibility outside of the user tooling.
The argument is that comments are often used as an escape hatch from specified formats to carry further instructions. So you've got a properly specified format and then want to do vendor extensions but not break other implementations ... just make your extensions a comment. Then other parsers ignore it and you can do your thing.
The idea is that this forces better formats.
How well does this work? Well, then I get an "x-comment" property or non-standard comments anyway. If people see the need to hack some extension in, they'll find a way.
JSON wins because it can be casually inspected by people testing bizarre theories. The importance of this is lost on people who don’t treat triage as a skill that can be honed.
I like to solve problems - or at least bringing them to me doesn’t result in a loss of status for either party. People notice this about me and bring me problems. Someone recently described to people what is essentially my process: the likelihood of the cause divided by the difficulty of verification. Partially sort and just start checking off assumptions.
A lot of cheap but low probability options get shuffled higher, and just sending the wrong data is a common enough problem, especially with caching. And if it’s nearly free to look at the payload, it’ll get checked. If it isn’t people will try everything else to avoid it.
JSON is notable for making UTF-8 encoding a hard requirement.
…which was pretty ballsy back in the mid-2000s. We were still fighting with Shift-JIS and Windows-1252. Excel didn’t add proper support for UTF-8 until depressingly recently.
In the late '90s I had to fix bugs in a Shift-JIS implementation. And I couldn’t read a lick of Japanese. Still can’t.
I don’t remember when I started pushing for UTF-8 everywhere but it was “early” by most people’s standards, so I know what you mean.
And one of the things that makes me dislike MySQL is that they have a character set called utf8 that isn’t. And they didn’t fix it; they introduced a new one (utf8mb4) instead. So that footgun was still there for all to trigger. So mad.
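For anyone who hasn't hit this: MySQL's legacy utf8 character set (utf8mb3) stores at most three bytes per character, while real UTF-8 needs up to four. A quick sketch of the difference in Python; `fits_in_three_bytes` is a made-up helper for illustration, not anything MySQL provides.

```python
def utf8_width(ch: str) -> int:
    """Number of bytes a single character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def fits_in_three_bytes(text: str) -> bool:
    """True if every character needs at most three UTF-8 bytes."""
    return all(utf8_width(ch) <= 3 for ch in text)


# ASCII, Latin-1 range, CJK, and an emoji outside the Basic Multilingual Plane.
widths = [utf8_width(ch) for ch in ["A", "\u00e9", "\u65e5", "\U0001F600"]]

cjk_ok = fits_in_three_bytes("\u65e5\u672c\u8a9e")
emoji_ok = fits_in_three_bytes("hi \U0001F600")
```

Anything outside the Basic Multilingual Plane, emoji included, falls into the four-byte case, which is where the three-byte limit bites.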
Ah ok, fair enough. This is a more recent (2017) clarification of the standard which I hadn't seen. The original mid 2000s specification did not require UTF-8.
> Previous specifications of JSON have not required the use of UTF-8 when transmitting JSON text. However, the vast majority of JSON-based software implementations have chosen to use the UTF-8 encoding, to the extent that it is the only encoding that achieves interoperability.
I think in the JSON case it's because you can't have true comments: any comments are intrinsically part of the data structure, and you invite problems by including irrelevant information.
And who knows what deeper layers of hell we avoided.
Frankly, VSCode shows that all this time people were complaining about no comments in JSON config and how hard it was to write config in JSON, they could have just written their apps to strip comments at read time.
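Stripping comments at read time is not much code. A rough sketch in Python, assuming `//` and `/* */` comment syntax (my assumption; plain JSON defines no comment syntax at all), with `strip_json_comments` being a made-up helper name:

```python
import json


def strip_json_comments(text: str) -> str:
    """Remove // and /* */ comments that appear outside string literals."""
    out = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                # Copy the escaped character verbatim so \" does not end the string.
                out.append(text[i + 1])
                i += 2
                continue
            if ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Skip to the end of the line, leaving the newline in place.
            while i < n and text[i] != "\n":
                i += 1
        elif text.startswith("/*", i):
            # Skip to the closing marker; an unterminated comment drops the rest.
            end = text.find("*/", i + 2)
            i = n if end == -1 else end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


config_text = """
{
  // where to listen
  "url": "http://localhost:8080/api",  /* slashes inside strings survive */
  "retries": 3
}
"""

config = json.loads(strip_json_comments(config_text))
```

The only subtle part is tracking whether you are inside a string, so that the `//` in a URL is not mistaken for a comment. Everything else is handed to the standard parser unchanged.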
JSON is awful for writing manually because it requires typing too many quotes, commas etc. I think JSON is meant to be machine-generated and machine-read and therefore doesn't need any comments.