by martin-adams 556 days ago
Can I confirm that the reason it's not preferred to have comments in data formats is because it's to be machine read only, and as such should be as efficient as possible and not contain information that won't be used?

Seeing as the only use case I can see for comments is a file format that is read and written by humans in the loop, maybe the conversation should be about compiling that file format to a data format for compatibility outside of the user tooling.

The argument is that comments are often used as an escape hatch from specified formats to carry further instructions. So you've got a properly specified format and then want to add vendor extensions without breaking other implementations? Just make your extensions a comment. Then other parsers ignore it and you can do your thing.

The idea is that leaving comments out forces better formats.

How well does this work? Well, then you get an "x-comment" property or non-standard comments anyway. If people see the need to hack some extension in, they'll find a way.
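
To make that concrete, here is a minimal sketch using Python's standard `json` module. The `port` field and the sample documents are made up for illustration; only the "x-comment" name comes from the comment above. It shows a strict parser rejecting a `//` comment, and the workaround turning the note into ordinary data that every consumer then has to ignore or strip.

```python
import json

# A "//" comment is not valid JSON, so a strict parser rejects the document.
with_comment = '{"port": 8080 // default port\n}'
try:
    json.loads(with_comment)
    comment_accepted = True
except json.JSONDecodeError:
    comment_accepted = False

# The workaround: carry the note as an ordinary member (hypothetical field names).
with_property = '{"x-comment": "default port", "port": 8080}'
config = json.loads(with_property)

# The "comment" is now part of the data; consumers must skip or strip it.
stripped = {key: value for key, value in config.items() if key != "x-comment"}

print(comment_accepted)
print(config)
print(stripped)
```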

> is because it's to be machine read only

Why did they bother making it text-only ASCII then?

JSON wins because it can be casually inspected by people testing bizarre theories. The importance of this is lost on people who don’t treat triage as a skill that can be honed.

I like to solve problems - or at least bringing them to me doesn’t result in a loss of status for either party. People notice this about me and bring me problems. Someone recently described to people what is essentially my process: the likelihood of the cause divided by the difficulty of verification. Partially sort and just start checking off assumptions.

A lot of cheap but low-probability options get shuffled higher, and just sending the wrong data is a common enough problem, especially with caching. And if it’s nearly free to look at the payload, it’ll get checked. If it isn’t, people will try everything else to avoid it.
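
A rough sketch of that ranking, in case it helps: score each candidate cause by likelihood divided by verification cost, then check them in descending order. The `Hypothesis` class, the candidate names, and every number below are invented for illustration; they are guesses a person would make on the spot, not measurements.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    name: str
    likelihood: float    # rough guess in [0, 1] that this is the cause
    cost_minutes: float  # rough effort needed to confirm or rule it out

    @property
    def score(self) -> float:
        # Likelihood of the cause divided by the difficulty of verification.
        return self.likelihood / self.cost_minutes


def triage(hypotheses):
    """Return hypotheses ordered from best to worst payoff per unit of effort."""
    return sorted(hypotheses, key=lambda h: h.score, reverse=True)


# Hypothetical candidates with made-up estimates.
candidates = [
    Hypothesis("race condition in worker pool", 0.40, 240),
    Hypothesis("wrong payload served from cache", 0.10, 2),
    Hypothesis("bad deploy config", 0.30, 30),
]

ordered = triage(candidates)
for h in ordered:
    print(f"{h.score:.4f}  {h.name}")
```

With these numbers the cache check comes first even though it is the least likely cause, because looking at the payload costs almost nothing.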

> ASCII

JSON is notable for making UTF-8 encoding a hard requirement.

…which was pretty ballsy back in the mid-2000s. We were still fighting with Shift-JIS and Windows-1252. Excel didn’t add proper support for UTF-8 until depressingly recently.

In the late ’90s I had to fix bugs in a Shift-JIS implementation. And I couldn’t read a lick of Japanese. Still can’t.

I don’t remember when I started pushing for UTF-8 everywhere but it was “early” by most people’s standards, so I know what you mean.

And one of the things that makes me dislike MySQL is that they have a character set called utf8 that isn’t. And they didn’t fix it, they introduced a new one (utf8mb4) instead. So that footgun was still there for all to trigger. So mad.
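
For anyone wondering what the footgun looks like: MySQL's legacy `utf8` character set stores at most three bytes per character, while real UTF-8 needs four for anything outside the Basic Multilingual Plane. The sketch below does not touch MySQL at all; it only counts encoded bytes in Python, with sample characters picked for illustration, to show which ones would not fit in a three-byte limit.

```python
# Sample characters chosen for illustration: 1-, 2-, 3- and 4-byte UTF-8 encodings.
samples = ["a", "é", "日", "😀"]

# Number of bytes each character occupies when encoded as UTF-8.
utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

# Characters that need four bytes and so exceed a three-byte-per-character limit.
needs_four_bytes = [ch for ch, size in utf8_lengths.items() if size == 4]

print(utf8_lengths)
print(needs_four_bytes)
```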

Pretty sure they meant plaintext instead of ASCII.

JSON does not require UTF-8 encoding.

> JSON text exchanged between systems that are not part of a closed ecosystem MUST be encoded using UTF-8

https://datatracker.ietf.org/doc/html/rfc8259

Ah ok, fair enough. This is a more recent (2017) clarification of the standard which I hadn't seen. The original mid-2000s specification did not require UTF-8.

> Previous specifications of JSON have not required the use of UTF-8 when transmitting JSON text. However, the vast majority of JSON-based software implementations have chosen to use the UTF-8 encoding, to the extent that it is the only encoding that achieves interoperability.
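
In practice that looks something like the following sketch with Python's standard `json` module. The payload is made up; the point is only that the text is encoded as UTF-8 before it goes on the wire, and that the default `\uXXXX`-escaped output is plain ASCII and therefore also valid UTF-8.

```python
import json

# Hypothetical payload containing non-ASCII text.
payload = {"city": "東京", "note": "naïve café"}

# ensure_ascii=False keeps the characters as-is instead of \uXXXX escapes;
# the resulting text is then encoded as UTF-8 for exchange.
wire_bytes = json.dumps(payload, ensure_ascii=False).encode("utf-8")

# json.loads accepts bytes directly and decodes them.
decoded = json.loads(wire_bytes)

# The default output escapes non-ASCII characters, so it is pure ASCII,
# which is a subset of UTF-8.
escaped_bytes = json.dumps(payload).encode("utf-8")

print(wire_bytes)
print(escaped_bytes)
print(decoded == payload)
```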

The original spec did require that all JSON decoders support UTF-8, though.

I think in the JSON case it's because you can't have true comments: any comments are intrinsically part of the data structure, and you invite problems by including irrelevant information.