


	by martin-adams 556 days ago
	Can I confirm that the reason comments aren't preferred in data formats is that such a format is meant to be machine-read only, and so should be as efficient as possible and not contain information that won't be used? Since the only use case I can see for comments is a file format read and written by humans in the loop, maybe the conversation should be about compiling that file format to a data format for compatibility outside of the user tooling.

3 comments

johannes1234321 556 days ago

The argument is that comments are often used as an escape hatch from specified formats to carry further instructions. So you have a properly specified format and then want to add vendor extensions without breaking other implementations: just put your extensions in a comment. Then other parsers ignore them and you can do your thing.

The idea is that this forces better formats.

How well does this work? Well, then I get an "x-comment" property or non-standard comments anyway. If people see the need to hack some extension in, they'll find a way.
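
A minimal sketch of both halves of that, using Python's standard `json` module. The document contents and the `x-comment` key are made up for illustration; no standard defines that property name.

```python
import json

# Standard JSON has no comment syntax, so a strict parser rejects this.
with_comment = '{"name": "demo" /* vendor: retry=3 */}'
try:
    json.loads(with_comment)
    rejected = False
except json.JSONDecodeError:
    rejected = True

# The workaround: smuggle the same note in as ordinary data.
# "x-comment" is an ad-hoc key chosen here, not part of any spec.
with_property = '{"name": "demo", "x-comment": "vendor: retry=3"}'
doc = json.loads(with_property)

print(rejected)
print(doc["x-comment"])
```

The second form parses everywhere, but the "comment" is now part of the data structure, so every consumer has to know to skip it.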


ur-whale 556 days ago

> is because it's to be machine read only

Why did they bother making it text-only ASCII then?


hinkley 556 days ago

JSON wins because it can be casually inspected by people testing bizarre theories. The importance of this is lost on people who don’t treat triage as a skill that can be honed.

I like to solve problems - or at least bringing them to me doesn’t result in a loss of status for either party. People notice this about me and bring me problems. Someone recently described to people what is essentially my process: the likelihood of the cause divided by the difficulty of verification. Partially sort and just start checking off assumptions.

A lot of cheap but low-probability options get shuffled higher, and just sending the wrong data is a common enough problem, especially with caching. And if it’s nearly free to look at the payload, it’ll get checked. If it isn’t, people will try everything else to avoid it.
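
As a rough sketch, the ordering described above (likelihood divided by difficulty of verification) could look like this in Python. The hypotheses, probabilities, and costs are invented for illustration, and `Hypothesis` and `triage_order` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    name: str
    likelihood: float  # rough guess in [0, 1] that this is the cause
    cost: float        # rough effort to verify, e.g. in minutes

    @property
    def score(self) -> float:
        # Likelihood of the cause divided by difficulty of verification.
        return self.likelihood / self.cost


def triage_order(hypotheses):
    """Return hypotheses sorted so the best value-per-effort comes first."""
    return sorted(hypotheses, key=lambda h: h.score, reverse=True)


candidates = [
    Hypothesis("race condition in worker pool", 0.50, 240),
    Hypothesis("stale cache returned old payload", 0.15, 2),
    Hypothesis("wrong data sent in request body", 0.10, 1),
]

order = [h.name for h in triage_order(candidates)]
for name in order:
    print(name)
```

With these made-up numbers the two cheap payload checks sort ahead of the more likely but expensive race-condition hunt, which is the point: if inspecting the payload is nearly free, it gets checked first.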


DaiPlusPlus 556 days ago

> ASCII

JSON is notable for making UTF-8 encoding a hard requirement.

…which was pretty ballsy back in the mid-2000s. We were still fighting with Shift-JIS and Windows-1252. Excel didn’t add proper support for UTF-8 until depressingly recently.


hinkley 556 days ago

In the late ’90s I had to fix bugs in a Shift-JIS implementation. And I couldn’t read a lick of Japanese. Still can’t.

I don’t remember when I started pushing for UTF-8 everywhere, but it was “early” by most people’s standards, so I know what you mean.

And one of the things that makes me dislike MySQL is that they have a character set called utf8 that isn’t really UTF-8 (it stores at most three bytes per character). And they didn’t fix it, they introduced a new one (utf8mb4) instead. So that footgun was still there for all to trigger. So mad.
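
To illustrate the footgun: MySQL's legacy `utf8` character set (an alias for `utf8mb3`) stores at most three bytes per character, so anything outside the Basic Multilingual Plane does not fit, while `utf8mb4` covers the full range. A rough sketch of the check in Python; `fits_in_utf8mb3` is a hypothetical helper for illustration, not a MySQL API.

```python
def fits_in_utf8mb3(text: str) -> bool:
    """True if every character encodes to at most 3 bytes in UTF-8."""
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)


japanese = "日本語"       # BMP characters, 3 bytes each in UTF-8
emoji = "\U0001F600"      # outside the BMP, 4 bytes in UTF-8

print(fits_in_utf8mb3(japanese))
print(fits_in_utf8mb3(emoji))
print(len(emoji.encode("utf-8")))
```

Real UTF-8 needs up to four bytes per code point, which is why text that looks fine in testing can break once someone stores an emoji or a supplementary-plane character.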


joemi 556 days ago

Pretty sure they meant plaintext instead of ASCII.


foldr 556 days ago

JSON does not require UTF-8 encoding.


DaiPlusPlus 556 days ago

> JSON text exchanged between systems that are not part of a closed ecosystem MUST be encoded using UTF-8

https://datatracker.ietf.org/doc/html/rfc8259


foldr 556 days ago

Ah ok, fair enough. This is a more recent (2017) clarification of the standard which I hadn't seen. The original mid 2000s specification did not require UTF-8.

> Previous specifications of JSON have not required the use of UTF-8 when transmitting JSON text. However, the vast majority of JSON-based software implementations have chosen to use the UTF-8 encoding, to the extent that it is the only encoding that achieves interoperability.
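
A small sketch of what that interoperability point means in practice, using Python's standard `json` module. The payload is made up, and Shift-JIS stands in here for any non-UTF-8 encoding a sender might pick.

```python
import json

payload = {"city": "東京"}

# Encode the JSON text as UTF-8 for the wire, as RFC 8259 requires
# for exchange outside a closed ecosystem.
wire = json.dumps(payload, ensure_ascii=False).encode("utf-8")

# json.loads accepts bytes directly and decodes UTF-8 without extra hints.
decoded = json.loads(wire)

# The same text in a legacy encoding is not valid UTF-8, so a receiver
# that assumes UTF-8 cannot read it.
legacy = json.dumps(payload, ensure_ascii=False).encode("shift_jis")
try:
    json.loads(legacy)
    interop_failed = False
except ValueError:  # covers UnicodeDecodeError and json.JSONDecodeError
    interop_failed = True

print(decoded == payload)
print(interop_failed)
```

Nothing in the byte stream announces the encoding, so the receiver has to guess, and UTF-8 is the only guess that works across implementations.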


DaiPlusPlus 556 days ago

The original spec did require that all JSON decoders support UTF-8, though.


burnished 556 days ago

I think in the JSON case it's because you can't have true comments: any comments are intrinsically part of the data structure, and you invite problems by including irrelevant information.
