by LinaLauneBaer 4698 days ago
There is an interview with the inventor of JSON somewhere. In that interview he explained why he did not allow comments in JSON like in XML. He said - if I remember correctly - that it was intentional to not have comments in JSON. The reason was that comments could be misused to add additional information for a parser. For example, in XML you could use comments, and a special parser could use these comments to create code while parsing. He did not want that. He wanted every JSON parser to be a JSON parser and nothing more. If you wanted to have comments in JSON, he said, you could simply make the comments inline and have a convention for the keys which are comments: for example, every key ending with _comment could have a value which is then seen as a comment by the application but not by the parser.

Yes, the JSON spec was designed with interoperability in mind. I don't believe Crockford claims to have invented JSON, merely to have discovered it.

That said, if you want your static JSON objects to have comments, just pipe the JSON object through a minifier to strip comments before parsing.
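
A minimal sketch of that "strip, then parse" step in Python, assuming JavaScript-style `//` and `/* */` comments. The helper name `strip_json_comments` and the sample config are made up for illustration; this is not any particular minifier's implementation. The one thing it has to get right is leaving comment-like text inside string values alone.

```python
import json


def strip_json_comments(text: str) -> str:
    """Remove // line comments and /* block */ comments outside string literals."""
    out = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                # Copy the escaped character verbatim so \" does not end the string.
                out.append(text[i + 1])
                i += 2
                continue
            if ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Skip to the end of the line; the newline itself is kept.
            while i < n and text[i] != "\n":
                i += 1
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)
            i = n if end == -1 else end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# Hypothetical commented config, for illustration only.
commented = """
{
    // where the service lives
    "url": "http://example.com/a//b",  /* the // inside the string is data */
    "retries": 3
}
"""

config = json.loads(strip_json_comments(commented))
print(config)
```

The parser itself never sees a comment, so any standard JSON parser can consume the result.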

You are correct - confirmed in this video: Lessons of JSON

'A recent (and short) IEEE Computing Conversations interview with Douglas Crockford about the development of JavaScript Object Notation (JSON) offers some profound, and sometimes counter-intuitive, insights into standards development on the Web.'

http://inkdroid.org/journal/2012/04/30/lessons-of-json/

{ Thank you Douglas for your vision :) }

He both invented and discovered it. Yes, the object literal syntax existed, but he also carefully (and IMHO correctly) specified a strict subset as well, for these interoperability reasons. For instance, Javascript is happy with {a: 1}, but that is not legal JSON. It's a very well done standard.
JSON is not actually a strict subset. Certain characters when left unescaped in a JSON string make for invalid JavaScript: http://timelessrepo.com/json-isnt-a-javascript-subset
Indeed, and I apologize for my ambiguity, as you are correct. By "strict subset" what I meant was a subset that attempts to reduce options, so that legality and illegality is easier to discern. That is, where Javascript accepts apostrophe and double-quote to delimit strings, JSON only accepts double-quotes, thus, "stricter" than real Javascript.

You are of course correct that JSON turns out not to quite be a strict subset in the set theory sense of "strict subset", though obviously that's a bug in the spec rather than a deliberate design decision.
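
To make the "stricter than JavaScript" point concrete, here is a small check using Python's standard `json` module as the parser. The helper `is_valid_json` is just an illustrative wrapper, not part of any library.

```python
import json


def is_valid_json(text: str) -> bool:
    """Return True if the standard-library parser accepts the text as JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(is_valid_json('{"a": 1}'))   # True: double-quoted key
print(is_valid_json('{a: 1}'))     # False: unquoted key, fine in JavaScript only
print(is_valid_json("{'a': 1}"))   # False: single quotes, fine in JavaScript only
```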

Douglas Crockford has also posted his explanation on Google+:

https://plus.google.com/118095276221607585885/posts/RK8qyGVa...

"I removed comments from JSON because I saw people were using them to hold parsing directives, a practice which would have destroyed interoperability." -- Crockford

This is horrific design reasoning. It's an authoritarian, presumptuous, "punish everyone in the classroom because one child misbehaves" mentality.

Comments would be useful in JSON because comments are useful in code, and JSON is code. For example, I might have a config file that I'm typing in that I want to leave a documentation trail for.

Don't tell me I can do a silly thing like redefine a field, as if it's "neat". It's an abomination that I have to resort to such things. And guess what: by resorting to such things I can still do precisely what Crockford claims he was trying to prevent. So his rationale is not only insulting to one's intelligence, it's sheer stupidity.

> It's an authoritarian ...

Which is pretty much what a specification is.

It's one or more people saying "This is how things are if you call them X".

> presumptuous

Presumptuous? It was in response to the feature being abused!

> "punish everyone in the classroom because one child misbehaves" mentality

No more than creating laws is. A significant subset of the population are misusing it in such a way as could cause widespread damage. It is a minor inconvenience to the 'law abiding people' (particularly given that any comments would be removed if read in and spat out by any program). There are workarounds ("field_comment":"some comment") or if that's not enough, use another format. Use one that allows comments, there are many.

> Don't tell me I can do a silly thing like redefine a field, as if it's "neat". It's an abomination that I have to resort to such things

It's also completely unreliable, it's a terrible solution and nobody should use it. I think we're fully in agreement here.

> And guess what: by resorting to such things I can still do precisely what Crockford claims he was trying to prevent. So his rationale is not only insulting to one's intelligence, it's sheer stupidity.

No you can't. The point was to stop people adding pre-processing commands or other such things to json, which would be in random formats and invisible to some parsers (as comments should be), visible and important to others. You don't want to pass a valid piece of JSON through a parser and end up with two different outcomes dependent on something in a comment, do you? Or have to use parser X or Z because Y doesn't understand directive A, but it does understand directive B and C, and while Z understands C, and X knows B, Z doesn't, so I have to use the version from a pull request from DrPotato which I think supports...

What I'm saying is that there is a benefit in simple standards.

I'm curious how the notion of XML processing instructions informs your opinion. In general I think having a standard is somewhat more important than the precise details in the standard, but XML PIs enable precisely the kind of thing Crockford feared, yet it doesn't seem to have materialized. Is this because processing instructions are not inherently harmful or because segregating them from comments disarms them?
XML PIs have a spec, don't they? (actual question) From some googling the W3C site has this :

> PIs are not part of the document's character data, but must be passed through to the application

If they're being passed through and not being used by the parser, it's no different really than a

    "directive" : "blah"
in JSON, which is fine. The application at the end needs to deal with it, but the parser doesn't, and that's really important. If it's just a comment, passing the file into and out of a program could remove the comment.

    cat something.json | python -m json.tool | myjsonprocessingapp
Should be the same as

    cat something.json | myjsonprocessingapp
If the parser does need to understand the directive, at least there's a difference between an error of "I don't understand directive X" and no error at all because your parser ignored the comments.
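
A rough Python equivalent of that pipeline property, assuming the intermediate step is a plain parse-and-reserialize like `json.tool`. The function name `round_trip` and the sample document are invented for the example.

```python
import json


def round_trip(text: str) -> str:
    """Parse JSON and pretty-print it again, roughly what `python -m json.tool` does."""
    return json.dumps(json.loads(text), indent=4)


original = '{"directive": "blah", "values": [1, 2, 3]}'
reformatted = round_trip(original)

# The text changes (whitespace, layout), but the data, including the
# "directive" key, is identical for whatever application reads it next.
assert reformatted != original
assert json.loads(reformatted) == json.loads(original)
```

A directive stored as a key survives the trip; a directive hidden in a comment would have nowhere to live in the parsed structure.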
> and JSON is code

JSON is data. It appears to be JS code, but JSON is data. Data is not code ( http://www.c2.com/cgi-bin/wiki?DataAndCodeAreNotTheSameThing ). That's why the idea of data holding parsing directives is silly. If you want to do that, then embed that in the data (hold a MsgType key in the data records). There's no need for comments unless you are trying to use it for something other than raw data.

> There's no need for comments unless you are trying to use it for something other than raw data.

Is this a true statement? Even books have margins, and word docs comments. I think it’s not infrequent that pure data calls for metadata to put it into context for future users of that data.

And in computing most "pure data" formats have had either comments or schemas and specifications which outline the contents. The latter sure look like comments stored externally to the documents, from my perspective.

In general I do not think data is self describing, and thus must be commented on in some form to describe it.

You can represent annotations (which describe most of your examples) by adding keys:

    {
        "data": "some data",
        "data_comments": "here are my comments"
    }
Not transparent to actual clients of the data.

edit for clarity: You're assuming that the application code isn't doing something with each key that it reflectively sees in the object, e.g. creating database fields to match them, or launching missiles towards those destinations, etc. If you wouldn't automatically add dummy elements to a hashmap or dictionary in Java or Python, then you shouldn't add keys in a JavaScript object, unless you control the source to the program that will be processing the data. Even then you shouldn't, because it will become a habit to add comments this way, and that will bite you when an extra key does matter.
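
A sketch of both sides of this exchange in Python. The `_comments` suffix and the helper `strip_comment_keys` are assumptions chosen to match the example above, not an established convention. A consumer that walks the keys reflectively sees the annotation as ordinary data unless the application filters it out first.

```python
import json

COMMENT_SUFFIX = "_comments"  # assumed convention, taken from the example above


def strip_comment_keys(value):
    """Recursively drop object keys that end with the comment suffix."""
    if isinstance(value, dict):
        return {
            key: strip_comment_keys(item)
            for key, item in value.items()
            if not key.endswith(COMMENT_SUFFIX)
        }
    if isinstance(value, list):
        return [strip_comment_keys(item) for item in value]
    return value


doc = json.loads('{"data": "some data", "data_comments": "here are my comments"}')

# A reflective consumer (e.g. one creating a database column per key) sees both.
print(sorted(doc))              # ['data', 'data_comments']

# Only an application that knows the convention gets the clean view.
print(strip_comment_keys(doc))  # {'data': 'some data'}
```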

or just use the key "comment" more than once, which is sort of a hybrid of the ideas.
Parsers might throw an error on duplicate keys, or launch emacs solving the towers of hanoi.
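
For what it's worth, here is how one parser behaves. Python's standard `json` module silently keeps the last value for a repeated key, and a caller can opt into rejecting duplicates through the `object_pairs_hook` parameter. The hook function `reject_duplicates` below is a hypothetical helper written for this example; other parsers may behave differently.

```python
import json


def reject_duplicates(pairs):
    """object_pairs_hook that raises instead of silently merging repeated keys."""
    keys = [key for key, _ in pairs]
    duplicates = sorted({key for key in keys if keys.count(key) > 1})
    if duplicates:
        raise ValueError(f"duplicate keys: {duplicates}")
    return dict(pairs)


text = '{"comment": "first", "comment": "second", "x": 1}'

# Default behaviour: the first "comment" is lost without any warning.
lenient = json.loads(text)
print(lenient)  # {'comment': 'second', 'x': 1}

# Strict behaviour: the same document is rejected.
try:
    json.loads(text, object_pairs_hook=reject_duplicates)
    strict_error = None
except ValueError as exc:
    strict_error = str(exc)
print(strict_error)  # duplicate keys: ['comment']
```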
"Data is not code"

Lisp programmers disagree.

Lisp programmers think "Code is data", not "Data is code"
Lisp programmers write code so that data is code.
Lisp: "All code is data"

That's not the question. "All data is code" is not the same statement.

In a different context: "All apples are fruit" may be true but that doesn't imply "all fruit are apples"

Don't Lisp much do you?

Code is data and vice versa. Look up what the acronym JSON means sometime.

Code is data but data isn't necessarily code. Even in Lisp.
The difference is one of interpretation, not of representation; i.e. it's determined by an application, above parser level. When looking just at the written down form, data and code are the same thing.

Code more Lisp and read more Hofstadter ;).

> Data is not code

All code is data, but not all data is code.

Nonsense. This is just more arrogance.

JSON is code because I use it as code. It's not your business to tell me it's not code -- you haven't seen how I'm using it. And don't go chirping that I should only do things your way, it's none of your god damned business what I'm using it for.

Further, if JSON was really only data, then it's an incredibly stupid way to store data, given that it has a human-readable syntax that the computer can only deal with after it's been parsed. As data, it's bloated and inefficient. To the extent that JSON is a good format, it's code. To the extent that it's data, it's not a good format.

A fork can be a spoon for you, if you choose to use it that way. Nobody is telling you what you are supposed to use it for, but still JSON was designed as a data format.

If you don't like the format or feel that JSON is too restrictive/bad feel free to extend it or create your own format from scratch.

> still JSON was designed as a data format

While I don't think that comments belong in JSON, I don't agree JSON is designed as a "data and not code" format. Trees of tokens are actually the natural format for writing code (also known as Abstract Syntax Trees, AST) and the data/code distinction is really, really blurry when those two meet together, so it's only to be expected that people will end up coding in JSON (what are the 'build definition' files for various build tools / package managers, if not very simple programs?).

You can use a screwdriver as a hammer all you want, it's not going to make it a good idea. This isn't a free speech issue.

> Further, if JSON was really only data, then it's an incredibly stupid way to store data, given that it has a human-readable syntax that the computer can only deal with after it's been parsed. As data, it's bloated and inefficient.

So use something else. Also, a computer can only read any file after it's been parsed in some way. I'm not really sure what you're suggesting as an alternative.

> To the extent that JSON is a good format, it's code

Is it executable? Is it turing complete?

> Is it executable? Is it turing complete?

It represents groups of more or less arbitrary tokens as trees, therefore it's a natural format for code representation as it's equivalent to an AST, therefore it's trivial to attach a basic execution context with if and lambda defined, and now it's executable and turing-complete.
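
A toy sketch of what "attaching an execution context" could look like, in Python. Everything here is invented for illustration: the array-based syntax, the `evaluate` function and the `PRIMITIVES` table are assumptions, and the example only shows JSON arrays being interpreted as an AST with `if` and `lambda`. It does not attempt to demonstrate the Turing-completeness claim.

```python
import json
import operator

# Hypothetical built-ins made available to the JSON "program".
PRIMITIVES = {"+": operator.add, "-": operator.sub, "*": operator.mul, "<": operator.lt}


def evaluate(expr, env):
    """Evaluate a JSON value as an expression tree."""
    if isinstance(expr, str):        # bare strings are variable lookups
        return env[expr]
    if not isinstance(expr, list):   # numbers, booleans, null evaluate to themselves
        return expr
    head, *args = expr
    if head == "if":
        condition, then_branch, else_branch = args
        chosen = then_branch if evaluate(condition, env) else else_branch
        return evaluate(chosen, env)
    if head == "lambda":
        params, body = args
        return lambda *values: evaluate(body, {**env, **dict(zip(params, values))})
    function = evaluate(head, env)   # anything else is a function call
    return function(*(evaluate(arg, env) for arg in args))


# Doubles n when n < 10, otherwise returns n unchanged; applied to 4.
source = '[["lambda", ["n"], ["if", ["<", "n", 10], ["*", "n", 2], "n"]], 4]'
result = evaluate(json.loads(source), PRIMITIVES)
print(result)  # 8
```

Note that the parser (`json.loads`) is unchanged; all of the "code" behaviour lives in the application-level interpreter.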

So any indented text file would be considered code?
> JSON is code because I use it as code

You could use JSON as code, but that's somewhat silly, because there's already a superset of JSON designed for that use.

Technically not true: http://timelessrepo.com/json-isnt-a-javascript-subset

    {"JSON":"ro<U+2028>cks!"}
(where <U+2028> is a raw, unescaped unicode line separator -- 2028)
> JSON is code because I use it as code

You can't use JSON to compute things, therefore it is not code (unless you are willing to concede that any document format is code).

Maybe a more useful resolution to this would be to state that while all code is data, no data should be code?

You could, if you were crazy enough, write perfectly valid JSON that passed the values to eval() or a parser or what have you. And while there are encodings in JSON that don't work in JavaScript (I've broken JS innumerable times trying to get that to work), JS does of course allow you to add closures as an object, or an array, whatever you like, and some forms of valid JSON (if not all) are also valid JavaScript. So you could indeed use JSON to compute things if you wanted to.

> unless you are willing to concede that any document format is code

Because it is. Data vs. code distinction is arbitrary. The following sequence of characters:

"echo 'foobar';"

can be interpreted as describing a string, a series of tokens, a piece of code, a piece of music or a small icon, whatever interpretation you choose.

Yes, I understand that "code is data". This does not mean that data, in general, is code; unless you are willing to make the words completely meaningless. "Code" requires some notion of an execution platform/environment, which does not exist for arbitrary data. Here is a string: "the quick brown fox jumps over the lazy dog". Or how about "\u0000\u0000". That is not code, as generally understood.
So is all opinionated design "stupid"?

I do not presume to know who you are, or what you have accomplished, but there are few people with the professional and academic background that qualify to be able to call Douglas Crockford "stupid".

>So is all opinionated design "stupid"?

He never said that.

>I do not presume to know who you are, or what you have accomplished, but there are few people with the professional and academic background that qualify to be able to call Douglas Crockford "stupid".

Why, who do you think Douglas Crockford is and what is his "academic background"? He doesn't even have a related degree. Most of his JS fame he owes to his book.

Since I too lack the lofty requisite background for it, I'll just let Mr. Crockford do the job for me:

> The reason to use semicolons is because coding rigor tends to produce significantly better software.

Don't put words in my mouth, I didn't claim he was "stupid." To say that one thing he said somewhere is stupidity is a far cry from claiming he is stupid.

I also never said that "opinionated design is stupid".

Perhaps you could rephrase your question in such a way that you aren't presuming to speak for me.

JSON isn't a configuration language, it's just another data encoding format with the added benefit of being readable by humans. That and its ubiquity make it an appealing choice for stuff like ad-hoc configuration at first glance, but it's not the best choice. If you want a config language for shared human and machine consumption, use one designed for that purpose. JSON is pretty much just an encoding that is easy for humans to inspect and debug.
This. I've worked with a number of systems that "use json as the configuration language"; and in every case it's led to issues.

Given a choice, it's better to have an .ini-style format like the one that Python's ConfigParser will digest. That way you can have sections and comments, and you won't be tempted to have the application write things into the configuration on its own...
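
A minimal sketch of that approach with Python 3's standard `configparser` module (the successor to the `ConfigParser` module mentioned above). The section names and values are made up for the example.

```python
import configparser

# Hypothetical configuration; "#" and ";" both start full-line comments by default.
INI_TEXT = """
# Where the service listens.
[server]
host = example.com
; ports below 1024 need elevated privileges
port = 8080

[logging]
level = INFO
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)

print(parser.sections())                # ['server', 'logging']
print(parser.getint("server", "port"))  # 8080
print(parser["logging"]["level"])       # INFO
```

The comments stay in the file for the humans editing it and never reach the application, which only sees sections and key/value pairs.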