by taeric 1169 days ago
If we are complaining about the closing tags, might as well add that embedding newlines or quotes into JSON is less than pleasant.

Which is to say, this feels a touch of a non-issue. Yes, writing it by hand can get tedious, but that is true of any and every format. That is why you will almost certainly reach for other formats when writing out a long list of data. And each and every one of them will fail for some form of input in ways that are frustrating.
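To make the point about newlines and quotes concrete, here is a minimal sketch using Python's standard `json` module. The sample string is made up for illustration.

```python
import json

# A value containing double quotes and a newline, the two cases named above.
text = 'She said "hi"\nthen left.'

# The encoder escapes both for us.
encoded = json.dumps({"note": text})
print(encoded)

# Decoding round-trips back to the original string.
decoded = json.loads(encoded)["note"]

# Typing the same value by hand with a raw newline inside the string is rejected.
try:
    json.loads('{"note": "line one\nline two"}')
    raw_newline_ok = True
except json.JSONDecodeError:
    raw_newline_ok = False
```

The tedium only shows up when a human types the escaped form; a serializer handles it either way.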


Writing that JSON example by hand wasn't tedious. The XML example was, and the result is unreadable. It's important to be able to debug things easily. I'm going to manually type JSON when I'm testing an API, and I'm going to read the response.

If you absolutely don't care about human interface, no reason to use XML either. It's meant to be more verbose. The XML tags will often dominate the size of the payload with things like `<question>Who</question>`, so you have to start thinking about shorter names. Yes JSON has a similar problem, but at least it's halved and you don't have to instruct everyone to call each list element "e". If you super care about size, you'll use protobufs or something.

    <question>Who</question>
    "question":"Who",

To me, this does not seem like a win that's worth much, especially since it's likely to shrink considerably even with naive fast compression.
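A rough way to check the size and compression claim yourself, as a sketch rather than a benchmark. The 200-record payload, the `answer` field, and the `list`/`item` element names are assumptions made up for this example; only the `question`/`Who` pair comes from the thread.

```python
import json
import zlib
import xml.etree.ElementTree as ET

# Hypothetical payload: 200 small records shaped like the <question> example.
records = [{"question": "Who", "answer": f"person {i}"} for i in range(200)]

# Compact JSON encoding of the list.
json_bytes = json.dumps(records, separators=(",", ":")).encode("utf-8")

# The same data as XML, one child element per key.
root = ET.Element("list")
for rec in records:
    item = ET.SubElement(root, "item")
    for key, value in rec.items():
        ET.SubElement(item, key).text = value
xml_bytes = ET.tostring(root, encoding="utf-8")

# Raw size versus size after fast zlib compression (level 1).
sizes = {
    "json": (len(json_bytes), len(zlib.compress(json_bytes, 1))),
    "xml": (len(xml_bytes), len(zlib.compress(xml_bytes, 1))),
}
for name, (raw, packed) in sizes.items():
    print(f"{name}: {raw} bytes raw, {packed} bytes after zlib level 1")
```

The exact numbers depend heavily on the data, so treat any single run as an illustration only.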

Furthermore, as messages grow in size, the explicitly named closing tag actually kind of starts helping.

Both of these syntaxes have their annoying quirks, for sure, and I understand you really dislike the closing tag; but that clearly doesn't bother many people.

But regardless of personal preference, I'm really skeptical any of this really explains json's relentless path to replace (most) xml. Other reasons, such as the extreme wordiness some xml apis chose, the poor implementation of namespaces, the problems with embedding arbitrary data (in particular control characters), the inconsistency between attributes and elements, the lack of support for numbers, the lack of (conventional) support for key-value pairs - all of these surely played a much greater role than a fairly limited syntax issue.

And it's not even like json is without impractical quirks; lack of comments, the ban on trailing commas, and the need for quotes in object-keys spring to mind. Yet those don't mean json is likely to die out soon - even though even javascript itself from which it is derived doesn't suffer from those (anymore)!
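The three quirks listed above are easy to reproduce with a strict parser. A small sketch with Python's standard `json` module; the sample inputs are invented for illustration.

```python
import json

# Each input would be fine in a JavaScript source file but is not valid JSON.
rejected = {
    "comment": '{"a": 1} // trailing note',
    "trailing comma": '{"a": 1, "b": 2,}',
    "unquoted key": '{a: 1}',
}

# Collect the parser's complaint for each case.
errors = {}
for label, text in rejected.items():
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        errors[label] = exc.msg

for label, msg in errors.items():
    print(f"{label}: {msg}")
```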

Not wrong, but also probably not really indicative of problems or actual use. And while I will be manually typing some data to go into an api for testing, I'm far more likely to be typing it in a thing that is way looser in what it accepts than a json document. Literally today just using dicts in python. And even then, my debugging is dominated by mistakes in data entry there.
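For what "looser than a json document" can look like in practice, here is a sketch of the Python dict approach. The record reuses the album data quoted elsewhere in this thread; the variable names are arbitrary.

```python
import json

# A Python dict literal tolerates things strict JSON does not:
# comments, single-quoted strings, and a trailing comma.
payload = {
    'title': 'Led Zeppelin II',  # single quotes are fine here
    'price': 999,                # so is the trailing comma
}

# Serialize to strict JSON only at the last moment, e.g. as a request body.
body = json.dumps(payload)
print(body)
```

The serializer emits valid JSON regardless of how casually the literal was typed, which is roughly the workflow described above.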

Also, I see you took it to be a full on defense of XML. I did not really intend it that way. I think both can be fine. And insisting on either is likely a mistake.

I do find your nitpicks here amusing, still. Size of tag is just as obnoxious as size of key. And, though it can dominate the textual representation, there are clear ways to reduce that. Even knowing that BSON and Binary XML exist, though, I'd be hard pressed to name any project that failed because they weren't using them.

JSON vs XML isn't going to make or break your project. But why would you use XML for data interchange? It makes sense for things like HTML where you're writing a document, but otherwise, it's usually just a needless burden.

Like, if I were there when XMPP was created, yes I would have insisted on JSON. XML was a plainly bad choice. Edit: Oh, JSON didn't exist until a little later. Maybe something similar did.

I mostly agree. I do think Jupyter chose wrong by picking JSON for their documents. They are literally marked up source documents.

XML does have the "benefit" of being a bit more extensible than JSON. Specifically, being able to have namespaced elements in there does make some sense on paper. For example, you could have two extensions both add in data using the same keys, but different namespace. Can't really do that with JSON.

In practice, I think it just fell flat due to way too much "forethought" in things they anticipated people wanting.
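The namespace point above can be sketched with Python's `xml.etree.ElementTree`. The `cell` element, the `status` key, and both namespace URNs are hypothetical, chosen only to show two extensions sharing a key name.

```python
import xml.etree.ElementTree as ET

# Made-up namespaces for two independent extensions.
NS = {"a": "urn:example:ext-a", "b": "urn:example:ext-b"}

doc = """
<cell xmlns:a="urn:example:ext-a" xmlns:b="urn:example:ext-b">
  <a:status>draft</a:status>
  <b:status>42</b:status>
</cell>
""".strip()

root = ET.fromstring(doc)

# Same local name, different namespace, so the two values never collide.
status_a = root.find("a:status", NS).text
status_b = root.find("b:status", NS).text
print(status_a, status_b)
```

In JSON the closest equivalent is a naming convention such as prefixed keys, which nothing in the format itself enforces.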

Yes, XML is probably a good fit for something like Jupyter. Basically if you want to reuse a lot of "objects" throughout a structure and have them mean the same thing in different nested parts of it. Like how <a> in HTML means a hyperlink whether it's under <body> or some nested <div>.

I'd phrase it more that there is a document with mixed use items marked up throughout it. Some items in the document are code, in which case you probably want to fence the code with a marker on what language is used. Other items are just prose, in which case you'd like to just write the prose as much as you can.

Some items can even be other forms of xml that have their own schemas dictating what is valid. (Thinking SVG here.)

I'll also note that even there, I can see why HTML went with the odd parsing they do. XHTML tried going with "well formed" documents, but that falls flat for the authors. That is why "sections" of a document are essentially just collecting all of the "h" tags and making an implied tree out of that, as opposed to making the tree directly. To that end, my markup language of choice for Jupyter style things is org-mode in emacs. Yes, it has some warts; but again, all formats that I have ever seen have warts.

Edit: I want to add that I don't intend this as a "correction." I should say that I agree with your post. Complicated field where I doubt I'd have done better than most others. :)

Yeah, now express this in JSON:

   <div>
     <p>JSON example:</p>
     <pre>
      [
        {"title": "Led Zeppelin II", "artist": "Led Zeppelin", "price": 999},
        {"title": "La Brise", "artist": "Arax", "price": 999},
      ]
     </pre>
     <p>Source: <a href="https://news.ycombinator.com/item?id=35472014">click here</a>!</p>
   </div>

JSON is great for certain domains, but there are other domains where it is a nightmare and XML shines.

Use the right tool for the job.
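To see what happens if you take the challenge above literally, here is one possible encoding, sketched in Python. The `[tag, attributes, children...]` layout is an assumption picked for this example, not any standard, and the `<pre>` block is left out to keep it short.

```python
import json
import xml.etree.ElementTree as ET

# The mixed-content part of the snippet above, without the <pre> block.
snippet = (
    '<div><p>JSON example:</p>'
    '<p>Source: <a href="https://news.ycombinator.com/item?id=35472014">'
    'click here</a>!</p></div>'
)

def to_jsonable(el):
    """Encode an element as [tag, attributes, child, child, ...].

    Children are either strings (text runs) or nested lists (elements),
    so the order of text and markup is preserved.
    """
    node = [el.tag, dict(el.attrib)]
    if el.text:
        node.append(el.text)
    for child in el:
        node.append(to_jsonable(child))
        if child.tail:
            node.append(child.tail)
    return node

tree = to_jsonable(ET.fromstring(snippet))
print(json.dumps(tree, indent=2))
```

It works, but the result is positional arrays that nobody would want to type or read by hand, which is the point being made about mixed content.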

you can't ignore ux stuff like this in a protocol that's meant for general use

something like duplicating info in closing tags in XML (which applies to every element) isn't really comparable to stuff like having to escape certain characters in JSON strings (which applies only to the values that use those things)

perfect is the enemy of the good, and the good is the metric

Don't you also have to escape stuff in XML? Like &gt;, which is even worse.

Yes, though many languages have lenient parsers. Most browser parsers, for example, will probably only be lenient if parsing "HTML."

    new XMLSerializer().serializeToString(new DOMParser().parseFromString("<a>hello < </a>", "text/html"))

The above does what I expected in my console. And again, entities are a very dangerous part of XML and friends.

You are correct that if you tell it that that is xml, the browser will throw it back at you. Just as the JSON parser will barf on JSON.parse("{'test':'value'}").
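The same two rejections can be reproduced outside the browser. A sketch using Python's standard `json` and `xml.etree.ElementTree` parsers, which stand in here for `JSON.parse` and the browser's XML mode; the `accepts` helper is made up for this example.

```python
import json
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def accepts(parse, text):
    """Return True if parse(text) succeeds, False if it raises a parse error."""
    try:
        parse(text)
        return True
    except (json.JSONDecodeError, ET.ParseError):
        return False

# Single-quoted JSON is rejected; the double-quoted form is accepted.
json_single_quotes = accepts(json.loads, "{'test':'value'}")
json_double_quotes = accepts(json.loads, '{"test":"value"}')

# A bare "<" in text content is not well-formed XML.
xml_bare_lt = accepts(ET.fromstring, "<a>hello < </a>")

# Escaping the text first makes the same content well-formed.
fixed = "<a>" + escape("hello < ") + "</a>"
xml_escaped = accepts(ET.fromstring, fixed)
recovered = ET.fromstring(fixed).text
```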

per specifications, json parsing is not lenient, html parsing is lenient

Right, and amusingly, more than a few json parsers are very lenient in this. That or folks abandon ship fairly quickly and go for another spec that is far more friendly.

well json definitely does not accept `{'test':'value'}` as valid input

any parser that behaves otherwise is pretty clearly buggy

json has many problems but parsing ambiguity is not really one of them

To be pedantic, html parsing is not lenient, it is unambiguously specified.

if that were true then browsers would refuse to render text/html responses that didn't include a closing </html> tag, i guess
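For a feel of how an HTML parser treats missing closing tags, here is a sketch with Python's standard `html.parser`. Note the assumption: this module is a tokenizer, not the full tree-building algorithm browsers implement, so it only illustrates that unclosed input is processed rather than rejected. The `TagLogger` class is made up for this example.

```python
from html.parser import HTMLParser

class TagLogger(HTMLParser):
    """Record start tags, end tags, and text as the parser reports them."""

    def __init__(self):
        super().__init__()
        self.opened = []
        self.closed = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.opened.append(tag)

    def handle_endtag(self, tag):
        self.closed.append(tag)

    def handle_data(self, data):
        self.text.append(data)

parser = TagLogger()
# No </p>, </body>, or </html> anywhere in the input.
parser.feed("<html><body><p>hello")
parser.close()

print(parser.opened, parser.closed, "".join(parser.text))
```

No exception is raised; the missing end tags are simply never reported, and it is up to the consumer to decide how to close the open elements.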