| HN Mirror


	by PurpleRamen 1177 days ago
	With XML, the complexity is the baseline, and it only goes up from there. With JSON, the complexity is just an option, the baseline is pretty simple. Also, good XML-tools are rare or expensive.

6 comments

andyjohnson0 1177 days ago

Baseline for XML would be a document that doesn't use schemas, namespaces, attributes, or any of the SGML legacy stuff like DTDs and PCDATA.

Such a document is essentially as simple as the equivalent JSON.

hot_gril 1177 days ago

Even that is more complicated than JSON.

andyjohnson0 1177 days ago

Care to elaborate?

hot_gril 1177 days ago

Every key is written twice, for opening and closing. Keys can be duplicated, and in fact that's what you have to do if you want a simple list. There aren't numeric types, so you have to parse strings. It also looks horrible.

  <cds>
    <cd><title>Led Zeppelin II</title><artist>Led Zeppelin</artist><price>999</price></cd>
    <cd><title>La Brise</title><artist>Arax</artist><price>999</price></cd>
  </cds>

  <cds>
    <cd>
      <title>Led Zeppelin II</title>
      <artist>Led Zeppelin</artist>
      <price>999</price>
    </cd>
    <cd>
      <title>La Brise</title>
      <artist>Arax</artist>
      <price>999</price>
    </cd>
  </cds>

vs something like

  [
    {"title": "Led Zeppelin II", "artist": "Led Zeppelin", "price": 999},
    {"title": "La Brise", "artist": "Arax", "price": 999}
  ]

You can probably do better using XML attributes. But then you're using more features.
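
To make the "no numeric types" and "duplicated keys as a list" points concrete, here is a minimal Python sketch using only the standard library (`xml.etree.ElementTree` and `json`). The helper names `cds_from_xml` and `cds_from_json` are made up for illustration.

```python
import json
import xml.etree.ElementTree as ET

XML_DOC = """
<cds>
  <cd><title>Led Zeppelin II</title><artist>Led Zeppelin</artist><price>999</price></cd>
  <cd><title>La Brise</title><artist>Arax</artist><price>999</price></cd>
</cds>
"""

JSON_DOC = """
[
  {"title": "Led Zeppelin II", "artist": "Led Zeppelin", "price": 999},
  {"title": "La Brise", "artist": "Arax", "price": 999}
]
"""


def cds_from_xml(text):
    # The "list" is just the repeated <cd> children; every value arrives as a string.
    root = ET.fromstring(text)
    return [
        {
            "title": cd.findtext("title"),
            "artist": cd.findtext("artist"),
            "price": int(cd.findtext("price")),  # manual string-to-number step
        }
        for cd in root.findall("cd")
    ]


def cds_from_json(text):
    # Lists and numbers are part of the syntax, so no conversion is needed.
    return json.loads(text)


# Both routes end up with the same Python data.
print(cds_from_xml(XML_DOC) == cds_from_json(JSON_DOC))
```

The XML side needs to know which fields are numeric and which elements repeat; the JSON side gets both from the syntax itself.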

taeric 1177 days ago

If we are complaining about the closing tags, might as well add that embedding newlines or quotes into JSON is less than pleasant.

Which is to say, this feels like a bit of a non-issue. Yes, writing it by hand can get tedious, but that is true of any and every format. That is why you will almost certainly reach for other formats when writing a long list of data. And each and every one of them will fail for some form of input in ways that are frustrating.
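
A small Python sketch of the escaping point, using the standard `json` module and `xml.sax.saxutils.escape`. The sample string and the `<note>` element name are invented for the example.

```python
import json
from xml.sax.saxutils import escape

text = 'She said "hi"\nand left.'

# JSON strings must escape double quotes and newlines.
as_json = json.dumps(text)

# XML element content only needs &, < and > escaped, so this text passes through as-is.
as_xml = f"<note>{escape(text)}</note>"

print(as_json)
print(as_xml)
```

XML has its own awkward inputs (anything containing `<` or `&`), so this only shows that each format is unpleasant for a different set of characters.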

hot_gril 1177 days ago

Writing that JSON example by hand wasn't tedious. The XML example was, and the result is unreadable. It's important to be able to debug things easily. I'm going to manually type JSON when I'm testing an API, and I'm going to read the response.

If you absolutely don't care about human interface, no reason to use XML either. It's meant to be more verbose. The XML tags will often dominate the size of the payload with things like `<question>Who</question>`, so you have to start thinking about shorter names. Yes JSON has a similar problem, but at least it's halved and you don't have to instruct everyone to call each list element "e". If you super care about size, you'll use protobufs or something.
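
A back-of-the-envelope check of that overhead as a Python sketch. It counts every character that is not part of the value `Who`, and it assumes the JSON is sent in compact form (no spaces after separators), which is a choice made here for the comparison.

```python
import json

xml_payload = "<question>Who</question>"
json_payload = json.dumps({"question": "Who"}, separators=(",", ":"))

value_len = len("Who")

# Characters spent on markup rather than on the value itself.
xml_overhead = len(xml_payload) - value_len
json_overhead = len(json_payload) - value_len

# Characters spent on the key name alone: twice in XML, once in JSON.
xml_name_chars = xml_payload.count("question") * len("question")
json_name_chars = json_payload.count("question") * len("question")

print(xml_overhead, json_overhead)
print(xml_name_chars, json_name_chars)
```

The key name itself costs half as much in JSON; the total overhead shrinks by less than half because of the quotes and braces.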

preseinger 1177 days ago

you can't ignore ux stuff like this in a protocol that's meant for general use

something like duplicating info in closing tags in XML (which applies to every element) isn't really comparable to stuff like having to escape certain characters in JSON strings (which applies only to the values that use those things)

perfect is the enemy of the good, and the good is the metric

andyjohnson0 1177 days ago

Thanks. I get your point about the close element including the tag name - but that's the kind of detail I leave to the serialisation library, in the same way that the close scope token in json is different to the start scope token.

As for "looks horrible"... well yeah, I always feel that xml looks "spikey" somehow. But I've been programming in curly-brace languages for 30+ years and I still find json harder to read than xml: I think my brain tries to interpret it as code, not data. I find xml easier to read (even when it's unformatted) precisely because the close-tokens kind of document what element they're closing.

Each to their own I guess. At least we're not stuck using ASN1.

hot_gril 1177 days ago

> At least we're not stuck using ASN1.

Prepare for trouble, and make it double: http://xml.coverpages.org/dstc-xer2.html

datavirtue 1177 days ago

And if someone is nice enough to stuff a NUL in the document, it all shatters.
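
A hedged illustration in Python, assuming the point is about how parsers react to a NUL character. It only probes the standard-library parsers (`xml.etree.ElementTree` and `json`); other parsers may behave differently. The helper names are invented.

```python
import json
import xml.etree.ElementTree as ET


def xml_accepts(text):
    # True if the standard-library XML parser accepts the document.
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def json_accepts(text):
    # True if the standard-library JSON parser accepts the document.
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


results = {
    "xml raw NUL": xml_accepts("<a>\x00</a>"),
    "xml &#0; reference": xml_accepts("<a>&#0;</a>"),
    "json raw NUL": json_accepts('"\x00"'),
    "json \\u0000 escape": json_accepts('"\\u0000"'),
}
for case, accepted in results.items():
    print(f"{case}: {'accepted' if accepted else 'rejected'}")
```

XML 1.0 has no way to represent NUL at all, not even as a character reference, while JSON rejects the raw character but allows the `\u0000` escape.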

kitsunesoba 1177 days ago

Also this may just be the time in which I got into programming showing, but it seems like JSON encoding/decoding has been built into more languages than support for XML ever was. That's one less required dependency and thing to have to think about in many cases, like in Swift projects all I have to do is make sure my model structs/classes conform to Codable and I'm ready to hit endpoints.

hajile 1177 days ago

That’s because writing a JSON parser is pretty straightforward with just a couple edge cases.

Writing a conformant XML parser is a HUGE undertaking in comparison.

I could get most places to give me the time to write a JSON parser in whatever language if it didn’t have one. I couldn’t do that with XML.

Because of this, every common language (and most uncommon ones) has a JSON parser while XML parsers are less common (and fully conformant ones are even more rare).
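
To give a sense of scale, here is a rough recursive-descent JSON parser sketch in Python. It is an illustration, not a conformant implementation: it does not combine `\u` surrogate pairs or reject raw control characters inside strings, and all function names are invented for this example.

```python
import re

# Number grammar from the JSON spec: optional minus, integer, fraction, exponent.
_NUMBER = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")
_ESCAPES = {'"': '"', "\\": "\\", "/": "/", "b": "\b", "f": "\f",
            "n": "\n", "r": "\r", "t": "\t"}
_LITERALS = {"true": True, "false": False, "null": None}

def parse_json(text):
    """Parse a complete JSON document; raise ValueError on malformed input."""
    value, pos = _value(text, _skip(text, 0))
    if _skip(text, pos) != len(text):
        raise ValueError(f"trailing data after offset {pos}")
    return value

def _skip(text, pos):
    # Advance past JSON whitespace.
    while pos < len(text) and text[pos] in " \t\r\n":
        pos += 1
    return pos

def _value(text, pos):
    ch = text[pos:pos + 1]  # slicing yields "" at end of input instead of raising
    if ch == "{":
        return _container(text, pos, "}", is_object=True)
    if ch == "[":
        return _container(text, pos, "]", is_object=False)
    if ch == '"':
        return _string(text, pos)
    for word, literal in _LITERALS.items():
        if text.startswith(word, pos):
            return literal, pos + len(word)
    match = _NUMBER.match(text, pos)
    if not match:
        raise ValueError(f"unexpected input at offset {pos}")
    token = match.group()
    is_float = any(c in token for c in ".eE")
    return (float(token) if is_float else int(token)), match.end()

def _string(text, pos):
    out, pos = [], pos + 1  # skip the opening quote
    while True:
        ch = text[pos:pos + 1]
        if ch == "":
            raise ValueError("unterminated string")
        if ch == '"':
            return "".join(out), pos + 1
        if ch != "\\":
            out.append(ch)
            pos += 1
            continue
        esc = text[pos + 1:pos + 2]
        if esc == "u":  # \uXXXX; surrogate pairs are not combined in this sketch
            out.append(chr(int(text[pos + 2:pos + 6], 16)))
            pos += 6
        elif esc in _ESCAPES:
            out.append(_ESCAPES[esc])
            pos += 2
        else:
            raise ValueError(f"bad escape at offset {pos}")

def _container(text, pos, closer, is_object):
    result = {} if is_object else []
    pos = _skip(text, pos + 1)
    if text[pos:pos + 1] == closer:  # empty {} or []
        return result, pos + 1
    while True:
        if is_object:
            if text[pos:pos + 1] != '"':
                raise ValueError(f"expected string key at offset {pos}")
            key, pos = _string(text, pos)
            pos = _skip(text, pos)
            if text[pos:pos + 1] != ":":
                raise ValueError(f"expected ':' at offset {pos}")
            pos = _skip(text, pos + 1)
        item, pos = _value(text, pos)
        if is_object:
            result[key] = item
        else:
            result.append(item)
        pos = _skip(text, pos)
        sep = text[pos:pos + 1]
        if sep == closer:
            return result, pos + 1
        if sep != ",":
            raise ValueError(f"expected ',' or {closer!r} at offset {pos}")
        pos = _skip(text, pos + 1)

print(parse_json('[{"title": "La Brise", "artist": "Arax", "price": 999}]'))
```

The whole grammar fits in five small functions. A comparable XML sketch would still have to deal with entities, CDATA, comments, processing instructions, and encodings before namespaces even come up.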

adolph 1177 days ago

Here to say this too. Compositional complexity is an advantage.

As a human in a repl, I appreciate the balance of readability between XML which uses a larger set of syntactical characters, and YAML which uses fewer.

I also appreciate JSON's ontological simplicity over XML. This primarily boils down to the lack of attribute nodes and explicit difference between objects (lists of key-values) and arrays (lists of values).

djedr 1177 days ago

> With XML, the complexity is the baseline, and it only goes up from there. With JSON, the complexity is just an option, the baseline is pretty simple.

Very well put. And we could lower the baseline substantially towards simplicity, even from JSON.

It's pretty clear that a lot of people think this way. Some even seriously try to figure out what such a baseline of simplicity would look like.

There are lots of simple indentation-based designs (similar to YAML) such as NestedText[0], Tree Notation[1], StrictYAML[2], or even @Kuyawa's Dixy[3] linked in this thread.

There seem to be fewer new ideas based around nested brackets, the way S-expressions are. Over the years, I have developed a few in this space, most notably Jevko[4]. If there ever will be another lowering of the simplicity baseline, I believe something like Jevko is the most sensible next step.

[0] https://nestedtext.org/en/stable/ [1] https://treenotation.org/ [2] https://hitchdev.com/strictyaml/ [3] https://news.ycombinator.com/item?id=35469643 [4] https://jevko.org/

pointlessone 1177 days ago

I guess, it depends on how you define XML baseline. You can have a very simple XML with only bare tags. It will work just fine. Arguably, it's even simpler than JSON that way. A basic parser for that is probably not more complex than a JSON parser.

All the optional complexity that can go on top, though, is probably better specified for XML. Transformation is well defined for XML (XSLT) but not at all for JSON (I guess, you write your own code to manipulate native objects).

Schemas are basically a native feature for XML. Not so much for JSON.

All sorts of specialised vocabularies are defined for XML. A few are defined for JSON, too.

jsmith45 1177 days ago

For a lot of XML you need to be able to support XML namespacing, and doing that adds a lot of complexity over the original pure XML.

At first XML namespacing sounds simple. Each tag and attribute will have an optional uri attached to it, no big deal right?

From reading through the specification one could be forgiven for assuming that the prefixes are just arbitrary mappings that a processor can ignore, or automatically remap to alternate prefixes.

For example, it is true that <abc:a xmlns:abc="https://example.com/xyz" xmlns:def="https://example.com/xyz"><def:b>5</def:b></abc:a> (notice both namespaces are the same url) is equivalent to: <a xmlns="https://example.com/xyz"><b>5</b></a>.
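
A quick way to see that equivalence is with Python's standard `xml.etree.ElementTree`, which reports every tag as `{namespace-uri}local-name` and discards the prefixes. This is a sketch; `expanded_names` is a made-up helper.

```python
import xml.etree.ElementTree as ET

prefixed = ('<abc:a xmlns:abc="https://example.com/xyz" '
            'xmlns:def="https://example.com/xyz"><def:b>5</def:b></abc:a>')
default = '<a xmlns="https://example.com/xyz"><b>5</b></a>'


def expanded_names(text):
    # ElementTree resolves prefixes to "{uri}local" names while parsing.
    return [element.tag for element in ET.fromstring(text).iter()]


print(expanded_names(prefixed))
print(expanded_names(prefixed) == expanded_names(default))
```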

Unfortunately, the data model also allows for content to reference the namespaces by prefix, and therefore every general xml processor that supports namespaces must keep around an application accessible mapping from the prefixes to namespaces, as the application may need to be able to access that information to interpret attributes or content. The only exception to this would be if the general XML processor insisted on having schema information for every namespace it might come across. In that scenario it would be able to tell if an attribute value of "abc:b" is really a string literal, or a reference to a namespace identifier (QNAME data type), where the namespace is whatever the current "abc" prefix is bound to, and the identifier is "b".

But obviously we don't want to add full schema support for a simple implementation, so we need to keep the mapping information around, just in case the application needs it. We also cannot easily offer nice features like changing a document to use preferred prefixes for certain namespaces, unless we also keep any prefixes that are used in values that could be interpreted as QNAMES, just in case they actually are, but our processor does not know, because it has omitted schema support for simplicity (or perhaps it included schema support, but does not have a schema available for some namespace).
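
Here is a sketch of what "keeping the mapping around" can look like in Python. It uses the `start-ns` events of `xml.etree.ElementTree.iterparse` to record prefix bindings, then resolves an attribute value that happens to be a QName. The document, the `type` attribute, and both helper names are invented, and prefix scoping is ignored to keep it short.

```python
import io
import xml.etree.ElementTree as ET

DOC = b'<doc xmlns:abc="https://example.com/xyz"><item type="abc:b"/></doc>'


def parse_keeping_prefixes(data):
    """Parse a document and collect prefix -> URI bindings.

    Simplification: later bindings overwrite earlier ones, so nested
    re-declarations of the same prefix are not scoped correctly.
    """
    prefixes, root = {}, None
    events = ET.iterparse(io.BytesIO(data), events=("start-ns", "start"))
    for event, payload in events:
        if event == "start-ns":
            prefix, uri = payload
            prefixes[prefix] = uri
        elif root is None:
            root = payload  # first "start" event is the root element
    return root, prefixes


def resolve_qname(value, prefixes):
    # Only the application knows that this attribute value is a QName.
    prefix, _, local = value.partition(":")
    return f"{{{prefixes[prefix]}}}{local}"


root, prefixes = parse_keeping_prefixes(DOC)
raw = root.find("item").get("type")
print(raw)
print(resolve_qname(raw, prefixes))
```

Without the recorded bindings, the parsed tree only holds the string `abc:b`, and nothing in it says what `abc` was bound to.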

And that is just the complexity that stems from one fairly small quirk in how XML works.

You also have no idea if an element content needs to preserve whitespace or not if you don't know the schema, and don't happen to have an xml:space attribute present. Thus if you want to re-indent arbitrary xml for readability safely you could end up with something like this:

    <abc
        ><def
            >5</def
        ></abc
    >

pointlessone 1177 days ago

I understand what you're getting at but that is you choosing a higher complexity baseline. Yes, it's a part of a standard but you can choose not to support it. No one said you have to support all of XML-verse in order to use it effectively in your particular application. The most common cases are usable without any of it. Look at most RSS/Atom feeds, XHTML, SVG. They all can get by with simple tags and attributes.

I'm just not buying the argument that XML's complexity is somehow remediated in JSON. JSON becomes as horrible as XML when you bring it up to feature parity. And that's when there's a way to match features. Whatever people say about XSLT, it is powerful, reasonably well defined, and generic over all documents (even though complex). There's nothing like it for JSON I know of.

goto11 1177 days ago

If we are going for simplicity, surely S-expressions win? You can support structures similar to JSON or XML on top of them, but the baseline is simpler.
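
As a rough illustration of how low that baseline is, here is a minimal S-expression reader sketch in Python. It assumes the simplest possible dialect: parentheses and whitespace-separated atoms only, with no quoted strings, comments, or numeric types. `parse_sexpr` is a made-up name.

```python
def parse_sexpr(text):
    """Parse one S-expression into nested Python lists of string atoms."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def read(pos):
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        token = tokens[pos]
        if token == ")":
            raise ValueError("unexpected ')'")
        if token != "(":
            return token, pos + 1  # a bare atom
        items, pos = [], pos + 1
        while pos < len(tokens) and tokens[pos] != ")":
            item, pos = read(pos)
            items.append(item)
        if pos >= len(tokens):
            raise ValueError("missing ')'")
        return items, pos + 1  # skip the closing parenthesis

    value, pos = read(0)
    if pos != len(tokens):
        raise ValueError("trailing tokens after the expression")
    return value


print(parse_sexpr("(cds (cd (title La-Brise) (artist Arax) (price 999)))"))
```

Everything beyond this, such as telling a key-value pair from a list or a number from a string, is a convention layered on top. That is the "similar structures on top of it" part.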

mastax 1177 days ago

The new KiCad file formats are all S-expression based[0], except for the project files which are JSON IIRC. I think it works pretty well for representing a tree of typed objects textually. They don't even have any LISP connections. Haven't seen S-expressions used anywhere else, though.

[0]: https://dev-docs.kicad.org/en/file-formats/sexpr-intro/

twoodfin 1177 days ago

I’d speculate that human minds and memories work much better with associative structures rather than sequenced ones. JSON draws a clean separation between these two and as a result has clearer syntax for the former.

i.e., the benefits of simplicity have a limit.
