by klodolph 2431 days ago
I didn’t understand the XML hate either. It was just a bit annoying to parse, depending on the language and ecosystem you used. It was a little verbose, but so what?

The biggest problem to me is that XML is not a data serialization language, it's a document markup language. In documents, the distinction between attributes and content makes sense. In data serialization, the choice of whether a given datum is an attribute or text content appears rather arbitrary. Should I write this?

  <book>
    <title>XML Cookbook</title>
    <author>Jane Doe</author>
  </book>
Or this?

  <book title="XML Cookbook" author="Jane Doe" />
Now attributes don't work when there are multiple values, so I guess I should use attributes for single values and child nodes for lists:

  <book title="XML Cookbook">
    <author>Jane Doe</author>
    <author>Tim Pickens</author>
  </book>
But that rule also has problems. If I decide to include markup in the title, it suddenly needs to be a child node again:

  <book>
    <title>The <strong>Awesome</strong> <abbr>XML</abbr> Cookbook</title>
    <author>Jane Doe</author>
    <author>Tim Pickens</author>
  </book>
Also, "author" is a misleading name for a field that is actually an array, so should I actually use an "authors" node to make that clearer?

  <book>
    <title>XML Cookbook</title>
    <authors>
      <author>Jane Doe</author>
      <author>Tim Pickens</author>
    </authors>
  </book>
Or maybe:

  <book>
    <title>XML Cookbook</title>
    <authors>
      <person name="Jane Doe" />
      <person name="Tim Pickens" />
    </authors>
  </book>
Now compare this to YAML:

  book:
    title: XML Cookbook
    authors:
      - name: Jane Doe
      - name: Tim Pickens
Or even just:

  book:
    title: XML Cookbook
    authors: [ Jane Doe, Tim Pickens ]
I need to make way fewer design choices when writing that down. In fact, I probably don't need to design anything since that's already the data structure that I've written down as a type somewhere in my code. That's why it's a good idea to use a data serialization language for, well, data serialization.
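
To make the cost of those choices concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The `Book` type and the `read_*` helpers are hypothetical names made up for illustration; the only point is that each XML layout above needs its own extraction code to arrive at the same in-memory structure.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class Book:
    """Hypothetical in-memory type the serialized data should map onto."""
    title: str
    authors: List[str]


# Three of the layouts discussed above, all describing the same book.
ATTR_TITLE = """
<book title="XML Cookbook">
  <author>Jane Doe</author>
  <author>Tim Pickens</author>
</book>
"""

CHILD_TITLE = """
<book>
  <title>XML Cookbook</title>
  <authors>
    <author>Jane Doe</author>
    <author>Tim Pickens</author>
  </authors>
</book>
"""

PERSON_ATTRS = """
<book>
  <title>XML Cookbook</title>
  <authors>
    <person name="Jane Doe" />
    <person name="Tim Pickens" />
  </authors>
</book>
"""


def read_attr_title(xml: str) -> Book:
    # Title is an attribute; authors are repeated direct children.
    root = ET.fromstring(xml)
    return Book(root.get("title"), [a.text for a in root.findall("author")])


def read_child_title(xml: str) -> Book:
    # Title is a child element; authors sit inside an <authors> wrapper.
    root = ET.fromstring(xml)
    return Book(root.findtext("title"),
                [a.text for a in root.findall("authors/author")])


def read_person_attrs(xml: str) -> Book:
    # Author names are attributes on <person> elements.
    root = ET.fromstring(xml)
    return Book(root.findtext("title"),
                [p.get("name") for p in root.findall("authors/person")])


books = [
    read_attr_title(ATTR_TITLE),
    read_child_title(CHILD_TITLE),
    read_person_attrs(PERSON_ATTRS),
]
```

All three readers produce an equal `Book`, but none of them works on the other two layouts, so the layout decision leaks into every consumer.
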
Thank you for articulating this, but I’m familiar with these complaints. XML does give you a lot of freedom to format your data in different ways, which can get you into traps. I’ve run into those traps before, like the decision between attributes and child nodes.

This doesn’t add up to XML hate, for me. The way I would probably write the document is:

  <book>
    <title>XML Cookbook</title>
    <author>Jane Doe</author>
    <author>Tim Pickens</author>
  </book>
This is a fairly boring way to write out a document and while you can bikeshed all you want, I don’t see the possible bikeshedding as a major drawback. The above is concise and easy to understand.

I wouldn’t use YAML as a basis for comparison. YAML has a fair number of oddities and inconsistencies that led me to stay away from it. XML is at least consistent and simple: there are not really any surprises to speak of, and there are plenty of tools for modifying XML documents even when you don’t have the schema. For YAML, although there’s a spec, it’s complicated enough that different implementations are inconsistent with each other, and there seems to be some inertia at work here.

There’s also the downright bizarre set of regexes that YAML uses to recognize bare strings as other types, which means that '3.3.0' is a string, but '3.3' is a number. If I write 'ni' that’s a string, but 'no' is a boolean. I personally find it harder to read or author YAML due to all these rules. You also have to be a bit more careful to sanitize YAML input due to things like the way !! is handled by various libraries, or the way YAML allows object cycles. It gives you too much rope to hang yourself, has too many surprises, and too many footguns. The fact that YAML is a bit more concise just isn’t enough of an advantage.

    # Quiz: What value does this give you when parsed?
    MAC Address: 11:02:03:04:05:06
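
For anyone who wants to check the examples and the quiz without installing a YAML library, here is a rough Python sketch of YAML 1.1-style implicit typing. The `resolve` function and its patterns are a simplification of the YAML 1.1 bool, int, and float rules, not any library's actual resolver, and real parsers differ from it and from each other.

```python
import re

# Simplified rules modelled on the YAML 1.1 bool, int, and float types.
# Assumption: only the cases discussed in this thread are covered.
TRUE = {"y", "Y", "yes", "Yes", "YES", "true", "True", "TRUE", "on", "On", "ON"}
FALSE = {"n", "N", "no", "No", "NO", "false", "False", "FALSE", "off", "Off", "OFF"}
INT = re.compile(r"[-+]?(?:0|[1-9][0-9_]*)")
SEXAGESIMAL = re.compile(r"[-+]?[1-9][0-9_]*(?::[0-5]?[0-9])+")
FLOAT = re.compile(r"[-+]?[0-9][0-9_]*\.[0-9_]*(?:[eE][-+][0-9]+)?")


def resolve(scalar: str):
    """Guess the type of an unquoted scalar the way a YAML 1.1 parser might."""
    if scalar in TRUE:
        return True
    if scalar in FALSE:
        return False
    if INT.fullmatch(scalar):
        return int(scalar.replace("_", ""))
    if SEXAGESIMAL.fullmatch(scalar):
        # Base-60 integer: each colon-separated group is one base-60 digit.
        sign = -1 if scalar.startswith("-") else 1
        value = 0
        for part in scalar.lstrip("+-").replace("_", "").split(":"):
            value = value * 60 + int(part)
        return sign * value
    if FLOAT.fullmatch(scalar):
        return float(scalar.replace("_", ""))
    return scalar  # Anything else stays a string.


for text in ["3.3.0", "3.3", "ni", "no", "11:02:03:04:05:06"]:
    print(repr(text), "->", repr(resolve(text)))
```

Under these simplified rules the MAC address comes out as the integer 8580182706, because every group happens to be a valid base-60 digit. Parsers that follow the YAML 1.2 core schema drop the base-60 and yes/no forms, which is one source of the inconsistency between implementations.
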
For data serialization, I would stick to something like Protocol Buffers. You get a text and binary format, a schema, consistency across implementations, and good tooling.

XML is workable in a lot of situations and in some cases the verbosity makes it a bit more self-documenting than e.g. JSON.

TOML would be my choice for config files that I maintain.

I've grown to like Avro, mostly because of its ability to support schema evolution for reader and writer independently. You get the usual niceties around binary wire format, schema, dynamic parsing and/or code generators etc.
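
As a rough illustration of what independent reader and writer schemas buy you, here is a plain-Python sketch of the field-matching rule Avro applies to records: fields the reader does not know about are dropped, and fields the writer did not send are filled from the reader's default. The `resolve_record` helper and the dict-based schemas are hypothetical stand-ins, not the Avro library's API, and the sketch ignores type promotion, unions, and aliases.

```python
def resolve_record(record, writer_fields, reader_fields):
    """Project a record written with writer_fields onto reader_fields.

    Both schemas are simplified to {field_name: {"default": value}} dicts;
    a field without a "default" key has no default.
    """
    resolved = {}
    for name, spec in reader_fields.items():
        if name in writer_fields:
            # Field known to both sides: take the written value.
            resolved[name] = record[name]
        elif "default" in spec:
            # Field only the reader knows: fall back to the reader's default.
            resolved[name] = spec["default"]
        else:
            raise ValueError(f"no value or default for field {name!r}")
    # Fields only the writer knows ("edition" below) are silently skipped.
    return resolved


# Hypothetical schemas: the writer is one version behind the reader.
writer_v1 = {"title": {}, "authors": {}, "edition": {}}
reader_v2 = {"title": {}, "authors": {}, "language": {"default": "en"}}

written = {
    "title": "XML Cookbook",
    "authors": ["Jane Doe", "Tim Pickens"],
    "edition": 2,
}
seen_by_reader = resolve_record(written, writer_v1, reader_v2)
print(seen_by_reader)
```

The reader sees the title, the authors, and `language` set to its default, while `edition` is ignored, so neither side had to upgrade in lockstep with the other.
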
Thank you... this pretty much sums up most of my disgust regarding XML in general. And while JSON is more universal, YAML is much more accessible for humans.