I didn’t understand the XML hate either. It was just a bit annoying to parse, depending on the language and ecosystem you used. It was a little verbose, but so what?
The biggest problem to me is that XML is not a data serialization language; it's a document markup language. In documents, the distinction between attributes and content makes sense. In data serialization, the choice of whether a given datum is an attribute or text content is fairly arbitrary. Should I write this?
book:
  title: XML cookbook
  authors:
    - name: Jane Doe
    - name: Tim Pickens
Or even just:
book:
  title: XML cookbook
  authors: [ Jane Doe, Tim Pickens ]
I need to make way fewer design choices when writing that down. In fact, I probably don't need to design anything since that's already the data structure that I've written down as a type somewhere in my code. That's why it's a good idea to use a data serialization language for, well, data serialization.
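To make the attribute-versus-content point concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The two XML shapes and the function names are my own hypothetical illustrations, not a canonical encoding; the point is only that the same record can reasonably be written either way, and each way needs its own reader.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoding 1: every datum is an attribute.
as_attributes = ET.fromstring(
    '<book title="XML cookbook">'
    '<author name="Jane Doe"/><author name="Tim Pickens"/>'
    '</book>'
)

# Hypothetical encoding 2: every datum is the text content of a child element.
as_elements = ET.fromstring(
    '<book><title>XML cookbook</title>'
    '<authors><author>Jane Doe</author><author>Tim Pickens</author></authors>'
    '</book>'
)


def read_attributes(book):
    """Reader for encoding 1: values come from attributes."""
    return {
        "title": book.get("title"),
        "authors": [a.get("name") for a in book.findall("author")],
    }


def read_elements(book):
    """Reader for encoding 2: values come from element text."""
    return {
        "title": book.findtext("title"),
        "authors": [a.text for a in book.findall("authors/author")],
    }


# Both documents carry the same data, but each needs its own reader.
record_a = read_attributes(as_attributes)
record_b = read_elements(as_elements)
```

Both readers produce the same dictionary, which is roughly the structure the YAML above states directly.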
Thank you for articulating this, but I’m familiar with these complaints. XML does give you a lot of freedom to format your data in different ways, which can get you into traps. I’ve run into those traps before, like the decision between attributes and child nodes.
This doesn’t add up to XML hate, for me. The way I would probably write the document is:
This is a fairly boring way to write out a document and while you can bikeshed all you want, I don’t see the possible bikeshedding as a major drawback. The above is concise and easy to understand.
I wouldn’t use YAML as a basis for comparison. YAML has a fair number of oddities and inconsistencies that led me to stay away from it. XML is at least consistent and simple; there are not really any surprises to speak of, and there are plenty of tools for modifying XML documents even when you don’t have the schema. For YAML, although there’s a spec, it’s complicated enough that different implementations are inconsistent with each other, and there seems to be some inertia at work here.
There’s also the downright bizarre set of regexes that YAML uses to recognize bare strings as other types, which means that '3.3.0' is a string but '3.3' is a number. If I write 'ni', that’s a string, but 'no' is a boolean. I personally find it harder to read or author YAML due to all these rules. You also have to be a bit more careful to sanitize YAML input due to things like the way !! is handled by various libraries, or the way YAML allows object cycles. It gives you too much rope to hang yourself, has too many surprises, and too many footguns. The fact that YAML is a bit more concise just isn’t enough of an advantage.
# Quiz: What value does this give you when parsed?
MAC Address: 11:02:03:04:05:06
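For anyone who wants to poke at the quiz, here is a rough Python sketch of two of the implicit typing rules, assuming a parser that follows the YAML 1.1 type definitions for booleans and base-60 integers as I understand them. The patterns are transcribed by hand and resolve_scalar is a hypothetical helper, not any library's API; real parsers differ in which rules they apply.

```python
import re

# Assumed YAML 1.1 boolean spellings (hand-transcribed, may differ per parser).
YAML11_BOOL = re.compile(
    r"y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF"
)

# Assumed YAML 1.1 base-60 ("sexagesimal") integer form, e.g. 1:30 for 90.
YAML11_SEXAGESIMAL_INT = re.compile(r"[-+]?[1-9][0-9_]*(?::[0-5]?[0-9])+")


def resolve_scalar(text):
    """Guess the type of a bare (unquoted) scalar under the rules above."""
    if YAML11_BOOL.fullmatch(text):
        return text.lower() in {"y", "yes", "true", "on"}
    if YAML11_SEXAGESIMAL_INT.fullmatch(text):
        sign = -1 if text.startswith("-") else 1
        value = 0
        # Each colon-separated group is one base-60 digit.
        for part in text.lstrip("+-").replace("_", "").split(":"):
            value = value * 60 + int(part)
        return sign * value
    return text  # anything else stays a string in this sketch


for sample in ["no", "ni", "11:02:03:04:05:06", "01:02:03:04:05:06"]:
    print(repr(sample), "->", repr(resolve_scalar(sample)))
```

Under these rules the quiz value resolves to the integer 8580182706 rather than a string, while the same address with a leading zero stays a string.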
For data serialization, I would stick to something like Protocol Buffers. You get a text and binary format, a schema, consistency across implementations, and good tooling.
XML is workable in a lot of situations and in some cases the verbosity makes it a bit more self-documenting than e.g. JSON.
TOML would be my choice for config files that I maintain.
I've grown to like Avro, mostly because of its ability to support schema evolution for reader and writer independently. You get the usual niceties around binary wire format, schema, dynamic parsing and/or code generators etc.
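A toy Python sketch of what that independent evolution means, operating on already-decoded records rather than using the Avro library. The REQUIRED marker, the resolve function, and the field names are all hypothetical; the rules mirror Avro's schema resolution as I understand it: fields are matched by name, writer-only fields are ignored, and reader-only fields fall back to the reader's default.

```python
# Sentinel for "this reader field has no default" (hypothetical helper).
REQUIRED = object()


def resolve(record, reader_fields):
    """Project a record written with one schema onto the reader's fields."""
    resolved = {}
    for name, default in reader_fields.items():
        if name in record:
            resolved[name] = record[name]  # field known to both sides
        elif default is not REQUIRED:
            resolved[name] = default  # reader-only field: use its default
        else:
            raise ValueError(f"writer did not supply required field {name!r}")
    return resolved  # writer-only fields are silently dropped


# Record written with an older schema that still had an "isbn" field.
old_record = {"title": "XML cookbook", "isbn": "n/a"}

# Newer reader schema: drops "isbn", adds "publisher" with a default.
new_reader = {"title": REQUIRED, "publisher": "unknown"}

resolved = resolve(old_record, new_reader)
```

Here the reader gets the title it knows about plus the default publisher, and the old isbn field is ignored, so neither side has to upgrade in lockstep.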
Thank you... this pretty much sums up most of my disgust regarding XML in general. And while JSON is more universal, YAML is much more accessible for humans.