Hacker News
by nine_k 151 days ago
XML grew from SGML (like HTML did), and it brought from it a bunch of things that are useless outside a markup language. Attributes were a bad idea. Entities were a so-so idea, which became unapologetically terrible when URLs and file references were allowed. CDATA was an interesting idea but an error-prone one, and likely it just did not belong.

OTOH namespaces, XSD, XSLT were great, modulo the noisy tags. XSLT was the first purely functional language that enjoyed mass adoption in the industry. (It was also homoiconic, like Lisp, amenable to metaprogramming.) Namespaces were a lifesaver when multiple XML documents from different sources had to be combined. XPath was also quite nice for querying.
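
To make the namespace point concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The two vocabularies, their `urn:example:...` URIs, and the document itself are made up for illustration; the point is only that two `title` elements from different sources can coexist and be queried separately with ElementTree's limited XPath subset.

```python
import xml.etree.ElementTree as ET

# Hypothetical document combining two vocabularies that both define <title>.
DOC = """\
<order xmlns:book="urn:example:books" xmlns:hr="urn:example:people">
  <book:title>A Book About Markup</book:title>
  <hr:title>Dr.</hr:title>
</order>
"""

# Prefix-to-URI mapping used only on the query side; the prefixes here
# do not have to match the ones used inside the document.
NS = {"book": "urn:example:books", "hr": "urn:example:people"}

root = ET.fromstring(DOC)

# Path queries with namespace prefixes resolve to different elements.
book_title = root.find("./book:title", NS).text
person_title = root.find("./hr:title", NS).text

# Internally, ElementTree stores qualified names as "{uri}local".
all_titles = [el.text for el in root.iter() if el.tag.endswith("}title")]

print(book_title)
print(person_title)
```

Without namespaces, both elements would just be `title`, and the consumer would have to guess which one is meant from its position in the tree.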

XML is noisy because of the closing tags, but it also guarantees a level of integrity, and LZ-type compressors, even gzip, are excellent at compacting repeated strings.
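
A quick way to check the compression claim yourself, as a rough sketch with Python's standard `gzip` module. The document is synthetic (1000 made-up `item` records), so the exact ratio says nothing about real-world data; it only shows that the repeated tag names compress away.

```python
import gzip

# Synthetic XML: the tags repeat on every record, only the values vary.
records = "".join(
    f"<item><name>widget{i}</name><price>{i}.99</price></item>\n"
    for i in range(1000)
)
xml_doc = ("<items>\n" + records + "</items>\n").encode("utf-8")

compressed = gzip.compress(xml_doc)
ratio = len(xml_doc) / len(compressed)

print(f"raw: {len(xml_doc)} bytes, gzipped: {len(compressed)} bytes, ratio: {ratio:.1f}x")
```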

Importantly, XML is a relatively human-friendly format. It has comments, requires no quoting, no commas between list items, etc.

Complexity killed XML. JSON was stupid simple, and thus contained far fewer footguns, which was a very welcome change. It was devised as a serialization format, a bit human-hostile, but mapped ideally to bag-of-named-values structures found in basically any modern language.

Now we see XML tools adapted to JSON: JSON Schema, JSONPath, etc. JSON5 allows for comments, trailing commas and other creature comforts (VSCode's config files use the similar JSONC dialect, which permits comments). With tools like that, and dovetailing tools like Pydantic, XML lost any practical edge over JSON it might ever have had.

What's missing is a widespread replacement for XSLT. Could be a fun project.

9 comments

> XML grew from SGML (like HTML did), and it brought from it a bunch of things that are useless outside a markup language. Attributes were a bad idea.

That's exactly what I wanted to say. The author talks as if XML was well designed to represent structured data, but it was not; it grew out of the idea of marking up text, which is a completely different problem. The hilarious part is that he doesn't recognize the problem when he gives his example of "or with attributes".

The other thing is that the JSON model doesn't just give you a free parser/serializer in JavaScript. It actually maps to the basic data model of the entire generation of dynamic languages that the Web grew on: Perl, Python, JS, PHP and Ruby. Arrays and maps are the basic way to represent structured data in these languages, and JSON just serializes that. Which means that getting data in and out of your language is just a single line.
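
For instance, in Python (the payload is made up for illustration), parsing and serializing really are one call each, and what you get back is plain dicts and lists:

```python
import json

# Hypothetical payload; any JSON object would do.
text = '{"user": "alice", "tags": ["xml", "json"], "score": 42, "active": true}'

data = json.loads(text)       # one line in: a plain dict of lists, strings, ints, bools
data["tags"].append("xslt")   # manipulate it with ordinary language constructs
out = json.dumps(data)        # one line out: back to a JSON string

print(type(data).__name__, type(data["tags"]).__name__)
print(out)
```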

The author seems to think that XML maps a proper conceptual model and JSON doesn't, but the model of "nodes with attributes and content" is a worse match for structured data than JSON's model of "arrays and maps of values".

Other than that, it's really a question of how much tooling you want to use. Both JSON and XML grew entire ecosystems of it, and nowadays if you want to read your JSON according to a schema into typed objects, you can, and for any good-sized project, you probably should.
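
A minimal sketch of the "JSON into typed objects" idea, using only Python's standard library. The `Comment` type, its fields, and `parse_comment` are hypothetical names for illustration, and the hand-rolled checks stand in for what a real schema or validation library would do far more thoroughly.

```python
import json
from dataclasses import dataclass


@dataclass
class Comment:
    # Hypothetical record type, not taken from any real API.
    author: str
    points: int
    replies: list


# Expected field names and their JSON-level types.
EXPECTED = {"author": str, "points": int, "replies": list}


def parse_comment(raw: str) -> Comment:
    """Parse a JSON string and check it against the expected shape."""
    obj = json.loads(raw)
    if not isinstance(obj, dict):
        raise ValueError("expected a JSON object")
    for key, typ in EXPECTED.items():
        if key not in obj:
            raise ValueError(f"missing field: {key}")
        value = obj[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, typ) or (typ is int and isinstance(value, bool)):
            raise ValueError(f"field {key!r} should be {typ.__name__}")
    return Comment(**{key: obj[key] for key in EXPECTED})


comment = parse_comment('{"author": "alice", "points": 3, "replies": []}')
print(comment)
```

A payload with `"points": "3"` would raise `ValueError` here instead of silently flowing through the program as a string.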

Also: > There are cases where other formats are appropriate: small data transfers between cooperating services and scenarios where schema validation would be overkill.

That's actually most of the cases for your average web dev!

> and it brought from it a bunch of things that are useless outside a markup language

It is a markup language. The mistake was trying to use it for anything else.

amen
XSLT ended at 1.1 for me. Everything that was "designed-by-committee" later was subverted to serve the bottom line of Michael Kay enterprises, although I hesitate to attribute to malice the grueling incompetence of the working group at the time.
Don’t forget the whole DOM vs SAX processing mess. Big documents would routinely kill parsers by running out of memory.
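
For comparison, a rough sketch of the streaming middle ground that exists today in Python's standard library: `xml.etree.ElementTree.iterparse` yields elements as they are parsed, so each record can be processed and released instead of holding the whole tree. The `record` tag, the sample data, and `count_records` are assumptions for illustration.

```python
import io
import xml.etree.ElementTree as ET


def count_records(source, tag="record"):
    """Count <record> elements without keeping their contents in memory."""
    count = 0
    # "end" events fire once an element and all its children are fully parsed.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            count += 1
            # Release the element's children and text. The emptied element
            # itself still stays attached to its parent.
            elem.clear()
    return count


# Tiny in-memory stand-in for a large file on disk.
sample = io.BytesIO(
    b"<records>" + b"<record><id>1</id></record>" * 3 + b"</records>"
)
print(count_records(sample))  # prints 3
```

A full DOM load (`ET.parse(...)`) of the same input would build every element up front, which is where the memory blow-up on big documents comes from.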

XSLT was cool. Too bad XSL and Apache-FOP never took off.

It still works well in the appropriate settings. LibreOffice (née OpenOffice) uses ODF, an XML format, for its document files, and it has been working nicely enough for a long time.
MS Office's own XLSX and DOCX formats are trees of XML files, zipped.
If I recall correctly, it was typically a 10x memory load to open an XML file in a DOM parser. Which could get really ugly, really fast when you were dealing with many files.
> What's missing is a widespread replacement for XSLT

jq says hello!

I really like Clojure EDN. It's very simple, but adds just enough on top to make a difference: namespaces, a few more types, and a way to add custom stuff in a reasonably standard way.
> OTOH namespaces, XSD, XSLT were great

I don't know, the few times I have had to use XML, I went "This is not so bad, I don't know what all the fuss is about" until I hit namespaces. I don't know if I was just using an inferior library, but namespaces sucked. The minute namespaces came into the picture, all the joy left the project. And XSLT... I only ever did one thing with it, "use the browser to turn demarc XML records into a webpage", and that was pretty cool. But it also firmly convinced me that XML is very much the wrong form factor for a programming language.

My personal thought is that CSS is not SGML-like as a sort of rebellion against the way XML was taking over the world. It feels like the author had written one too many XSLTs and said "Nope, it ends here, we are not doing that again." Because really, it is very weird that CSS does not use an XML syntax.

On the topic of the wrong form factor for a programming language: another good contender is Ansible, when you try to use its YAML looping constructs.

CSS predates XML.
> XML grew from SGML […]

… as an effort to simplify SGML which was deemed to be too complex.

Oh, the irony.

> Complexity killed XML. JSON was stupid simple

I say "the ditt-ka-pow" for The Dumbest Thing That Could Possibly Work (DTTCPW).