HN Mirror


by nine_k 151 days ago

XML grew from SGML (like HTML did), and it brought from it a bunch of things that are useless outside a markup language. Attributes were a bad idea. Entities were a so-so idea, which became unapologetically terrible once they were allowed to reference URLs and files (external entities). CDATA was an interesting but error-prone idea, and it likely did not belong at all.

OTOH namespaces, XSD and XSLT were great, modulo the noisy tags. XSLT was the first purely functional language that enjoyed mass adoption in the industry. (Like Lisp, it was also homoiconic: XSLT stylesheets are themselves XML documents, which made them amenable to metaprogramming.) Namespaces were a lifesaver when XML documents from different sources had to be combined. XPath was also quite nice for querying.
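A rough illustration of namespaces plus XPath-style querying, using Python's standard-library ElementTree. Note the assumption: ElementTree implements only a limited XPath subset, not full XPath 1.0, and the document, prefixes and namespace URIs below are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document mixing two vocabularies; the namespace URIs are made up.
DOC = """<catalog xmlns:bk="urn:example:books" xmlns:pr="urn:example:prices">
  <bk:book lang="en"><bk:title>SICP</bk:title><pr:price>45</pr:price></bk:book>
  <bk:book lang="fr"><bk:title>Le Petit Prince</bk:title><pr:price>8</pr:price></bk:book>
</catalog>"""

# Prefix-to-URI mapping used when querying; the prefixes need not match the document's.
NS = {"bk": "urn:example:books", "pr": "urn:example:prices"}


def english_books(xml_text: str) -> list[tuple[str, int]]:
    """Return (title, price) pairs for books whose lang attribute is 'en'."""
    root = ET.fromstring(xml_text)
    pairs = []
    # ElementTree understands only a subset of XPath: descendant search ('.//'),
    # child steps and simple predicates such as [@attr='value'].
    for book in root.findall(".//bk:book[@lang='en']", NS):
        title = book.find("bk:title", NS).text
        price = int(book.find("pr:price", NS).text)
        pairs.append((title, price))
    return pairs


print(english_books(DOC))  # [('SICP', 45)]
```

Because elements are matched by namespace URI rather than by prefix, the two vocabularies can be merged into one document without their tag names clashing.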

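XML is noisy because of the closing tags, but named closing tags also guarantee a level of integrity (a truncated or mis-nested document fails to parse), and the noise costs little on the wire, since LZ-type compressors, even gzip, are excellent at compacting repeated strings.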
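A small sketch of both points with the Python standard library, assuming a made-up catalog format: the repeated tags shrink a lot under gzip, and cutting off the end of the document makes the parser reject it. The exact ratio depends on the data, so none is claimed here.

```python
import gzip
import xml.etree.ElementTree as ET


def build_catalog(n: int) -> str:
    # Hypothetical catalog: every record repeats the same opening and closing tags.
    items = "".join(
        f"<item><name>widget-{i}</name><price>{i % 100}</price></item>" for i in range(n)
    )
    return f"<catalog>{items}</catalog>"


def compression_ratio(text: str) -> float:
    # Compressed size divided by raw UTF-8 size (smaller is better).
    raw = text.encode("utf-8")
    return len(gzip.compress(raw)) / len(raw)


def parses(text: str) -> bool:
    # True if the text is a well-formed XML document.
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


doc = build_catalog(1000)
print(f"gzip output is {compression_ratio(doc):.0%} of the original size")
# Dropping the last 20 characters removes the closing tags, so parsing fails.
print(parses(doc), parses(doc[:-20]))  # True False
```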

Importantly, XML is a relatively human-friendly format. It has comments, requires no quoting of text content, needs no commas between list items, etc.
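To make the comparison concrete, here is a hypothetical config written both ways and fed to Python's standard-library parsers: ElementTree accepts the XML comment (its default parser simply discards comments), while the strict `json` module rejects the `//` comment.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical config: comment, unquoted text, no commas between list items.
XML_CONFIG = """<config>
  <!-- retries are capped to avoid hammering the upstream service -->
  <retries>3</retries>
  <hosts><host>alpha</host><host>beta</host></hosts>
</config>"""

# The same config in JSON; the comment line is not valid standard JSON.
JSON_CONFIG = """{
  // retries are capped to avoid hammering the upstream service
  "retries": 3,
  "hosts": ["alpha", "beta"]
}"""


def xml_hosts(text: str) -> list[str]:
    root = ET.fromstring(text)  # the default parser drops comments
    return [host.text for host in root.iter("host")]


def json_accepts(text: str) -> bool:
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(xml_hosts(XML_CONFIG))      # ['alpha', 'beta']
print(json_accepts(JSON_CONFIG))  # False
```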

Complexity killed XML. JSON was stupid simple, and thus had far fewer footguns, which was a very welcome change. It was devised as a serialization format and is a bit human-hostile, but it mapped directly onto the bag-of-named-values structures found in basically any modern language.

Now we see XML tools adapted to JSON: JSON Schema, JSONPath, etc. JSON5 allows comments, trailing commas and other creature comforts, and VS Code uses a similar JSON-with-comments dialect for its settings. With tools like that, and complementary tools like Pydantic, XML lost any practical edge over JSON it might ever have had.

What's missing is a widespread replacement for XSLT. Could be a fun project.

9 comments

downsplat 150 days ago

> XML grew from SGML (like HTML did), and it brought from it a bunch of things that are useless outside a markup language. Attributes were a bad idea.

That's exactly what I wanted to say. The author talks as if XML were well designed to represent structured data, but it was not: it grew out of the idea of marking up text, which is a completely different problem. The hilarious part is that he doesn't recognize the problem when he gives his example of "or with attributes".

The other thing is that the JSON model doesn't just give you a free parser/serializer in JavaScript. It actually maps to the basic data model of the entire generation of dynamic languages that the Web grew on: Perl, Python, JS, PHP and Ruby. Arrays and maps are the basic way to represent structured data in these languages, and JSON just serializes that, which means getting data in and out of your language takes a single line.
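In Python, for instance, the standard-library `json` module maps objects to dicts, arrays to lists, `true`/`false` to `True`/`False` and `null` to `None`, so a round trip is one call each way. The record below is a made-up example.

```python
import json

# Hypothetical record built from the language's native containers.
record = {"name": "widget", "tags": ["a", "b"], "price": 9.5, "in_stock": True, "note": None}

text = json.dumps(record)    # dict -> JSON object, list -> array, None -> null
restored = json.loads(text)  # parse back into plain dicts and lists

print(text)
print(restored == record)  # True
```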

The author seems to think that XML maps to a proper conceptual model and JSON doesn't, but the model of "nodes with attributes and content" is a worse match for structured data than JSON's model of "arrays and maps of values".
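One way to see the mismatch is to try converting XML into arrays and maps. The converter below is a hypothetical, deliberately naive sketch (not any library's algorithm): it has to decide on its own that attributes and child elements both become keys, and that repeated children become a list. As a result, `<name>` as an attribute or as an element produces the same dict, and the type of `phone` flips between string and list depending on how many phones a person has.

```python
import xml.etree.ElementTree as ET


def naive_to_dict(elem: ET.Element) -> dict:
    # Hypothetical converter: attributes and child elements both become keys.
    result = dict(elem.attrib)
    for child in elem:
        # Leaf elements become their text; anything with structure recurses.
        value = naive_to_dict(child) if len(child) or child.attrib else (child.text or "")
        if child.tag in result:
            # A repeated child tag turns the existing value into a list.
            existing = result[child.tag]
            if not isinstance(existing, list):
                result[child.tag] = [existing]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


as_attr = naive_to_dict(ET.fromstring('<person name="Ada"/>'))
as_elem = naive_to_dict(ET.fromstring("<person><name>Ada</name></person>"))
one_phone = naive_to_dict(ET.fromstring("<p><phone>555-0100</phone></p>"))
two_phones = naive_to_dict(ET.fromstring("<p><phone>555-0100</phone><phone>555-0199</phone></p>"))

print(as_attr == as_elem)  # True: the attribute/element distinction is lost
print(one_phone, two_phones)
```

In JSON the shape is explicit: a phone list is an array whether it holds one entry or two.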

Other than that, it's really a question of how much tooling you want to use. Both JSON and XML grew entire ecosystems of it, and nowadays if you want to read your JSON according to a schema into typed objects, you can, and for any good-sized project, you probably should.
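Libraries such as Pydantic or JSON Schema validators do this declaratively. As a dependency-free sketch of the same idea, the hypothetical `Item` type and `parse_item` function below read JSON into a typed dataclass and reject input that does not match the expected shape.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    # Hypothetical schema for illustration only.
    name: str
    price: float
    tags: list[str]


def parse_item(text: str) -> Item:
    """Parse JSON text into an Item, raising ValueError on a schema mismatch."""
    raw = json.loads(text)
    if not isinstance(raw, dict):
        raise ValueError("expected a JSON object")
    name, price, tags = raw.get("name"), raw.get("price"), raw.get("tags", [])
    if not isinstance(name, str):
        raise ValueError("name must be a string")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(price, bool) or not isinstance(price, (int, float)):
        raise ValueError("price must be a number")
    if not isinstance(tags, list) or not all(isinstance(t, str) for t in tags):
        raise ValueError("tags must be a list of strings")
    return Item(name=name, price=float(price), tags=tags)


print(parse_item('{"name": "widget", "price": 9, "tags": ["a"]}'))
```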

Also:

> There are cases where other formats are appropriate: small data transfers between cooperating services and scenarios where schema validation would be overkill.

That's actually most of the cases for your average web dev!