by technicolorwhat 1797 days ago
I work with XML quite a lot, connecting legacy but also 'modern' systems. There are some good parts and some absolutely horrible parts, so let me rant for a little.

I classify some of them as sane XML ideas and others as complete, utter garbage.

Good parts

- Schemas for documents.

- If you have a single schema file and a single XML file to validate, it's KINDA nice. If you limit yourself to a basic set of XSD types, it's okay-ish.

- The general idea of having XSDs express document structure, and having a standard for that, is kinda nice.

Bad parts

- People go absolutely bonkers with schemas: they nest schemas in schemas and do complicated stuff that can hardly be expressed in modern type systems and is even harder to generate code for. Things like PEPPOL or UBL are horribly overcomplicated solutions, and you need an army of devs with the patience to implement them properly.

As a result, everybody does the bare minimum and ends up with their own flavour, and you're still writing per-supplier code since everyone just tries to make it work. Our code base has a ton of per-supplier 'fixers' to convert their iffy files to something sane. We started by asking them to fix their files, but in the end it was just waaaaaay faster to deal with it ourselves, because truly understanding XSDs is hard.

- Part of the standards community feels more like schema designers than actual implementers. It feels like whenever a committee designs a standard they're like "WE'RE DONE!" without actually implementing it themselves across languages and seeing how hard it actually is to generate something valid. These organisations are headache-inducing money sinks whose bad schema ideas trickle down.

- It can basically do too much. What you want is a tiny expression of your types.

- JSON interoperability: You can't just convert between JSON and XML without schema knowledge. In XML, `<user><name></name></user>` can be either an item in an array [] or a hash. You need to know the schema to deserialise or generate properly; you can't cheat.

- It's super verbose and hard to read. When you're staring at 200 lines of XML, your brain has to parse all the verbose namespaces and whatnot in order to pick out the data.
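
To make the JSON point above concrete, here is a minimal sketch of a schema-less XML-to-JSON conversion using Python's standard `xml.etree.ElementTree`. The helper `naive_to_json` and the `<users>` wrapper element are made up for illustration; the only point is that the same `<user>` element comes out as a hash or as an array item depending on how many siblings happen to be present.

```python
import xml.etree.ElementTree as ET


def naive_to_json(elem):
    """Hypothetical schema-less conversion: guess the shape from the data alone."""
    children = list(elem)
    if not children:
        # Leaf element: just return its text (empty string if there is none).
        return elem.text or ""
    out = {}
    for child in children:
        value = naive_to_json(child)
        if child.tag in out:
            # Second occurrence of a tag: only now do we "discover" it is a list.
            if not isinstance(out[child.tag], list):
                out[child.tag] = [out[child.tag]]
            out[child.tag].append(value)
        else:
            out[child.tag] = value
    return out


one = ET.fromstring("<users><user><name>a</name></user></users>")
two = ET.fromstring(
    "<users><user><name>a</name></user><user><name>b</name></user></users>"
)

one_json = naive_to_json(one)  # "user" ends up as a single hash
two_json = naive_to_json(two)  # "user" ends up as an array of hashes
```

A consumer expecting `user` to always be an array breaks on the first document, which is exactly the kind of thing only the schema can settle.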

I also deal with XML files over 1 GB, and at that size the tooling gets thin. You resort to SAX parsing and emitting nodes. It ain't fun. In the end I had to write a lot of tools myself to make working with many documents manageable.
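
For a rough idea of what streaming a large file looks like, here is a sketch using `xml.etree.ElementTree.iterparse` from Python's standard library. It is a pull-style streaming API rather than SAX proper, and the `stream_records` helper and the `<item>` record tag are assumptions for illustration, not the tools described above.

```python
import io
import xml.etree.ElementTree as ET


def stream_records(source, tag):
    """Yield one dict per <tag> element without keeping its subtree around."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            # Flatten the direct children into a dict before freeing them.
            record = {child.tag: child.text for child in elem}
            yield record
            # Drop the element's children and text. The emptied element itself
            # stays attached to its parent, so this limits memory growth but
            # does not eliminate it entirely.
            elem.clear()


# Tiny in-memory stand-in for a multi-gigabyte file.
sample = io.BytesIO(
    b"<items><item><id>1</id></item><item><id>2</id></item></items>"
)
records = list(stream_records(sample, "item"))
```

In real use `source` would be a file path or an open binary file, and the records would be processed one at a time instead of being collected into a list.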

Anyway thanks for reading this far, I really needed to vent hahaha

2 comments

> - It's super verbose and hard to read. When you're staring at 200 lines of XML, your brain has to parse all the verbose namespaces and whatnot in order to pick out the data.

Namespaces are high on my list of decent ideas with botched implementations killing the concept. What made working with them so painful was that the tools pushed all of their work off to the user, and did so inconsistently.

It's not just that you have to know they exist. You also have to know that, for example, you're going to need to pass namespace declarations to many APIs, because the implementers chose to require those as parameters rather than reading the declarations out of the document being parsed. And depending on which tools you have the misfortune of using, you might have a source like “myns:foo” but the parser will require you to use "{http://example.org/myns}foo" for queries. Similarly, the default could have been inherited, so that you could simply write '<myns:foo bar="baaz"><quux /></myns:foo>' and assume that everything unqualified is in “myns”.

Imagine how much better this would have been if, a couple of decades ago, anyone had cared about the developer experience enough to fix the common tools. Non-broken namespacing, good error messages, easy-to-use validators and formatters, etc. would have removed so much of the constant friction which led everyone to run for the door as soon as JSON started to get momentum.
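
As one illustration of that friction, here is how it plays out in Python's standard `xml.etree.ElementTree`, using the example document from above. The `<wrap>` element is made up so there is something to query from; other parsers differ in the details.

```python
import xml.etree.ElementTree as ET

NS_URI = "http://example.org/myns"
doc = '<myns:foo xmlns:myns="http://example.org/myns" bar="baaz"><quux /></myns:foo>'

root = ET.fromstring(doc)

# The source says "myns:foo", but the parsed tag is in expanded form.
root_tag = root.tag

# The unprefixed child does NOT inherit the parent's prefix namespace.
child_tag = root[0].tag

# Hypothetical wrapper element so that <myns:foo> can be searched for.
wrapper = ET.fromstring(f"<wrap>{doc}</wrap>")

# Option 1: the caller re-supplies the prefix mapping that was in the document.
by_prefix_map = wrapper.find("myns:foo", {"myns": NS_URI})

# Option 2: spell out the namespace URI in the query itself.
by_clark = wrapper.find("{http://example.org/myns}foo")
```

Here `root_tag` is `{http://example.org/myns}foo` and `child_tag` is plain `quux`. Both lookups return the same element, but only because the caller restated a mapping the document already declared.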

> - Part of the standards community feels more like schema designers than actual implementers. It feels like whenever a committee designs a standard they're like "WE'RE DONE!" without actually implementing it themselves across languages and seeing how hard it actually is to generate something valid. These organisations are headache-inducing money sinks whose bad schema ideas trickle down.

Imagine if 10% of the money which went into barely-used standards development had gone into maintaining libxml2 (and maybe xerces). Basically any bit of work which went into XPath or related specs after 1999 was a write-off because those later versions effectively never shipped for anyone outside of exclusive .NET users.

In my experience, schemas get crazy because people convert their internal OO data structures directly into XML schemas instead of designing a proper API.