by jimfuller 4940 days ago
XML, being in the markup family tree, has a lot more history than simple JSON encoding ... measuring its usefulness on a corner case has always been, well ... boring. I am glad people are using JSON to sling simple data across the web instead of markup.

Come back to me when you are using json to encode an entire document ... you might look at XML a bit differently.

tl;dr use the right tool for the right job.


Yes,

But I would claim that by being designed for many jobs in a sloppy way, XML became a terrible tool for all jobs.

JSON is a good replacement for XML in some of the applications for which some folks foolishly targeted XML (I worked on a server back in the day that really did process five times the data 'cause of our use of XML for interchange - as was the new standard at the time, remember "XML everywhere!"?).

HTML is a good tool for web documents (who would have thought?) but you're right, XML fills in for a lot of other document uses - XML is so far the best generalization that splits the difference between a word document and an HTML file. But I suspect the world would celebrate if someone could put forward a better such generalization, because even in the realms where XML is the best tool available, it is a bad tool.

Perhaps if more people admitted the awful attribute/value ambiguity problem that the article very intelligently calls out, the use of XML would be less painful. If we called it "Inconsistently Structured Data Intermingling Format" (ISDIF), the young developers would have some idea what they were getting into.
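
For anyone who hasn't run into the attribute/value ambiguity, here's a minimal Python sketch of it. The `person` record and the `read_field` helper are made up for illustration; they aren't taken from the article.

```python
import json
import xml.etree.ElementTree as ET

# Two hypothetical XML encodings of the same record:
# one stores the fields as attributes, the other as child elements.
AS_ATTRIBUTES = '<person name="Ada" born="1815"/>'
AS_ELEMENTS = '<person><name>Ada</name><born>1815</born></person>'


def read_field(xml_text, field):
    """Look for a field as an attribute first, then as a child element."""
    root = ET.fromstring(xml_text)
    if field in root.attrib:
        return root.attrib[field]
    child = root.find(field)
    return child.text if child is not None else None


# A consumer that wants to accept both documents has to check both places.
name_from_attrs = read_field(AS_ATTRIBUTES, "name")
name_from_elems = read_field(AS_ELEMENTS, "name")

# The JSON version of the record has only one obvious shape.
as_json = json.loads('{"name": "Ada", "born": 1815}')
```

Both XML documents are well-formed and arguably "correct", which is the problem: without a schema, nothing tells the reader which shape to expect.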

'XML became terrible for all jobs' ... you do realize that there are literally hundreds of billions of XML-encoded documents out there, happily doing what they are supposed to do ... not trolling; but let's put some of the comments in context for what they are.

XML in its original role of extensible markup is thriving and completely successful ... as I previously mentioned, I am glad we are not slinging around angle brackets and happily use JSON instead.

I agree that XML was hijacked during its hype cycle to do a lot of jobs it should have never been intended to do ... AJAX (see the X) was a side effect of this, and we moved on to AJAJ ... evolution sometimes needs different routes to get out of local maxima.

Note also that there is a very long tail of XML vocabularies that you will never use or hear about that get extended and reflect their authors' intents, without any agreement required between you (or me) to get real work done.

HTML5 though is where I have the problem (in terms of XML Failure)... baked in controlled vocabularies ... hmmm, what happens when your tag (or attribute) du jour doesn't pass muster with the WHATWG ?

I console myself by saying that both XML and HTML5 are part of the same markup family, just a short term family dispute for the time being; never bet against markup (or data for that matter) as they tend to stick around a lot longer than the programming languages that generated them.

The lack of user-defined tags and attributes (ignoring data-* for a minute because it's different) is what makes HTML such a more pleasant document/markup language than XML.

XML schemas are hard because allowing people to define an ad-hoc ordered hierarchical parsing structure is hard, so most people don't do the schema part (or ignore the schema in the real world) resulting in ambiguity in the rules about what sort of constructs are allowed in the document or what they mean, resulting in XML formats that aren't really interchangeable.

HTML5 relieves this by having only one markup format that's actually specified with a real common understanding instead of a multitude of formal and informal markups. Evidence that the world needs more than one markup format is thin on the ground; what most people need is the ability to locally distinguish between and identify things, and class and id are complete and minimal for that job.

Not being able to extend or define my own tags and attributes feels like a reduction in freedom to me ... but I guess it depends on your perspective. If you feel comfortable with browser companies and a small handful of ppl defining a rigid vocabulary for the world to use, that is your option; me ... I don't feel comfortable with that situation.

XML schemas are hard ... the XML suite of technologies certainly did not get it right the first time around, but slowly those 'hard' technologies get easier to use (either because of tooling or, in XML Schema's case, because we now have v1.1 with things like assertions that make validation a lot more useful and easier). In the meantime, I am certain that we will eventually see every XML technology regenerated and replicated within the JSON stack of tech.

Lastly, I agree that having a sanctioned vocabulary called HTML5 is a 'good thing' ... however saying that the world only needs one markup format is plainly incorrect; it's akin to saying we only need a single spoken language. Sure, a lingua franca would make life (and processes) an order of magnitude simpler, but hoping for this situation to actually occur is a real 'pipe' dream.

IMO, if a 'fallacies of data' type edict were to be handed down, then 'planning for homogeneity (lingua franca)' would be in the top 7.

It's a shame AJAJ is harder to pronounce than AJAX. We need a new term CEOs can learn to say. Then when they start demanding we use this latest technology, we'll be ready.
Feel free to excise XML from any buzzword by claiming the X doesn't stand for XML, it's a free variable. "Our X is JSON"
'XML became terrible for all jobs' ... you do realize that there are literally hundreds of billions of XML-encoded documents out there, happily doing what they are supposed to do ... not trolling; but let's put some of the comments in context for what they are.

I sure do. I perhaps should have conceded that once you get up to the level of Large Enterprise Monstrosities (LEMs), you have something almost by definition "terrible" (or at least messy from the viewpoint of smaller, more coherently architected systems) and thus in that situation, I might not, maybe, have any basis for criticism.

But still, one thing I'd speculate is that that one attribute/value ambiguity problem just might be sooo bad that even in the realm of whatever-monstrosities-that-have-melded-together-messy-stuff, XML would do better replaced by a different whatever-monstrosity.

hehe, I like the denotation LEMs ...
"Come back to me when you are using json to encode an entire document"

How is that relevant to the article, which is about COS? In other words, what does COS lack that XML has?

* Infrastructure (schema support - DTD, Schema, RelaxNG; transformation - XSLT)
* No obvious document format (What encoding are the strings? How to escape characters?)
* Only used to describe predefined object types (booleans, strings, arrays, dictionaries)
* Hard to ensure the integrity of the data without interpreting the data from the interpreter itself (no external validation)
You know this can be added on top when there is demand for it? I prefer a non-bloated protocol format over XML anytime. How often does the DTD not matter at all? How often is the encoding fixed by convention? ...
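
A rough sketch of what "validation on top" could look like, in Python over decoded JSON. The `SCHEMA` convention and the `validate` helper are invented for this example and are nowhere near a real schema language; the point is only that the check lives outside the wire format.

```python
import json

# Hypothetical convention: a "schema" maps each required key
# to the Python type its value is expected to have.
SCHEMA = {"title": str, "pages": int, "tags": list}


def validate(document, schema):
    """Return a list of problems; an empty list means the document passed."""
    problems = []
    for key, expected in schema.items():
        if key not in document:
            problems.append(f"missing key: {key}")
        elif not isinstance(document[key], expected):
            actual = type(document[key]).__name__
            problems.append(f"{key}: expected {expected.__name__}, got {actual}")
    return problems


good = json.loads('{"title": "Spec", "pages": 12, "tags": ["draft"]}')
bad = json.loads('{"title": "Spec", "pages": "twelve"}')

good_problems = validate(good, SCHEMA)  # nothing to report
bad_problems = validate(bad, SCHEMA)    # wrong type for pages, tags missing
```

Whether a bolt-on check like this is enough, or whether you end up reinventing RelaxNG, is exactly what the rest of this thread argues about.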
I use schemas (in the form of RelaxNG most of the time) almost every time I deal with XML. Together with Schematron you can make very complex lint-like scripts to verify your data. Actually, I program in XML with a self-created programming language (formulated in XML). This, together with RelaxNG and a good XML editor, makes writing XML fun and absolutely free of (syntax) errors.
I could just as well ask why the author even mentions XML?