by clarkevans 5053 days ago
I think this article does a great job of enumerating trends that show JSON is beating XML for data serialization. These points are evidence of a shift in thinking, but not the reason for the shift itself.

Why JSON over XML? Because people need a data serialization format, and XML is a markup language. JSON is gaining widespread adoption for data serialization because it's the correct tool. XML isn't.

In a markup language there is an underlying text that you're annotating with machine-readable tags. Most data interchange doesn't have an underlying text -- you can't strip away the tags and expect what remains to be understandable. If you're writing a web page that has to be read by humans and interpreted by machines, you need a markup language.
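To make the "strip away the tags" test concrete, here is a minimal sketch using only Python's standard library. The helper `strip_tags` and both sample documents are made up for illustration; they are not from the article.

```python
import xml.etree.ElementTree as ET


def strip_tags(xml_source: str) -> str:
    """Return only the character data of an XML document, whitespace collapsed."""
    root = ET.fromstring(xml_source)
    return " ".join("".join(root.itertext()).split())


# Markup: tags annotate a text that already reads fine on its own.
markup = "<p>JSON is <em>gaining</em> adoption for <a href='#s'>serialization</a>.</p>"

# Data: the tags ARE the structure; there is no underlying text.
record = "<order><id>1042</id><qty>3</qty><paid>true</paid></order>"

print(strip_tags(markup))  # JSON is gaining adoption for serialization.
print(strip_tags(record))  # 10423true
```

The first result is still a sentence. The second is a run of characters whose field boundaries are gone, which is the sense in which serialized data has no underlying text.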

By contrast, data interchange is about moving arbitrary data structures between processes and/or languages. JSON's information model fits this task perfectly: its nested map/list/scalar structure is simple and powerful. As for typing, it found a sweet spot with text/numeric/boolean.
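As a small sketch of that map/list/scalar model, assuming Python's standard `json` module and an invented `order` structure, a nested value survives a round trip through the wire format with its shape and scalar types intact:

```python
import json

# An arbitrary in-memory structure: maps, lists, and text/numeric/boolean scalars.
order = {
    "id": 1042,
    "customer": "Ada",
    "paid": True,
    "total": 19.5,
    "items": [
        {"sku": "A-1", "qty": 3},
        {"sku": "B-7", "qty": 1},
    ],
}

wire = json.dumps(order)     # serialize to text for another process or language
restored = json.loads(wire)  # parse it back into native maps, lists, and scalars

assert restored == order
print(type(restored["items"]).__name__, type(restored["paid"]).__name__)  # list bool
```

No schema or mapping layer is involved: the parsed result is already the native dict/list/scalar structure the receiving program would work with.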

JSON is the right tool for the data serialization problem.

3 comments

This makes sense, but from what I can tell, in virtually no major XML-based systems is the basis for XML files an underlying text extended with markup. Most XML systems, since the dawn of XML, have been top-to-bottom structured data.
You're correct that almost no XML formats, with the exception of XHTML, have an underlying text. This is the core problem, since the underlying information model inherited from SGML presumes one. XML brings with it significant overhead for dealing with textual data, and most data isn't textual. Why is there an element vs tag debate? Wrong information model -- it's a distinction without a difference when you're doing data serialization. This is why XML was the shoe that never quite fit and why JSON will displace it so easily.

In 98/99, the XML bandwagon was something no one wanted to miss; it was the Web 2.0 of its day, and everyone knew it was the future. It was Java/WORA ("Write Once Run Anywhere") for data interchange, and it promised that you wouldn't be locked into a proprietary application. The marketing hype was simply outstanding. Even for technical people who hated XML itself, the promise of open formats was something you couldn't ignore and had to support, even if you had to hold your nose. Open formats have since won -- holding your nose isn't needed anymore.

Now that the marketing hype of XML doesn't shut down the technical debate... JSON will soon dominate for data serialization tasks.

In the financial, mortgage, and insurance sectors, there are several major XML-based standards that are not top-down.
I may be talking out of my ass here, but isn't OO-XML and whatever Microsoft calls their new Office format exactly that?
You're right, I forgot about the "proprietary" document formats that got transformed into "open" (heh) XML formats. Good point. Thanks!
Both OO-XML and ODF are XML-based. They both put the main XML document in a ZIP archive along with metadata and other resources.
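A rough sketch of that packaging idea, using Python's standard `zipfile` module. The archive is built in memory so the example is self-contained. The member names mimic an ODF-style layout, but the XML inside is simplified placeholder content, not the real ODF schema, and `build_package` and `read_main_text` are hypothetical helpers.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def build_package() -> bytes:
    """Build a toy document package: main XML plus metadata in one ZIP."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        zf.writestr("content.xml", "<doc><p>Hello, <b>world</b>.</p></doc>")
        zf.writestr("meta.xml", "<meta><title>Demo</title></meta>")
    return buf.getvalue()


def read_main_text(package: bytes, member: str = "content.xml") -> str:
    """Open the archive, parse the main XML member, and return its text."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        root = ET.fromstring(zf.read(member))
    return "".join(root.itertext())


package = build_package()
with zipfile.ZipFile(io.BytesIO(package)) as zf:
    names = zf.namelist()

print(names)                    # ['mimetype', 'content.xml', 'meta.xml']
print(read_main_text(package))  # Hello, world.
```

Note that the main document here really is text with markup on top, which is why these office formats count as the exception being discussed.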
With the almost unique exception of XHTML. But that's actually the only one I can think of.
ON vs ML, nice

But there's also the stack. XML has XSD for validation and documentation; XSLT and XQuery for transformation; and most people seem to like XPath. The overwhelming response to analogues for JSON is horror - don't pollute our simplicity! - and acknowledgement that while some tasks do indeed need these features, the XML stack already has them. The corruption of XML is what keeps JSON clean.
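For a feel of what the stack buys you, here is a small sketch comparing a query written in the limited XPath subset that Python's `xml.etree.ElementTree` supports with the hand-written traversal you would typically do on parsed JSON. The sample order data is invented for illustration.

```python
import json
import xml.etree.ElementTree as ET

xml_doc = ET.fromstring(
    "<order>"
    "<item sku='A-1'><qty>3</qty></item>"
    "<item sku='B-7'><qty>1</qty></item>"
    "</order>"
)
# XML: the query is a declarative path expression with an attribute predicate.
xml_qty = [int(q.text) for q in xml_doc.findall("./item[@sku='A-1']/qty")]

json_doc = json.loads(
    '{"items": [{"sku": "A-1", "qty": 3}, {"sku": "B-7", "qty": 1}]}'
)
# JSON: no query language in the core toolset, just ordinary code over maps and lists.
json_qty = [item["qty"] for item in json_doc["items"] if item["sku"] == "A-1"]

print(xml_qty, json_qty)  # [3] [3]
```

For a lookup this small, plain code is arguably just as clear, which may be part of why the JSON side feels little pressure to grow its own stack.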

It also sounds like the right tool for making something like SVG. The vast majority of SVG data isn't text.
In SGML land, you wouldn't have implemented SVG using tags; you would have created a NOTATION with syntax specific to the problem at hand. In XML land, everything is XML: for example, schemas are XML (in SGML they are DTDs, which are _not_ SGML), and transforms are XML (in SGML they are DSSSL, a Lisp variant). The XML approach is one-size-fits-all, not use-the-best-tool-and-syntax-for-the-job.