Hacker News
by tptacek 5053 days ago
This makes sense, but from what I can tell, in virtually no major XML-based system are the XML files built on an underlying text extended with markup. Most XML systems, since the dawn of XML, have been top-to-bottom structured data.
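To make the distinction concrete, here is a small sketch contrasting the two kinds of XML, using Python's standard `xml.etree.ElementTree`. Both sample strings are hypothetical. In document-oriented XML (an underlying text extended with markup), stripping the tags leaves readable prose; in data-oriented XML, the same operation produces noise, and the useful view is a record.

```python
import xml.etree.ElementTree as ET

# Document-oriented XML: an underlying text extended with markup.
# The tags annotate spans of a sentence (hypothetical sample).
DOC = '<p>XML was <em>designed</em> for <a href="x">documents</a>.</p>'

# Data-oriented XML: top-to-bottom structured data, no prose
# (hypothetical sample record).
DATA = '<loan><id>42</id><amount>250000</amount><rate>4.5</rate></loan>'


def underlying_text(xml_string: str) -> str:
    """Strip the markup and return the concatenated character data."""
    root = ET.fromstring(xml_string)
    return "".join(root.itertext())


def as_record(xml_string: str) -> dict:
    """Read a flat data-oriented element into a dict of child text values."""
    root = ET.fromstring(xml_string)
    return {child.tag: child.text for child in root}


print(underlying_text(DOC))   # XML was designed for documents.
print(underlying_text(DATA))  # 422500004.5  (meaningless as text)
print(as_record(DATA))        # {'id': '42', 'amount': '250000', 'rate': '4.5'}
```

Note that recovering the sentence relies on ElementTree's `text`/`tail` handling of mixed content, machinery that data-oriented formats never use.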

You're correct that almost no XML formats, with the exception of XHTML, have an underlying text. This is the core problem, since the information model XML inherited from SGML presumes one. XML brings with it significant overhead for dealing with textual data, and most data isn't textual. Why is there an element vs. attribute debate? Wrong information model -- it's a difference without a distinction when you're doing data serialization. This is why XML was the shoe that never quite fit and why JSON will displace it so easily.
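The "difference without a distinction" point can be sketched with the familiar choice between putting a field in an attribute or in a child element. This minimal example, assuming a hypothetical flat `person` record, shows that a data-oriented reader ends up collapsing both encodings into the same dict, while JSON offers one obvious encoding from the start.

```python
import json
import xml.etree.ElementTree as ET

# The same hypothetical record, serialized two ways that XML permits.
AS_ATTRIBUTES = '<person name="Ada" born="1815"/>'
AS_ELEMENTS = '<person><name>Ada</name><born>1815</born></person>'


def to_dict(xml_string: str) -> dict:
    """Merge attributes and simple child elements into one flat dict.

    Treating the two encodings as equivalent is the point: for data
    serialization the attribute/element choice carries no information.
    """
    root = ET.fromstring(xml_string)
    record = dict(root.attrib)
    for child in root:
        record[child.tag] = child.text
    return record


print(to_dict(AS_ATTRIBUTES) == to_dict(AS_ELEMENTS))  # True

# JSON has one natural encoding, and keeps the number as a number,
# whereas XML character data and attribute values are always strings.
print(json.dumps({"name": "Ada", "born": 1815}))  # {"name": "Ada", "born": 1815}
```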

In 1998/99, the XML bandwagon was something no one wanted to miss: it was the Web 2.0 of its day, and everyone knew it was the future. It was Java's WORA ("Write Once, Run Anywhere") for data interchange, and it promised that you wouldn't be locked into a proprietary application. The marketing hype was simply outstanding. Even for technical people who hated XML itself, the promise of open formats was something you couldn't ignore and had to support, even if you had to hold your nose. Open formats have since won -- holding your nose isn't needed anymore.

Now that the marketing hype of XML doesn't shut down the technical debate... JSON will soon dominate for data serialization tasks.

In the financial, mortgage, and insurance sectors, there are several major XML-based standards that are not top-down structured data.
I may be talking out of my ass here, but isn't OO-XML and whatever Microsoft calls their new Office format exactly that?
You're right, I forgot about the "proprietary" document formats that got transformed into "open" (heh) XML formats. Good point. Thanks!
Both OO-XML and ODF are XML-based. They both put the main XML document in a ZIP archive along with metadata and other resources.
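A rough sketch of that container layout, using Python's standard `zipfile` module. The package built here is a simplified, hypothetical stand-in loosely modeled on an ODF text document: real packages also carry `META-INF/manifest.xml` and `styles.xml`, and their XML uses the ODF namespaces, all omitted here.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def build_demo_package() -> bytes:
    """Build a tiny in-memory archive laid out loosely like an ODF file.

    Simplified and hypothetical: no manifest, no styles, no namespaces.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # ODF stores the media type as the first entry, uncompressed.
        zf.writestr("mimetype", "application/vnd.oasis.opendocument.text",
                    compress_type=zipfile.ZIP_STORED)
        # The main XML document, plus metadata as a separate XML part.
        zf.writestr("content.xml", "<document><p>Hello, ODF</p></document>",
                    compress_type=zipfile.ZIP_DEFLATED)
        zf.writestr("meta.xml", "<meta><title>Demo</title></meta>",
                    compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


def list_parts(package: bytes) -> list:
    """Return the member names stored in the ZIP container, in order."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        return zf.namelist()


def read_main_document(package: bytes, member: str = "content.xml") -> ET.Element:
    """Parse one XML part out of the container (content.xml for ODF)."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        return ET.fromstring(zf.read(member))


PACKAGE = build_demo_package()
print(list_parts(PACKAGE))                         # ['mimetype', 'content.xml', 'meta.xml']
print(read_main_document(PACKAGE).find("p").text)  # Hello, ODF
```

For a `.docx` file the main part is conventionally `word/document.xml`, so the same reader works by passing that as `member`.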
With the almost unique exception of XHTML. But that's actually the only one I can think of.