by flohofwoe 339 days ago
I don't even think it's intentional, they had to come up with a file format which supports all the weird historical artefacts in the various Office tools. They didn't have the luxury to first come up with a clean file format and then write the tools around it.

And I bet they didn't switch to XML because it was superior to their old file formats, but simply because of the unbelievable XML hype that existed for a short time in the late 1990s and early 2000s.

5 comments

An XML format, even one with a lot of cruft to handle legacy complexity, is absolutely easier to parse/interop with than a legacy binary format that was to a large degree a serialization of undocumented in-memory content.

OOXML was, if anything, an attempt to get ahead of requirements to have a documented interoperable format. I believe it was a consequence of legal settlements with the US or EU but am too tired at the moment to look up sources proving that.

> is absolutely easier to parse/interop with than a legacy binary format

depends

you can have clean, fully documented binary formats which are relatively easy to parse (e.g. msgpack, cbor, bson)

you might still not know what the parsed things mean, but that also applies to text formats (including random documented binary blob fields, thanks to base64 they also fit into any text format)
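
To make both points concrete, here is a minimal Python sketch using only the standard library. The record layout (a 2-byte version, a 4-byte length, then the payload) and the function names are made up for illustration; this is not msgpack, cbor, bson or any real Office format. It shows that a documented binary layout parses in a few lines, and that an opaque blob fits into a text format via base64 without the reader learning what it means.

```python
import base64
import json
import struct
from typing import Tuple

# Hypothetical record layout (an assumption for illustration only):
#   uint16, big-endian: format version
#   uint32, big-endian: payload length in bytes
#   payload bytes
HEADER = struct.Struct(">HI")


def encode_record(version: int, payload: bytes) -> bytes:
    """Serialize one record as header + payload."""
    return HEADER.pack(version, len(payload)) + payload


def decode_record(data: bytes) -> Tuple[int, bytes]:
    """Parse one record; raises ValueError if the payload is truncated."""
    version, length = HEADER.unpack_from(data, 0)
    payload = data[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated record")
    return version, payload


def wrap_in_text(blob: bytes) -> str:
    """Embed an opaque binary blob in a text format (JSON here) via base64."""
    return json.dumps({"blob": base64.b64encode(blob).decode("ascii")})


def unwrap_from_text(text: str) -> bytes:
    """Recover the blob; the text wrapper says nothing about its meaning."""
    return base64.b64decode(json.loads(text)["blob"])
```

Calling `decode_record(encode_record(2, b"abc"))` gives back the version and payload, and `wrap_in_text` happily carries the same bytes inside JSON. Either way, parsing is the easy part; knowing what the payload means is not.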

Exactly, there is no need for nefarious intentions when time constraints and mild incompetence suffice.

The OOXML format is likely a not very deeply thought out XML serialization of the in memory structure or of the old binary format, done under time pressure (there was legal pressure on Microsoft at the time).

> The OOXML format is likely a not very deeply thought out XML serialization of the in memory structure or of the old binary format

it somewhat looks like that, but that old binary format changed with every nth yearly major new version and IMHO it looks like not being far away from a slightly serialized dump of their internal app data structures ;)

but

putting aside that they initially managed to incorrectly implement their own standard OOXML and the mess that "accident" caused

they also supported import and even export (with limited features) of the Open Document format before even fully supporting OOXML, and even used it as the default save option.... (when editing such a document)

like there really was no technical reason why they couldn't just have adopted the Open Document format, maybe at worst with some "custom" extensions to it (open and "standardized" by MS itself)

MS at the time had every incentive to comply in as bad faith as they could get away with

and what we saw at that time was looking like exactly that

sure hidden behind "accidents" and incompetence

but let's be honest: if a company has every interest and incentive to do something in bad faith and make it go absurdly badly, and then exactly that happens, it's very naive to assume it was an accident; most likely it wasn't

that doesn't mean any programmer sat down and intentionally thought about how to make it extra complicated; there is no need for that, and it would just be a liability. Instead you make bad management decisions: starve the responsible team of (human) resources (especially keep your best seniors away), give them messed-up deadlines, give them messed-up requirements you know can't work out, mess up communication channels, give them only bad tooling for the task, etc. The funniest thing is that, given how messy software production often is, the engineers involved might not even notice ;), which means no liability on that side.

and they should have gotten the corporate death penalty for it. I think it should still be done. the sheer amount of crap microsoft has purposely bestowed upon the world should lead to life in prison for many of its decision makers

This format of XML in a zip with a docx extension came into existence in Office 2007.
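
For anyone who hasn't poked at one: the "XML in a zip" part can be sketched with nothing but the Python standard library. The snippet below builds a toy package in memory and reads the text back out. It is only an illustration under some assumptions: the toy package is NOT a valid .docx (a real one also needs `[Content_Types].xml` and relationship parts), `word/document.xml` is just the part name Word conventionally uses for the main document, and `make_toy_package` / `extract_text` are names invented here.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import List

# WordprocessingML main namespace used by document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def make_toy_package(text: str) -> bytes:
    """Build a zip holding a single word/document.xml part (not a valid .docx)."""
    root = ET.Element(f"{{{W_NS}}}document")
    body = ET.SubElement(root, f"{{{W_NS}}}body")
    para = ET.SubElement(body, f"{{{W_NS}}}p")   # paragraph
    run = ET.SubElement(para, f"{{{W_NS}}}r")    # run
    t = ET.SubElement(run, f"{{{W_NS}}}t")       # text
    t.text = text

    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("word/document.xml", ET.tostring(root, encoding="utf-8"))
    return buf.getvalue()


def extract_text(package: bytes) -> List[str]:
    """Open the zip, parse the main part, and collect every w:t text node."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    return [t.text or "" for t in root.iter(f"{{{W_NS}}}t")]
```

The same `extract_text` approach should work on the bytes of a real .docx as long as its main part lives at `word/document.xml`, though real documents split text across many runs.
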
> They didn't have the luxury to first come up with a clean file format and then write the tools around it.

This is just not right.

They were not required to (AFAIK), and in some edge cases they also didn't provide a perfect conversion of all old documents to the open format. Actually, even just converting between different versions of their proprietary formats had a tendency to break things sometimes! (back then)

> unbelievable XML hype that existed for a short time in the late 1990s and early 2000s.

(EDIT: actually 2006, so uh, maybe XML hype) we're talking about ~2010, the hype was pretty dead again by that time, and the main reason they chose it was to position it as "competition" to the emerging standardized open office document formats, which all used XML as a markup language (except they don't really use XML as a markup language but more like a serialization format, similar to JSON but way more complex; that doesn't matter, though, since they mostly needed to convince not particularly tech-savvy people that they were "no longer trying to hamper competition", to preclude legislative action and to keep governments from switching to other office suites out of worry about the closed format).

so they were more than able to

- do a clean design; if a lot of old "proprietary" documents break subtly when converting anyway, it doesn't matter (and they did break)

- just adopt OpenDocument format

Sorry but XML is a good fit for this. Most people who've never used XML cannot ever fathom that it does actually do a number of things well.

Being able to layer markup with text before, inside elements, and after is especially important --- as anyone with HTML knowledge should know. Being able to namespace things so, you know, that OLE widget you pulled into your documents continues to work? Even more important. And that third-party compiled plugin your company uses for some obscure thing? Guess what. Its metadata gets correctly embedded and saved also, and in a way that is forward and backwards compatible with tooling that does not have said plugin installed.
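
A small Python sketch of those two properties, using only `xml.etree.ElementTree`. The `urn:example:plugin` namespace, the `x:widget` element and the function names are hypothetical stand-ins for "some plugin's data"; nothing here is actual OOXML. The reader understands only un-namespaced markup, skips the foreign element when rendering text, and still writes it back out untouched.

```python
import xml.etree.ElementTree as ET

PLUGIN_NS = "urn:example:plugin"  # hypothetical third-party namespace

# Mixed content: text before, inside, and after an element,
# plus a namespaced element this "tool" knows nothing about.
SOURCE = (
    '<p xmlns:x="urn:example:plugin">'
    'Text before <b>inside</b> and after'
    '<x:widget id="42">opaque plugin state</x:widget>'
    '</p>'
)

# Keep the "x" prefix on output instead of an auto-generated one.
ET.register_namespace("x", PLUGIN_NS)


def visible_text(elem: ET.Element) -> str:
    """Join text from un-namespaced elements, skipping foreign-namespace ones."""
    parts = [elem.text or ""]
    for child in elem:
        if not child.tag.startswith("{"):  # no namespace: markup we understand
            parts.append(visible_text(child))
        parts.append(child.tail or "")     # text that follows the child
    return "".join(parts)


def roundtrip(xml_text: str) -> str:
    """Parse and re-serialize without interpreting the plugin element."""
    return ET.tostring(ET.fromstring(xml_text), encoding="unicode")
```

`visible_text(ET.fromstring(SOURCE))` yields `Text before inside and after`, while the output of `roundtrip(SOURCE)` still contains the widget element with its `id` attribute and content, even though the code never looked inside it.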

So no, it wasn't 'hype'.

There are good use cases for XML.

There was also huge hype. XML databases, anyone? XML is now an also-ran next to json, yaml, markdown. At the time, it was XML all the things!