Hacker News
by cwyers 3246 days ago
> which are intentionally bloated with useless XML contents to make interoperability almost impossible.

That's just a conspiracy theory. They're "bloated" because Microsoft Office is optimizing for interoperability with its largest competitor: older versions of Microsoft Office.

Maybe Microsoft Office having "cleaner" XML would improve interoperability. But as long as Office is the standard, the ability to consume messy XML is worth more than the ability to emit clean XML.


> That's just a conspiracy theory.

The "conspiracy" is exceedingly well documented, as others have already noted.

The mother lode is in these pages of contemporary documents at Groklaw:

http://www.groklaw.net/staticpages/index.php?page=2005121615...

http://www.groklaw.net/staticpages/index.php?page=2008071923...

A good starting document is "Can Other Vendors Implement Microsoft's Office Open XML?" http://web.archive.org/web/20070912014933/http://www.hollowa...

> A good starting document is "Can Other Vendors Implement Microsoft's Office Open XML?" http://web.archive.org/web/20070912014933/http://www.hollowa....

Then let's start there! Let's start with the first section about Word Processing, in fact:

> 1.1. Historical Compatibility
>
> OOXML contains compatibility markers to describe older legacy documents, their quirks and processing models. These compatibility features mark behaviours that software must implement to correctly display and process the majority of documents in existence.
>
> The "Compability Settings" WordProcessingML section within OOXML does not provide for repeatable practices. While it provides Microsoft the ability to store information related to various behaviors in their legacy file formats, the specification merely lists the names of these settings without proper definitions. An OOXML-consuming application, presented with a document using these attributes, will be unable to interpret them properly and render the page in a high-fidelity manner. Further, since these attributes are merely listed but not defined, the ability to practice the benefit of being “fully compatible with the large existing investments in Microsoft Office documents” (the goal of OOXML according to its authors) is consequently reserved for Microsoft alone.
>
> These behaviours such as “autoSpaceLikeWord95”, “useWord97LineBreakRules” and “useWord2002TableStyleRules” are not defined. As OOXML repeatedly states, "[t]o faithfully replicate this behavior, applications must imitate the behavior of that application, which involves many possible behaviors and cannot be faithfully placed into narrative for this Office Open XML Standard."
>
> These processing hints in the proposed standard depend on undisclosed information, and therefore other vendors cannot correctly process historical documents using OOXML. This lack of specification has significant implications for the New Zealand public sector organisations operating under the Public Records Act who are seeking to preserve documents of their records in a readable electronic form.
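To make the quoted complaint concrete: a consuming application can find these flags easily enough, but the name is all it gets. Below is a rough Python sketch, not anything from the spec or from a real product. It assumes the flags sit as child elements of `<w:compat>` in the settings part, and that this part is stored at `word/settings.xml` (where Word puts it) rather than being located through the package relationships. `legacy_compat_flags` is a hypothetical helper name, and the sample package is hand-made for illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Main WordprocessingML namespace (the w: prefix).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def legacy_compat_flags(docx_bytes: bytes) -> list[str]:
    """List the element names found under <w:compat> in a .docx package.

    Assumption: the settings part lives at word/settings.xml; a stricter
    reader would resolve it through the package relationships instead.
    """
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        if "word/settings.xml" not in zf.namelist():
            return []
        settings = zf.read("word/settings.xml")
    compat = ET.fromstring(settings).find(f"{{{W_NS}}}compat")
    if compat is None:
        return []
    names = []
    for child in compat:
        local_name = child.tag.split("}", 1)[-1]
        # Skip newer <w:compatSetting> name/value entries; keep the bare legacy flags.
        if local_name != "compatSetting":
            names.append(local_name)
    return names


# Hypothetical minimal package containing only a settings part.
SAMPLE_SETTINGS = (
    f'<w:settings xmlns:w="{W_NS}"><w:compat>'
    "<w:autoSpaceLikeWord95/><w:useWord97LineBreakRules/>"
    "<w:useWord2002TableStyleRules/>"
    "</w:compat></w:settings>"
)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("word/settings.xml", SAMPLE_SETTINGS)

print(legacy_compat_flags(buf.getvalue()))
# ['autoSpaceLikeWord95', 'useWord97LineBreakRules', 'useWord2002TableStyleRules']
```

That list of names is where a third-party implementation stops. What `useWord97LineBreakRules` should actually do to line breaking is exactly the part the quoted review says the spec leaves undefined.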

I think that rather supports the claim that backwards compatibility is a problem for OOXML. I see no claim in there about any deliberate obfuscation.

> These behaviours such as “autoSpaceLikeWord95” , “useWord97LineBreakRules” and “useWord2002TableStyleRules” are not defined.

Amazing that these things can be part of a standard. It even uses the proprietary names (Word97, Word2002). It's just pure malice to propose that horrible software functionality as a standard. I guess the 6k pages really made it hard to review properly.

You are mistaken. The tricks they used to get OOXML standardized[1] leave no room for doubt that they have been intentionally making interoperability harder.

[1]: Wikipedia has an article about that: https://en.wikipedia.org/wiki/Standardization_of_Office_Open....

You've linked to Wikipedia, but there is no evidence there supporting your claim. In fact, the criticism section includes pro-ODF supporters claiming the exact opposite:

> The ODF Alliance UK Action Group has stated that [...] the Office Open XML file-format is heavily based on Microsoft's own Office applications and is thus not vendor-neutral

If you've ever seen the specs for the old binary Office formats (they can be obtained), then you'd know that they are very complex indeed, and that their OOXML siblings are pretty much direct encodings of the same data structures, with some adaptations for the limits of XML. There is no merit to the argument that Microsoft deliberately made the OOXML formats complex compared with the existing binary formats.

It's true that Microsoft pushed hard to get OOXML through the ISO, but the reason for that is clear: they wanted an open standard that was 100% compatible with existing Office documents. Something like ODF, which lacks many of Office's features, would not do. It also makes their developers' lives a lot easier if they can specify a standard that describes their software's current behaviour. This is exactly what Adobe did with the PDF ISO standard (1000+ pages), and nobody complains about that.

You are replying to a claim I did not make, that OOXML was not based on Microsoft Office XML or that the XML formats were made more complex than the binary formats.
Anecdotally, LibreOffice has better compatibility with older versions of MS Office than current versions of MS Office do, so I'm not sure you have any more evidence for "the reason" than the person you're replying to has for the "conspiracy theory."