Hacker News
by gaius 3165 days ago
Nope, newer MS Office products write files as a fully documented XML format.

Contrary to popular belief, MS never deliberately obfuscated Office files; they were simply a raw dump of the internal objects as they were in memory, so obviously they would change between versions. That made sense in the days when computers were a lot more resource constrained than they are now. Now, with power and memory to spare, files can be encoded and decoded on the fly.

3 comments

And that XML file is anything but structured in a plain and simple way. Understanding it and everything it does would be impossible if people didn’t load a given file in Excel and diff the result against the original. Same for OpenOffice.
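For anyone who wants to try the diffing approach: the newer Office files are ZIP packages of XML parts, so a rough sketch in Python could look like the following. The function names `read_parts` and `diff_packages` are my own, not from any Office tooling, and this assumes every `.xml` and `.rels` part is well-formed XML. It is only one way to do it.

```python
import difflib
import zipfile
from xml.dom import minidom


def read_parts(source):
    """Return {part name: pretty-printed XML text} for each XML part in the package."""
    parts = {}
    with zipfile.ZipFile(source) as package:
        for name in package.namelist():
            if name.endswith((".xml", ".rels")):
                raw = package.read(name)
                # Parts are often stored on a single line; pretty-print them
                # so the diff works line by line instead of on one huge line.
                parts[name] = minidom.parseString(raw).toprettyxml(indent="  ")
    return parts


def diff_packages(before, after):
    """Return {part name: unified diff lines} for every part that differs."""
    old, new = read_parts(before), read_parts(after)
    changes = {}
    for name in sorted(set(old) | set(new)):
        # A part missing on one side is treated as empty text.
        a = old.get(name, "").splitlines()
        b = new.get(name, "").splitlines()
        if a != b:
            changes[name] = list(
                difflib.unified_diff(a, b, name, name, lineterm="")
            )
    return changes
```

You would call it as `diff_packages("before.xlsx", "after.xlsx")` (placeholder file names), saving the file once before and once after changing a single thing in Excel, then reading which parts moved.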
I wonder how much that has to do with excessive amounts of backwards compatibility for existing .xls and .doc files. I imagine a fresh implementation with no respect paid to existing sheets would be much better behaved. Anything that stands out to you as overtly strange in the format worth drawing attention to?
Office is a super giant set of products with a million functions, years of versions each with their own set of bugs, and decades of files of every kind and subset of features produced by users. I personally have almost never seen another word processing app, for example, 100% successfully load a non-trivial document from a competitor. It’s just really hard, especially if the features don’t map 1:1, let alone fonts, model of layout, etc. Just the sheer number of rendering quirks, subtle differences in math formula implementations, etc. would be mind-boggling. It’s hard to imagine a more insane (and boring) job.

Kudos to Google for getting such impressive compatibility. Must have been insane amounts of man-hours to achieve. They at least have the ability to crawl the web, download docs and xls, and automate comparison. I’m not sure it’s even feasible otherwise.

Maybe some of it is that, but it was released amid an activist push for governments and public agencies to be required to use an open file format such as ODF[1]. So, Microsoft's XML format is open, but it may still not be great. From what I understand that's pretty easy to do with XML.

1. https://en.wikipedia.org/wiki/OpenDocument_adoption#United_S...

"excessive" to you is "core user need" to a product manager
OpenXML is not a fully open standard, unlike OpenDocument. So they obfuscated even the modern one. There are many parts that are ambiguously documented or not documented at all.

http://www.decalage.info/files/JCV07_Lagadec_OpenDocument_Op...

As a former Microsoft employee, I disagree. They certainly were happy that it was hard for other people to read and write Office files. It was a feature, not a happy accident.