by int_19h 2759 days ago
The binary formats that immediately preceded the current OOXML were OLE compound files (aka "structured storage"), which is basically a filesystem-in-a-file intended for serialization: it lets a file contain arbitrarily nested COM components.
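
To make the "filesystem-in-a-file" idea concrete, here is a rough Python sketch that reads the fixed header of a compound file. The field offsets follow my reading of Microsoft's [MS-CFB] spec. `parse_header` and `CfbHeader` are made-up names for illustration, and the sketch only decodes the header (sector size, FAT and directory locations), not any actual streams.

```python
import struct
from typing import NamedTuple, Tuple

# 8-byte magic number at offset 0 of every OLE compound file
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")
HEADER_SIZE = 512  # all header fields live in the first 512 bytes


class CfbHeader(NamedTuple):
    major_version: int       # 3 -> 512-byte sectors, 4 -> 4096-byte sectors
    sector_size: int         # 1 << sector shift
    mini_sector_size: int    # 1 << mini sector shift (64 bytes in practice)
    num_fat_sectors: int     # how many sectors hold the allocation table (FAT)
    first_dir_sector: int    # start of the directory chain (the "folder" entries)
    mini_stream_cutoff: int  # streams smaller than this are stored in the mini stream
    difat: Tuple[int, ...]   # locations of the first 109 FAT sectors


def parse_header(data: bytes) -> CfbHeader:
    """Parse the fixed header of an OLE compound file (sketch, minimal validation)."""
    if len(data) < HEADER_SIZE or data[:8] != CFB_SIGNATURE:
        raise ValueError("not an OLE compound file")
    # Offsets 24..33: minor/major version, byte order mark, sector shifts (little-endian u16)
    _minor, major, byte_order, sector_shift, mini_shift = struct.unpack_from("<5H", data, 24)
    if byte_order != 0xFFFE:
        raise ValueError("unexpected byte order mark")
    # Offsets 40..75: nine u32 counters and sector locations (34..39 are reserved)
    (_num_dir, num_fat, first_dir, _txn, cutoff,
     _first_minifat, _num_minifat, _first_difat, _num_difat) = struct.unpack_from("<9I", data, 40)
    # Offsets 76..511: the part of the DIFAT array stored inside the header
    difat = struct.unpack_from("<109I", data, 76)
    return CfbHeader(major, 1 << sector_shift, 1 << mini_shift,
                     num_fat, first_dir, cutoff, difat)
```

The header only points at the FAT and the directory; everything else is reached by following sector chains through the FAT, much like clusters on a FAT-formatted disk.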

Individual data structures are then binary-serialized into that compound file, but I don't think it's accurate to call it a "memory dump". People often get that impression after looking at a compound file and seeing what looks like garbage, much like unused blocks in memory. But that's because it's a filesystem, and as such it has a concept of unused "sectors". This is also why a freshly saved binary Word file might still contain bits and pieces of old data and metadata.
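
A similarly rough sketch of the "unused sectors" point: walk the FAT (the sector allocation table) and report sectors it marks as free (0xFFFFFFFF) that still contain non-zero bytes, which is where leftover data tends to show up. `free_sectors_with_data` is a hypothetical helper. It assumes the [MS-CFB] layout, follows only the 109 FAT locations stored in the header, and ignores slack at the end of allocated sectors and inside the mini stream, so it will miss some leftovers.

```python
import struct
from typing import List

CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")
FREESECT = 0xFFFFFFFF  # FAT value meaning "this sector is unallocated"


def free_sectors_with_data(data: bytes) -> List[int]:
    """Return indices of sectors the FAT marks as free that still hold non-zero bytes.

    Simplification: only the 109 FAT locations stored in the header are followed,
    so very large files (which need extra DIFAT sectors) are scanned only partially.
    """
    if len(data) < 512 or data[:8] != CFB_SIGNATURE:
        raise ValueError("not an OLE compound file")
    sector_size = 1 << struct.unpack_from("<H", data, 30)[0]  # sector shift at offset 30
    num_fat = struct.unpack_from("<I", data, 44)[0]           # FAT sector count at offset 44
    fat_locations = struct.unpack_from("<109I", data, 76)[:num_fat]

    def sector(n: int) -> bytes:
        # The header occupies the first sector-sized slot, so sector n starts at (n + 1) * size.
        start = (n + 1) * sector_size
        return data[start:start + sector_size]

    # Concatenate the FAT: one little-endian u32 "next sector" entry per sector in the file.
    fat: List[int] = []
    for loc in fat_locations:
        chunk = sector(loc)
        fat.extend(struct.unpack_from(f"<{len(chunk) // 4}I", chunk))

    sectors_in_file = len(data) // sector_size - 1
    stale = []
    for idx, entry in enumerate(fat[:sectors_in_file]):
        # Free according to the FAT but not zeroed: bytes from earlier saves may survive here.
        if entry == FREESECT and any(sector(idx)):
            stale.append(idx)
    return stale
```

For example, `free_sectors_with_data(open("old.doc", "rb").read())` (where "old.doc" is just a placeholder) lists sector numbers worth a look in a hex viewer.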


That's how the DOC format in the 90s and early 00s worked. I was under the impression that the early formats for Word on DOS, Windows and Macintosh were all different and represented in-memory structures written directly to disk. Unfortunately, I can't find a citation, so I may be mistaken.

There was certainly some format that Word and Excel used before compound files, since COM Structured Storage only appeared in the 90s, and the first version of Office to use it was (IIRC) Office 97. That older format may well have been some kind of memory dump. But I don't think many of those older files are still floating around these days, and most third-party software that works with Office files seems to assume 97 or later.