by mdadm
3614 days ago
From your link:
> Applies to: Office 2007 | Office 2010 | Open XML | Visual Studio Tools for Microsoft Office | Word | Word 2007 | Word 2010

I feel as though the gp comment is referring to far older versions, although without clarification it's hard to be sure.

FWIW, old Office documents were actually CFBF (Compound File Binary Format) files - think of it as FAT-in-a-file, allowing for multiple independent streams inside, with support for transactions. This was very commonly used on Windows in the OLE/COM era, because it was the underlying format for OLE Structured Storage. It's what allowed a Word document to embed another arbitrary document in an extensible way. The data in the streams within a CFBF file was essentially a loose dump of the application's object graph.
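If you want to check whether a file is one of these containers, a rough sketch in Python follows. The 8-byte signature and the sector-shift field at offset 30 are taken from the published CFBF header layout as I remember it; the function names `looks_like_cfbf` and `sector_size` are made up for illustration, and this is nowhere near a full parser.

```python
import struct

# Signature at the start of every compound file, per the CFBF header layout.
CFBF_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_cfbf(header: bytes) -> bool:
    """Return True if the buffer starts with the CFBF signature."""
    return header[:8] == CFBF_MAGIC


def sector_size(header: bytes) -> int:
    """Read the sector shift (little-endian uint16 at offset 30).

    The sector size is 2**shift, e.g. a shift of 9 means 512-byte sectors.
    """
    (shift,) = struct.unpack_from("<H", header, 30)
    return 1 << shift


# Hypothetical usage: build a fake 512-byte header instead of reading a real file.
fake_header = CFBF_MAGIC + bytes(22) + struct.pack("<H", 9) + bytes(480)
print(looks_like_cfbf(fake_header), sector_size(fake_header))
```

With a real file you would pass in the first 512 bytes read in binary mode.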
It all makes a lot of sense when you have your OLE glasses firmly on - it's basically a natural design that follows if your world consists of OLE objects and interactions between them. Look up IStorage and IStream to see what I mean.
The side effect of all this, however, is that the data inside an old Office file is not laid out in a logical way - streams consist of non-sequential interleaved blocks in a seemingly random order (depending on what was written when), some blocks may contain garbage data, and so on. So it's very difficult to reverse engineer, which is why it took so long back in the day, and the results were often unreliable.
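To make the interleaving concrete, here is a toy model of the FAT idea in Python. It is only a sketch: the 4-byte sectors, the sample table and the helper names `sector_chain` and `read_stream` are invented for illustration, and a real file also has a header, a directory and a mini-stream that this ignores. The two sentinel values are the ones the CFBF format uses for "end of chain" and "free sector".

```python
ENDOFCHAIN = 0xFFFFFFFE  # FAT entry marking the last sector of a stream
FREESECT = 0xFFFFFFFF    # FAT entry marking an unused sector


def sector_chain(fat, start):
    """Follow FAT links from `start` and return the sector numbers in stream order."""
    chain = []
    seen = set()
    sector = start
    while sector != ENDOFCHAIN:
        # Guard against corrupt tables: loops or links pointing outside the FAT.
        if sector in seen or sector >= len(fat):
            raise ValueError(f"broken chain at sector {sector:#x}")
        seen.add(sector)
        chain.append(sector)
        sector = fat[sector]
    return chain


def read_stream(sectors, fat, start, size):
    """Concatenate the chained sectors and trim to the stream's real size."""
    data = b"".join(sectors[i] for i in sector_chain(fat, start))
    return data[:size]


# Toy file: two streams written alternately, plus one free sector holding leftovers.
fat = [3, 2, 4, 5, ENDOFCHAIN, ENDOFCHAIN, FREESECT]
sectors = [b"AAA1", b"BBB1", b"BBB2", b"AAA2", b"BBB3", b"AAA3", b"junk"]

print(sector_chain(fat, 0))                  # sectors of stream A
print(read_stream(sectors, fat, 1, 12))      # contents of stream B
```

Reading the sectors in file order gives you a mix of both streams plus the stale bytes in the free sector, which is roughly what you were looking at if you did not know about the FAT.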