by paultopia
2057 days ago
I'm a little confused by this. Isn't the modern docx format just a bunch of XML markup in a zip file? Actually, I'm sure it is. I just created a toy docx with the text "This is a test." and ripped it open with a bit of Python I had lying around from previous experiments along those lines [1]. Looking at the file 'word/document.xml', the relevant part reads:

<w:body>
  <w:p w14:paraId="64E164D6" w14:textId="77777777" w:rsidR="00EB525B" w:rsidRPr="00E02EE2" w:rsidRDefault="00E02EE2">
    <w:r>
      <w:t xml:space="preserve">This </w:t>
    </w:r>
    <w:r>
      <w:rPr>
        <w:i/>
        <w:iCs/>
      </w:rPr>
      <w:t>is</w:t>
    </w:r>
    <w:r>
      <w:t xml:space="preserve"> a test.</w:t>

So it looks like the underlying XML representation does intersperse formatting codes in the text stream, at least in part; certainly it's clear that the "is" is italicized. That seems like enough information to build reveal codes out of.

[1] https://github.com/paultopia/dedocx/blob/master/deconstruct....
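
To make the reveal-codes idea concrete, here is a rough Python sketch using only the standard library. It is not the script linked in [1]. The function names (`read_document_xml`, `reveal_codes`) and the bracketed labels in `CODES` are made up for illustration, and it only handles a few on/off run properties on runs that sit directly inside a paragraph.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, bound to the "w:" prefix in document.xml.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hypothetical display labels for a few run properties (assumption: the
# labels are arbitrary; real reveal codes would cover far more than this).
CODES = {"b": "Bold", "i": "Ital", "u": "Und"}


def read_document_xml(docx_path):
    """Return the raw bytes of word/document.xml from a .docx (zip) file."""
    with zipfile.ZipFile(docx_path) as z:
        return z.read("word/document.xml")


def reveal_codes(document_xml):
    """Render each paragraph as text with bracketed formatting codes."""
    root = ET.fromstring(document_xml)
    lines = []
    for para in root.iter(f"{{{W}}}p"):
        parts = []
        # Simplification: only runs that are direct children of the
        # paragraph; runs nested in hyperlinks etc. are skipped.
        for run in para.findall("w:r", NS):
            text = "".join(t.text or "" for t in run.findall("w:t", NS))
            rpr = run.find("w:rPr", NS)
            active = []
            if rpr is not None:
                for tag, label in CODES.items():
                    el = rpr.find(f"w:{tag}", NS)
                    # A property element can be switched off via w:val.
                    off = ("0", "false", "off", "none")
                    if el is not None and el.get(f"{{{W}}}val") not in off:
                        active.append(label)
            opening = "".join(f"[{c}]" for c in active)
            closing = "".join(f"[/{c}]" for c in reversed(active))
            parts.append(opening + text + closing)
        lines.append("".join(parts))
    return "\n".join(lines)


# The toy paragraph from the comment, wrapped so it parses on its own.
SAMPLE = (
    f'<w:document xmlns:w="{W}"><w:body><w:p>'
    '<w:r><w:t xml:space="preserve">This </w:t></w:r>'
    '<w:r><w:rPr><w:i/><w:iCs/></w:rPr><w:t>is</w:t></w:r>'
    '<w:r><w:t xml:space="preserve"> a test.</w:t></w:r>'
    '</w:p></w:body></w:document>'
)

if __name__ == "__main__":
    print(reveal_codes(SAMPLE))  # This [Ital]is[/Ital] a test.
```

For a real file, something like `reveal_codes(read_document_xml("test.docx"))` should work the same way, with "test.docx" standing in for whatever path you have. Paragraph styles, inherited formatting, and fields are all ignored here, so treat it as a starting point only.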