by saeedesmaili
1072 days ago
I tried python-docx on a bunch of docx files (downloaded from Google Docs). It returns empty strings for hyperlinks, and I couldn't find a way to fix this. So if there is a sentence like "This is an important link to another doc or url." and the word "link" is a hyperlink, python-docx returns "This is an important to another doc or url."
I realize now, staring at this, that I might have broken the API a little. You can't do "text = paragraph.text" anymore, but you can do "text = ''.join([run.text for run in paragraph.runs])" instead.
If you're curious at all why it breaks, it's because in the OOXML spec a paragraph is made up of an ordered list of runs or hyperlinks (and hyperlinks can then contain additional runs). The master branch just implements a paragraph as an ordered list of runs (and ignores all hyperlinks).
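
To make the structure concrete, here is a rough sketch that uses only the Python standard library rather than python-docx itself. The paragraph XML is a hand-written, simplified stand-in for what a docx file might contain (a real hyperlink element would normally also carry a relationship id attribute), and the helper names direct_runs_text and all_runs_text are made up for illustration. It is not how either branch of the library is implemented, just a way to see why reading only the top-level runs drops the link text.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the w: prefix in docx documents.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hand-written example paragraph (an assumption, not taken from a real file):
# run, hyperlink containing a run, run.
PARAGRAPH_XML = f"""
<w:p xmlns:w="{W}">
  <w:r><w:t xml:space="preserve">This is an important </w:t></w:r>
  <w:hyperlink>
    <w:r><w:t>link</w:t></w:r>
  </w:hyperlink>
  <w:r><w:t xml:space="preserve"> to another doc or url.</w:t></w:r>
</w:p>
""".strip()


def run_text(run):
    """Concatenate the w:t text nodes of a single run."""
    return "".join(t.text or "" for t in run.findall("w:t", NS))


def direct_runs_text(paragraph):
    """Text of runs that are direct children only; hyperlinks are skipped."""
    return "".join(run_text(r) for r in paragraph.findall("w:r", NS))


def all_runs_text(paragraph):
    """Walk children in document order, descending into hyperlinks."""
    parts = []
    for child in paragraph:
        if child.tag == f"{{{W}}}r":
            parts.append(run_text(child))
        elif child.tag == f"{{{W}}}hyperlink":
            parts.extend(run_text(r) for r in child.findall("w:r", NS))
    return "".join(parts)


paragraph = ET.fromstring(PARAGRAPH_XML)
print(repr(direct_runs_text(paragraph)))  # the word "link" is missing
print(repr(all_runs_text(paragraph)))     # the word "link" is present
```

The second function is the important one: it treats the paragraph's children as an ordered mix of runs and hyperlinks, which is the shape described above, instead of assuming every child is a run.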