by convivialdingo
1072 days ago
I've had good luck with python-docx for reading Word documents (typically specifications). Tables are supported, but it's not obvious where in the document each table sits, and I had to come up with a hack to read image captions. PDF has been hit or miss, though pypdf has improved in the last couple of years. Depending on the document, you'll sometimes get random spaces or nospacesatall.
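
For the table-position problem, one possible workaround is to walk the document body in order, so each table shows up between the paragraphs that surround it (a caption is often the paragraph just before or after). The sketch below is not python-docx's API and not necessarily the hack I used: it reads `word/document.xml` straight out of the .docx zip with the standard library, and `iter_body_blocks` is a made-up helper name. It assumes tables and paragraphs are direct children of the body and ignores content controls, headers, and footers.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def _text(element):
    """Concatenate all w:t text runs found beneath an element."""
    return "".join(t.text or "" for t in element.iter(W + "t"))


def iter_body_blocks(docx_file):
    """Yield ("paragraph", str) or ("table", rows) in document order.

    Hypothetical helper: docx_file is a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    body = root.find(W + "body")
    for child in body:
        if child.tag == W + "p":
            yield "paragraph", _text(child)
        elif child.tag == W + "tbl":
            # Only direct rows and cells; text of nested tables is folded
            # into the enclosing cell's string.
            rows = [
                [_text(cell) for cell in row.findall(W + "tc")]
                for row in child.findall(W + "tr")
            ]
            yield "table", rows
```

Looping over `iter_body_blocks("spec.docx")` (file name is just an example) gives paragraphs and tables interleaved as they appear, so you can pair each table with its neighboring paragraph text.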