by ramoz
878 days ago
For me, PyMuPDF/fitz has been the best way to retain natural reading order and to set rules dynamic enough to extract text from complex layouts. None of the mentioned tools did this out of the box, and none seemed easy to configure, though all are definitely hyped and marketed way beyond fitz.

The only thing it doesn't do is table detection (neither does pdfminer.six), but there are plenty of other ways to handle tables.
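
A minimal sketch of what a reading-order rule on top of PyMuPDF might look like, assuming a plain multi-column page. It relies on `page.get_text("blocks")`, which returns tuples of the form `(x0, y0, x1, y1, text, block_no, block_type)`. The helper names (`column_of`, `order_blocks`, `extract_text`) and the equal-width column rule are illustrative assumptions, not anything prescribed by the library; real layouts usually need more rules (full-width headers, footnotes, sidebars).

```python
from typing import List, Tuple

# Shape of one entry from PyMuPDF's page.get_text("blocks"):
# (x0, y0, x1, y1, text, block_no, block_type)
Block = Tuple[float, float, float, float, str, int, int]


def column_of(block: Block, page_width: float, n_columns: int = 2) -> int:
    """Assign a block to a column by its horizontal centre.

    Assumption: columns are equally wide. This is a hypothetical rule
    chosen for illustration only.
    """
    x0, _, x1 = block[0], block[1], block[2]
    centre = (x0 + x1) / 2
    index = int(centre / (page_width / n_columns))
    return max(0, min(index, n_columns - 1))


def order_blocks(blocks: List[Block], page_width: float, n_columns: int = 2) -> List[Block]:
    """Return text blocks ordered column by column, top to bottom."""
    text_blocks = [b for b in blocks if b[6] == 0]  # block_type 0 is text, 1 is image
    return sorted(
        text_blocks,
        key=lambda b: (column_of(b, page_width, n_columns), b[1], b[0]),
    )


def extract_text(path: str, n_columns: int = 2) -> str:
    """Extract text from every page of a PDF using the column rule above."""
    import fitz  # PyMuPDF, imported lazily so the helpers above work without it

    parts = []
    with fitz.open(path) as doc:
        for page in doc:
            blocks = page.get_text("blocks")
            ordered = order_blocks(blocks, page.rect.width, n_columns)
            parts.extend(b[4].strip() for b in ordered)
    return "\n\n".join(parts)
```

Keeping the ordering rule separate from the PyMuPDF call means the rule can be swapped per document type, for example a different `n_columns` or a different sort key, without touching the extraction loop.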