by nl
745 days ago
The "Using python to dump the PDF to text" part dramatically underestimates how hard this is. Tables and especially multi-column PDFs often need one-off handling and, worse, you don't know one is being misparsed until you start getting weird search results. At that point you need to debug your entire search pipeline, which isn't fun!