|
|
|
|
|
by mpeg
880 days ago
|
|
Off-topic, but do you know how Tika compares to other pdf parsing libraries? I was very unimpressed by pdfminer.six (what unstructured uses) as the layout detection seems pretty basic, it fails to parse multi column text, whereas MuPDF does it perfectly Currently I'm using a mix of MuPDF + AWS Textract (for tables, mostly) but I'd love to understand what other people are doing |
|