|
|
|
|
|
by freethejazz
673 days ago
|
|
Depending on how much structure you want to extract before passing the pdf contents to the next step in your pipeline, this paper[1] might be helpful in surfacing more options. It's a review/benchmark of numerous tools applied to the information extraction of academic documents. I haven't been through to evaluate the solutions they examined, but it's how I discovered GROBID and IMO lays out the strengths of each approach clearly. [1] https://arxiv.org/pdf/2303.09957 |
|