by lwhsiao
2305 days ago
One of the co-authors of Fonduer here. For reference, the original Fonduer paper is here: https://dl.acm.org/doi/pdf/10.1145/3183713.3183729

Follow-up work on extracting data from PDF datasheets is here: https://dl.acm.org/doi/pdf/10.1145/3316482.3326344

One thing to point out about our library: although we take PDFs as input and use them to compute visual features, we also rely on an HTML representation of each PDF for structural cues. In our pipeline, this HTML is typically generated by using Adobe Acrobat to convert each input PDF.
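To make "structural cues" a bit more concrete, here is a minimal, hypothetical sketch using only Python's standard `html.parser`. For each text span in an HTML document, it records the path of enclosing tags, which tells you, for example, that a value sits inside a table cell. This only illustrates the general idea. It is not Fonduer's parser or API, and the names `StructuralCues` and `structural_cues` are made up for this example.

```python
from html.parser import HTMLParser

# Tags that never get a closing tag, so they are not pushed onto the stack.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class StructuralCues(HTMLParser):
    """Hypothetical helper: pair each non-empty text span with its ancestor tags."""

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open tags, outermost first
        self.spans = []  # list of (text, ancestor tag path) pairs

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed inner tags.
        if tag in self.stack:
            while self.stack:
                if self.stack.pop() == tag:
                    break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.spans.append((text, tuple(self.stack)))


def structural_cues(html):
    """Return (text, tag path) pairs for every text span in an HTML string."""
    parser = StructuralCues()
    parser.feed(html)
    parser.close()
    return parser.spans


if __name__ == "__main__":
    # Hypothetical snippet in the style of an HTML-converted datasheet.
    html = ("<html><body><table><tr><td>Vce</td><td>45 V</td></tr></table>"
            "<p>Note</p></body></html>")
    for text, path in structural_cues(html):
        print(text, "/".join(path))
```

In a real pipeline, the HTML would come from the PDF-to-HTML conversion step, and signals like "these two spans share a table row" could then be combined with visual features computed from the PDF itself.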