by lwhsiao
2300 days ago
There are a handful. We look at bounding boxes to featurize which spans are visually aligned with other spans, which page a span is on, etc. You can see more in the code at [1]. In general, visual features seem to give some nice redundancy to some of the structural features of HTML, which helps when dealing with an input as noisy as PDF.

[1]: https://github.com/HazyResearch/fonduer/tree/master/src/fond...
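
To give a rough idea of what bounding-box features like these can look like, here is a minimal sketch. It is not Fonduer's actual code: the `Span` class, the feature names, and the 2-point tolerance are all made up for illustration, and the coordinates are assumed to be PDF points with the origin at the top left.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical span record: text, page number, and bounding box."""
    text: str
    page: int
    left: float
    top: float
    right: float
    bottom: float


def visual_features(a: Span, b: Span, tol: float = 2.0) -> set[str]:
    """Return illustrative binary visual features relating span a to span b.

    `tol` is an assumed slack (in points) to absorb PDF layout noise.
    """
    if a.page != b.page:
        # Alignment across pages is not considered in this sketch.
        return {"DIFF_PAGE"}

    feats = {"SAME_PAGE"}
    if abs(a.left - b.left) <= tol:
        feats.add("LEFT_ALIGNED")
    if abs(a.right - b.right) <= tol:
        feats.add("RIGHT_ALIGNED")
    if abs(a.top - b.top) <= tol:
        feats.add("TOP_ALIGNED")
    # Same row: the vertical centers of the two boxes roughly coincide.
    if abs((a.top + a.bottom) / 2 - (b.top + b.bottom) / 2) <= tol:
        feats.add("HORZ_ALIGNED")
    # Same column: the horizontal centers of the two boxes roughly coincide.
    if abs((a.left + a.right) / 2 - (b.left + b.right) / 2) <= tol:
        feats.add("VERT_ALIGNED")
    return feats


header = Span("Voltage", page=1, left=100, top=50, right=160, bottom=62)
cell = Span("3.3 V", page=1, left=100.5, top=80, right=140, bottom=92)
print(sorted(visual_features(header, cell)))
```

A table header and a cell below it share a left edge even when the HTML structure recovered from the PDF is unreliable, which is the kind of redundancy mentioned above.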