
by codeviking 1778 days ago

Yup, right now we use GROBID, do some post-processing, and combine the output with other extraction techniques. For instance, we use a model to extract document figures [1], so that we can render them in the resulting HTML document.
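To make the "combine" step concrete, here is a rough sketch of what merging the two outputs could look like. It is not our actual pipeline: it only assumes that GROBID's output is TEI XML, and that the figure extractor returns a list of records. The function name `tei_to_html` and the `image_url` / `caption` keys are made up for illustration.

```python
import html
import xml.etree.ElementTree as ET

# GROBID emits TEI XML, which uses this namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def tei_to_html(tei_xml, figures):
    """Render the title and body paragraphs of a TEI document as HTML,
    then append separately extracted figures.

    `figures` is a hypothetical list of dicts with "image_url" and
    "caption" keys; a real figure extractor will have its own schema.
    """
    root = ET.fromstring(tei_xml)

    title = root.findtext(
        ".//tei:titleStmt/tei:title", default="", namespaces=TEI_NS
    )
    parts = [f"<h1>{html.escape(title.strip())}</h1>"]

    # Body paragraphs, flattened to plain text (inline markup is dropped).
    for p in root.iterfind(".//tei:body//tei:p", TEI_NS):
        text = "".join(p.itertext()).strip()
        if text:
            parts.append(f"<p>{html.escape(text)}</p>")

    # Figures come from the second extractor, not from the TEI itself.
    for fig in figures:
        src = html.escape(fig["image_url"])
        caption = html.escape(fig["caption"])
        parts.append(
            f'<figure><img src="{src}" alt="">'
            f"<figcaption>{caption}</figcaption></figure>"
        )

    return "\n".join(parts)


# Tiny hand-written TEI document standing in for real GROBID output.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt>
    <title>Example Paper</title>
  </titleStmt></fileDesc></teiHeader>
  <text><body><div><head>Intro</head>
    <p>First paragraph.</p>
  </div></body></text>
</TEI>"""

if __name__ == "__main__":
    print(tei_to_html(SAMPLE_TEI, [{"image_url": "fig1.png", "caption": "Figure 1"}]))
```

The sketch just appends figures at the end. Placing them near the text that references them would need position information from both extractors.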

Also, we're working hard on a new extraction mechanism that should allow us to replace GROBID [2].

There are a lot of really smart people at AI2 working on this. I'm excited to see the resulting improvements and the cool things (like this) that we build with the results!

[1]: https://api.semanticscholar.org/CorpusID:4698432

[2]: https://api.semanticscholar.org/CorpusID:235265639