Hacker News
by tough 415 days ago
What do you mean exactly? I was surprised how easily grobid converts many papers, at least the arXiv ones, to XML, which is much better for processing than PDF.

Most of the papers are built from their LaTeX sources, so I guess there's an easy way to undo it.

https://github.com/kermitt2/grobid
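
For anyone wondering what "better for processing" looks like in practice, here is a rough sketch, not an official client. grobid returns TEI XML (for example from its `/api/processFulltextDocument` REST endpoint on a running server), and once you have that string you can pull out the title, section headings, and paragraphs with the standard library. The `summarize_tei` helper and the `SAMPLE_TEI` string are made up for illustration; the sample is hand-written and far smaller than real grobid output.

```python
import xml.etree.ElementTree as ET

# TEI namespace used by TEI XML documents such as grobid's output.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def _text(element):
    """Return all text inside an element, with surrounding whitespace trimmed."""
    return "".join(element.itertext()).strip()


def summarize_tei(tei_xml):
    """Extract title, section headings, and body paragraphs from a TEI string.

    Hypothetical helper: it assumes the usual teiHeader/titleStmt and
    text/body layout and ignores everything else (authors, references, ...).
    """
    root = ET.fromstring(tei_xml)
    title_el = root.find(".//tei:titleStmt/tei:title", TEI_NS)
    return {
        "title": _text(title_el) if title_el is not None else "",
        "sections": [_text(h) for h in root.findall(".//tei:body//tei:head", TEI_NS)],
        "paragraphs": [_text(p) for p in root.findall(".//tei:body//tei:p", TEI_NS)],
    }


# Hand-written stand-in for a grobid response, only for demonstration.
SAMPLE_TEI = """
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title level="a" type="main">A Sample Paper</title>
      </titleStmt>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <div>
        <head>Introduction</head>
        <p>PDFs are hard to process.</p>
      </div>
    </body>
  </text>
</TEI>
"""

if __name__ == "__main__":
    print(summarize_tei(SAMPLE_TEI))
```

With a real paper you would replace `SAMPLE_TEI` with the XML the server sends back; the same namespace-aware lookups should apply, though real output has many more elements than this sketch handles.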

2 comments

grobid is a wonderful resource, Patrice did an awesome job (I used it at my previous job at scite.ai)
that's exactly what I needed!
glad to hear!