HN Mirror


by hadsed 1830 days ago

Beautiful. So many annotation tools focus on "text classification", which assumes you've already got segmented samples. With real-world documents, getting those segments is a whole challenge in itself.

Another challenge is that sometimes you're working with PDFs, which means not only ingesting them but also displaying them. The difficulty is keeping track of annotations and predictions across the PDF<->text string boundary, in both directions.

There are understandably even fewer solutions to that problem because it's a harder UI to build.
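One way to picture the PDF<->text bookkeeping problem: alongside the extracted string, keep a per-character map back to the glyph each character came from. An annotation on a text span can then be drawn as boxes on the page, and a selection made on the page can be turned back into a text span. This is a minimal sketch under stated assumptions, not any particular tool's approach; the `Glyph` record, `OffsetMap` class and `line_breaks` input are hypothetical stand-ins for whatever a real PDF extraction step would produce.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Glyph:
    """One character as placed on a PDF page (hypothetical extraction output)."""
    char: str
    page: int
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1) in page units


class OffsetMap:
    """Bidirectional map between offsets in the extracted text and PDF glyphs."""

    def __init__(self, glyphs: list[Glyph], line_breaks: set[int]) -> None:
        self.glyphs = glyphs
        self.text_to_glyph: list[Optional[int]] = []  # text offset -> glyph index
        self.glyph_to_text: list[int] = []  # glyph index -> text offset
        chars: list[str] = []
        for i, glyph in enumerate(glyphs):
            self.glyph_to_text.append(len(chars))
            chars.append(glyph.char)
            self.text_to_glyph.append(i)
            if i in line_breaks:
                # Synthetic newline: present in the text, absent from the page.
                chars.append("\n")
                self.text_to_glyph.append(None)
        self.text = "".join(chars)

    def span_to_boxes(self, start: int, end: int) -> list[tuple[int, tuple[float, float, float, float]]]:
        """Text span [start, end) -> (page, bbox) of every glyph to highlight."""
        return [
            (self.glyphs[g].page, self.glyphs[g].bbox)
            for g in self.text_to_glyph[start:end]
            if g is not None  # synthetic characters have nothing to draw
        ]

    def glyphs_to_span(self, first: int, last: int) -> tuple[int, int]:
        """Glyph selection first..last (inclusive) on the page -> text span [start, end)."""
        return self.glyph_to_text[first], self.glyph_to_text[last] + 1


# Usage: "ab" ends page 1, "cd" starts page 2, with a line break between them.
glyphs = [
    Glyph("a", 1, (10, 700, 16, 712)),
    Glyph("b", 1, (16, 700, 22, 712)),
    Glyph("c", 2, (10, 700, 16, 712)),
    Glyph("d", 2, (16, 700, 22, 712)),
]
m = OffsetMap(glyphs, line_breaks={1})
start, end = m.glyphs_to_span(1, 2)  # user drags across "b" and "c" in the viewer
print(repr(m.text[start:end]))
print(m.span_to_boxes(start, end))  # boxes for "b" on page 1 and "c" on page 2
```

The synthetic newline is the interesting case: it exists in the string a model sees but has no glyph on the page, so the text-to-page direction has to tolerate gaps while the page-to-text direction never lands on one.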

gryn 1830 days ago

allenai seems to be working on something like that for PDF files.

https://github.com/allenai/pawls

neiman1 1830 days ago

Much appreciated! That's true, and lots of the tools that do support text annotation can be quite restrictive: they don't let you add attributes to a span or annotate the same span of text more than once.

Support for PDFs and other doc types is definitely on the backlog, but I keep holding off due to the challenges you mentioned.
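One likely cause of that restriction is inline labelling (for example, one BIO tag per token), which cannot represent overlapping or repeated spans. A common alternative is standoff annotation: the text stays untouched, and each annotation is a separate record holding character offsets, a label and arbitrary attributes, so the same span can be annotated any number of times. A minimal sketch, assuming plain character offsets; the `Annotation` and `AnnotationStore` names and the example labels are hypothetical, not taken from any tool mentioned in the thread.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import count

_next_id = count(1)  # simple id source for this sketch


@dataclass
class Annotation:
    """A standoff annotation: offsets into the text, a label, free-form attributes."""
    start: int
    end: int
    label: str
    attributes: dict[str, str] = field(default_factory=dict)
    ann_id: int = field(default_factory=lambda: next(_next_id))


class AnnotationStore:
    """Keeps annotations separate from the text, so spans may overlap or repeat."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.annotations: list[Annotation] = []

    def add(self, start: int, end: int, label: str, **attributes: str) -> Annotation:
        """Record a new annotation on text[start:end]; existing ones are untouched."""
        if not 0 <= start < end <= len(self.text):
            raise ValueError(f"invalid span [{start}, {end})")
        ann = Annotation(start, end, label, dict(attributes))
        self.annotations.append(ann)
        return ann

    def covering(self, offset: int) -> list[Annotation]:
        """All annotations whose span contains the given character offset."""
        return [a for a in self.annotations if a.start <= offset < a.end]


store = AnnotationStore("Acme Corp hired Jane Doe.")
store.add(0, 9, "ORG", canonical="Acme Corporation")
store.add(0, 9, "EMPLOYER", source="annotator_2")  # same span, second annotation
store.add(16, 24, "PERSON")
for a in store.covering(3):
    print(a.label, store.text[a.start:a.end], a.attributes)
```

Because offsets rather than inline tags carry the span, nothing stops two annotations from sharing or crossing boundaries; the cost is that offsets must be kept valid if the underlying text ever changes.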