by hadsed 1830 days ago
Beautiful. So many annotation tools focus on "text classification", which assumes you've already got segmented samples. In the real world of documents, that's a whole challenge in itself.

Another challenge is that sometimes you're working with PDFs, which means not only ingesting them but also displaying them. The difficulty is keeping track of annotations and predictions across the PDF<->text string boundary, in both directions.

There are understandably even fewer solutions to that problem because it's a harder UI to build.
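To make the PDF<->text boundary problem concrete, here is a minimal Python sketch of one common approach: keep the extracted words with their page and bounding box, build the flat text string from them, and record each word's character offset so a span can be mapped in either direction. The `PdfToken` and `TextPdfIndex` names are hypothetical, and it assumes some PDF extractor already gives you words with page numbers and boxes, joined by single spaces.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class PdfToken:
    """A word extracted from a PDF (hypothetical structure): text, page, bounding box."""
    text: str
    page: int
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1) in PDF points


class TextPdfIndex:
    """Bidirectional map between character offsets in a flat string and PDF tokens."""

    def __init__(self, tokens: list[PdfToken]) -> None:
        self.tokens = tokens
        self.starts: list[int] = []  # start offset of each token within self.text
        offset = 0
        for tok in tokens:
            self.starts.append(offset)
            offset += len(tok.text) + 1  # +1 for the single joining space
        self.text = " ".join(tok.text for tok in tokens)

    def span_to_tokens(self, start: int, end: int) -> list[int]:
        """Text -> PDF: indices of tokens overlapping the half-open span [start, end)."""
        first = max(bisect_right(self.starts, start) - 1, 0)
        hits = []
        for i in range(first, len(self.tokens)):
            tok_start = self.starts[i]
            if tok_start >= end:
                break
            if tok_start + len(self.tokens[i].text) > start:
                hits.append(i)
        return hits

    def tokens_to_span(self, indices: list[int]) -> tuple[int, int]:
        """PDF -> text: character span covering the selected tokens."""
        lo, hi = min(indices), max(indices)
        return self.starts[lo], self.starts[hi] + len(self.tokens[hi].text)


# Example: three words on page 0 with made-up coordinates.
index = TextPdfIndex([
    PdfToken("Invoice", 0, (72, 700, 120, 712)),
    PdfToken("total:", 0, (125, 700, 160, 712)),
    PdfToken("$42.00", 0, (165, 700, 205, 712)),
])
highlight = [index.tokens[i].bbox for i in index.span_to_tokens(15, 21)]  # boxes to draw
```

A text-model prediction over `[15, 21)` maps back to the bounding box of `$42.00` for display, and a box the user draws over tokens maps forward to a character span. Real layouts (hyphenation, reading order, ligatures) make the offset bookkeeping considerably messier than this.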


AllenAI seems to be working on something like that for PDF files.

https://github.com/allenai/pawls

Much appreciated! That's true, and many of the tools that do support text annotation are quite restrictive: they don't let you add attributes or annotate the same span of text more than once.
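One way to avoid that restriction is stand-off annotation: each annotation stores only a character span, a label, and a free-form attribute dict under its own id, so nothing stops several annotations from sharing a span. The sketch below is a minimal, hypothetical Python illustration of that idea (`Annotation` and `AnnotationStore` are made-up names), not how any particular tool implements it.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Any


@dataclass
class Annotation:
    """A stand-off annotation: it points at a character span instead of copying text."""
    id: int
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str
    attributes: dict[str, Any] = field(default_factory=dict)


class AnnotationStore:
    """Annotations keyed by id, so the same span can be labelled any number of times."""

    def __init__(self, text: str) -> None:
        self.text = text
        self._annotations: dict[int, Annotation] = {}
        self._ids = count(1)

    def add(self, start: int, end: int, label: str, **attributes: Any) -> Annotation:
        if not 0 <= start < end <= len(self.text):
            raise ValueError(f"invalid span [{start}, {end})")
        ann = Annotation(next(self._ids), start, end, label, dict(attributes))
        self._annotations[ann.id] = ann
        return ann

    def at(self, start: int, end: int) -> list[Annotation]:
        """All annotations overlapping [start, end), in insertion order."""
        return [a for a in self._annotations.values() if a.start < end and start < a.end]


store = AnnotationStore("Acme Corp acquired Widget Ltd in 2019.")
store.add(0, 9, "ORG", role="acquirer")
store.add(0, 9, "ENTITY", normalized="Acme Corporation")  # same span, annotated again
store.add(19, 29, "ORG", role="target")
```

Because annotations never modify the text, overlapping and nested spans come for free; the cost is that any edit to the underlying text requires re-anchoring the offsets.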

Support for PDFs and other doc types is definitely on the backlog, but I keep holding off due to the challenges you mentioned.