Hacker News
by _lpa_ 3339 days ago
I did this a while ago (www.hnreads.com, bookbot.io). I manually labelled ~2k comments, trained an NER system, then validated the extracted titles via Amazon's API. It was pretty easy once I had the manually labelled comments. I don't recall my F1/precision/recall scores; they were OK but lower than the state of the art reported in papers.
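For anyone wanting to score a similar tagger, here is a rough sketch of entity-level precision/recall/F1 over IOB labels. It assumes exact span matching (a predicted title only counts if both boundaries match the gold span), which may not be how the original system was scored; the function names `spans` and `prf` are made up for illustration.

```python
def spans(labels):
    """Extract (start, end) entity spans from a list of IOB labels."""
    found, start = set(), None
    for i, label in enumerate(labels + ["O"]):  # sentinel closes a trailing span
        if start is not None and not label.startswith("I-"):
            found.add((start, i))  # end index is exclusive
            start = None
        if label.startswith("B-"):
            start = i
    return found


def prf(gold, pred):
    """Entity-level precision, recall and F1 using exact span matches."""
    g, p = spans(gold), spans(pred)
    tp = len(g & p)  # spans predicted with exactly the gold boundaries
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


gold = ["O", "B-BOOK", "I-BOOK", "O", "B-BOOK"]
pred = ["O", "B-BOOK", "I-BOOK", "O", "O"]
precision, recall, f1 = prf(gold, pred)
```

In this toy case the tagger finds one of the two gold titles and predicts nothing spurious, so precision is perfect while recall is one half.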

That's interesting. Which labels were you assigning to them?
I had a macro in Emacs that wrapped the highlighted text in an XML tag (say <book></book>). From that markup I could derive whatever labelling scheme I wanted, e.g. IOB. The labelling didn't take long, maybe a few hours spread over a couple of days.
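The conversion from inline tags to IOB can be sketched roughly like this. It is a minimal illustration and not the original script: it assumes whitespace tokenization, non-nested tags, and a hypothetical helper name `xml_to_iob`.

```python
import re


def xml_to_iob(text, tag="book"):
    """Convert inline <tag>...</tag> markup to (token, IOB label) pairs."""
    name = re.escape(tag)
    pattern = re.compile(rf"<{name}>(.*?)</{name}>", re.DOTALL)
    pairs = []
    pos = 0
    for match in pattern.finditer(text):
        # Text before the tagged span is outside any entity.
        pairs += [(tok, "O") for tok in text[pos:match.start()].split()]
        # First token of the span gets B-, the rest get I-.
        for i, tok in enumerate(match.group(1).split()):
            prefix = "B-" if i == 0 else "I-"
            pairs.append((tok, prefix + tag.upper()))
        pos = match.end()
    # Whatever follows the last tagged span is also outside.
    pairs += [(tok, "O") for tok in text[pos:].split()]
    return pairs


example = "I enjoyed <book>The Pragmatic Programmer</book> a lot"
labelled = xml_to_iob(example)
```

Splitting on whitespace keeps punctuation attached to neighbouring words, so a real pipeline would probably swap in a proper tokenizer before training.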