by nl 3021 days ago
There's no such thing as auto-labeling.

Data Turks is manual labeling.

There are active learning[1] and related algorithms, where you trace the decision boundary of your classifier and pass examples near that boundary to be manually labeled (they are the ones the classifier is most unsure about).

But there is nothing "auto" about this - it's just being smart about where to deploy the manual labor.
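
To make the "examples along the boundary" idea concrete, here is a minimal sketch of uncertainty sampling, one common active learning strategy. It assumes a binary classifier that outputs a probability per unlabeled example; the function name `select_for_labeling` and the sample probabilities are made up for illustration.

```python
def select_for_labeling(probabilities, k):
    """Return indices of the k examples the classifier is least sure about.

    probabilities: predicted P(positive) for each unlabeled example,
    as produced by some binary classifier (assumed, not shown here).
    """
    # Distance from 0.5 is a simple measure of how far an example
    # sits from the decision boundary: smaller means more uncertain.
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: abs(probabilities[i] - 0.5),
    )
    return ranked[:k]


# Hypothetical model outputs for five unlabeled examples.
probs = [0.97, 0.52, 0.10, 0.45, 0.80]
to_label = select_for_labeling(probs, k=2)
print(to_label)  # indices to hand to human labelers
```

The humans still do all of the labeling; the code only decides which examples are worth their time.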

[1] https://en.wikipedia.org/wiki/Active_learning_(machine_learn...


Let's say we want to create labeled data for text summarization of Medium articles. Could the highlighted parts be used as summaries? They're not auto-labeled per se, but they could serve as a proxy and be passed to labelers to verify or edit.
Sure. There are lots of useful proxies for labeled data.

It's worth noting that highlighted sections in Medium articles probably aren't great summaries (they are more a representation of important points - which is a useful thing to predict as well).

For example, many summarization systems are trained on the single-line summaries that accompany news articles. There have been attempts to use Tweets as summaries for linked articles too.
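
As a rough sketch of the proxy-label workflow discussed above (highlights as draft summaries that a labeler then verifies or edits), something like the following could work. The names `SummaryCandidate`, `build_candidates`, and `review`, and the `(text, highlights)` input shape, are all assumptions for illustration, not any real tool's API.

```python
from dataclasses import dataclass


@dataclass
class SummaryCandidate:
    article_text: str
    draft_summary: str      # proxy label taken from reader highlights
    verified: bool = False  # set only after a human labeler signs off


def build_candidates(articles):
    """articles: iterable of (text, highlights) pairs (hypothetical shape)."""
    candidates = []
    for text, highlights in articles:
        if not highlights:
            continue  # no proxy available, so nothing to verify
        candidates.append(SummaryCandidate(text, " ".join(highlights)))
    return candidates


def review(candidate, edited_summary=None):
    """Record a labeler's decision: accept as-is, or replace with an edit."""
    if edited_summary is not None:
        candidate.draft_summary = edited_summary
    candidate.verified = True
    return candidate


# Made-up data: one article with highlights, one without.
articles = [
    ("Long article body A", ["Key point one.", "Key point two."]),
    ("Long article body B", []),
]
queue = build_candidates(articles)
review(queue[0], edited_summary="A covers two key points.")
```

Only candidates with `verified` set would go into the training set, so the proxy saves labeler effort without replacing the labeler.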