by elandau25 1971 days ago
Yeah, so that's more a comment on the accuracy of the auto-generated labels. This approach doesn't assume a different representative set of data than you'd use with human-labelled data, just that less of the data is human labelled.

So it comes down to how good the auto-generated labels are (from a human perspective), which is a fair point that I didn't address much in the article. In general that comes down to a good QA process, which is applied to human labels and machine labels equally, because humans also make mistakes in this stuff.

In the article the dataset was small enough and the labels simple enough that I could run a very quick visual inspection over the results, but for more complicated tasks we have a more rigorous human review process for evaluating label accuracy (again, applied to both human- and algorithm-produced labels). The auto-generated labels may not be more efficient overall if they require a lot of correction after review, but for this case, and a lot of other ones, they empirically are.
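
To make the trade-off concrete, here is a rough sketch of the comparison I have in mind. It is not the tooling from the article: the function names, the cost units, and the simple cost model (every label gets reviewed, and only rejected auto labels need fixing) are all assumptions for illustration.

```python
def review_accuracy(reviewed):
    """Fraction of reviewed labels that the reviewer accepted unchanged.

    `reviewed` is a list of booleans, one per sampled label
    (True = accepted, False = needed correction). Hypothetical input format.
    """
    if not reviewed:
        raise ValueError("need at least one reviewed label")
    return sum(1 for ok in reviewed if ok) / len(reviewed)


def auto_labels_cheaper(n_items, accuracy, label_cost, review_cost, fix_cost):
    """Return True if auto-label + review + fix costs less than manual + review.

    Assumed cost model (illustrative only):
      manual path: every item is labelled by hand, then reviewed
      auto path:   every item is reviewed, rejected labels are fixed by hand
    """
    manual_total = n_items * (label_cost + review_cost)
    auto_total = n_items * (review_cost + (1 - accuracy) * fix_cost)
    return auto_total < manual_total


# Made-up numbers: a reviewer checks a small sample of auto labels.
sample = [True, True, False, True]
acc = review_accuracy(sample)
print(acc, auto_labels_cheaper(1000, acc, label_cost=1.0, review_cost=0.2, fix_cost=1.0))
```

The point is just that the answer flips once the correction rate or the per-fix cost gets high enough, which is why the review step matters for both kinds of labels.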