Hacker News
by Imnimo 1971 days ago
Sure, but you need to demonstrate that the auto-labeled training data is valuable by showing that a model trained on it performs as well (or close to as well) as the same model trained on human-labeled data. Without that, we're just eyeballing the auto-labels and saying "looks good I guess!"

Obviously we should expect that the auto-labeler fails on the test set, because we assume we're exploiting some convenience that won't be available at test time. But we should still try - it might reveal that our task is too easy to need the model we were planning to train, or it might reveal that our test set is not actually representative.
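
To make that concrete, here's a rough sketch of the comparison, not anyone's actual pipeline. Everything in it is an assumption for illustration: the data is synthetic, `auto_label` is a made-up heuristic standing in for the auto-labeler, and a nearest-centroid classifier stands in for "the model". The point is only the shape of the experiment: same model, same human-labeled test set, two different sets of training labels, plus the auto-labeler scored on the test set by itself.

```python
import random

def make_data(n, rng):
    """Synthetic two-class points: class 0 near (0, 0), class 1 near (2, 2)."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = (rng.gauss(2.0 * y, 1.0), rng.gauss(2.0 * y, 1.0))
        data.append((x, y))
    return data

def auto_label(x):
    """Hypothetical cheap heuristic standing in for the auto-labeler."""
    return 1 if x[0] > 1.0 else 0

def fit_centroids(points, labels):
    """Nearest-centroid 'model': the mean point of each label."""
    sums = {}
    for (a, b), y in zip(points, labels):
        s = sums.setdefault(y, [0.0, 0.0, 0])
        s[0] += a
        s[1] += b
        s[2] += 1
    return {y: (s[0] / s[2], s[1] / s[2]) for y, s in sums.items()}

def predict(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda y: (x[0] - centroids[y][0]) ** 2
                                        + (x[1] - centroids[y][1]) ** 2)

def accuracy(predictions, truth):
    """Fraction of predictions that match the reference labels."""
    return sum(p == t for p, t in zip(predictions, truth)) / len(truth)

def compare(seed=0, n_train=2000, n_test=1000):
    rng = random.Random(seed)
    train, test = make_data(n_train, rng), make_data(n_test, rng)
    train_x = [x for x, _ in train]
    human = [y for _, y in train]            # stand-in for human labels
    auto = [auto_label(x) for x in train_x]  # labels from the heuristic
    test_x = [x for x, _ in test]
    test_y = [y for _, y in test]            # the test set stays human-labeled

    model_h = fit_centroids(train_x, human)
    model_a = fit_centroids(train_x, auto)
    return {
        "trained_on_human": accuracy([predict(model_h, x) for x in test_x], test_y),
        "trained_on_auto": accuracy([predict(model_a, x) for x in test_x], test_y),
        "auto_labeler_alone": accuracy([auto_label(x) for x in test_x], test_y),
    }

results = compare()
for name, acc in results.items():
    print(f"{name}: {acc:.3f}")
```

If `trained_on_auto` lands close to `trained_on_human`, that's the evidence the auto-labels are worth something. If `auto_labeler_alone` is already close to both, that hints the task may not need the trained model at all.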

1 comment

Yeah, so that's more of a comment on the accuracy of the auto-generated labels. This approach doesn't assume a different representative set of data than you'd use with human-labelled data, just that less of the data is human-labelled.

So it comes down to how good the auto-generated labels are (from a human perspective), which is a fair point that I didn't address much in the article. In general that comes down to a good QA process (which is applied to human labels and machine labels equally, because humans also make mistakes in this stuff).

In the article the dataset was small enough and the labels simple enough that I could run a very quick visual inspection over the results, but for more complicated tasks we have a more rigorous human review process for evaluating label accuracy (again, applied to both human- and algorithm-produced labels). The auto-generated labels may not be more efficient overall if they require a lot of correction after review, but for this case, and a lot of other ones, they empirically are.
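
As a rough illustration of how a review pass like that could be turned into a number (this is a sketch, not the actual review process, which isn't described here): a reviewer checks a random sample of labels, and the fraction they agree with gets a confidence interval. The function name `wilson_interval` and the counts in the usage line are hypothetical; the formula is the standard Wilson score interval for a proportion.

```python
import math

def wilson_interval(correct, reviewed, z=1.96):
    """Wilson score interval for label accuracy from a reviewed sample.

    correct:  number of sampled labels the reviewer agreed with
    reviewed: total number of sampled labels
    z:        normal quantile (1.96 is roughly a 95% interval)
    """
    if reviewed <= 0 or not 0 <= correct <= reviewed:
        raise ValueError("need 0 <= correct <= reviewed and reviewed > 0")
    p = correct / reviewed
    denom = 1.0 + z * z / reviewed
    center = (p + z * z / (2 * reviewed)) / denom
    half = z * math.sqrt(p * (1 - p) / reviewed
                         + z * z / (4 * reviewed * reviewed)) / denom
    return center - half, center + half

# Hypothetical review: 100 sampled labels, reviewer agreed with 90 of them.
low, high = wilson_interval(90, 100)
print(f"estimated label accuracy: 0.90 (interval {low:.3f} to {high:.3f})")
```

The same check can be run separately on the human labels and the machine labels, which keeps the comparison between them on equal footing.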