
by elandau25 2018 days ago

I see it a bit differently. I see it as two separate (but correlated) tasks: labelling the data and building a robust model. There is a nuanced gap between the two. The labelling task and the model task live in different constraint spaces.

When you are labelling data, you have access to strategies and means that might not be available to your downstream model. In our experience this includes a human-in-the-loop component, building non-robust ensemble models (we call these micro-models), and some "guesswork" functions on the data. All of this together can make an "auto labeller" that does pretty well at getting labels made, but really the sum of these strategies is very different from some well-trained neural network that will be running on edge or whatever.
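To make the combination above concrete, here is a minimal sketch of how such an auto labeller could be wired together. It is not the actual pipeline: the `Labeller` interface, the voting rule, the `min_agreement` threshold, and the toy labellers are all assumptions made for illustration.

```python
from collections import Counter
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical interface: a labeller returns a label, or None to abstain.
Labeller = Callable[[float], Optional[str]]


def auto_label(
    x: float,
    labellers: Sequence[Labeller],
    ask_human: Callable[[float], str],
    min_agreement: float = 0.75,  # assumed threshold, tune per task
) -> Tuple[str, str]:
    """Return (label, source), where source is "auto" or "human"."""
    # Collect votes from every micro-model / guesswork function that did not abstain.
    votes = [v for v in (f(x) for f in labellers) if v is not None]
    if votes:
        label, count = Counter(votes).most_common(1)[0]
        # Accept the majority label only if enough of the voters agree.
        if count / len(votes) >= min_agreement:
            return label, "auto"
    # Otherwise fall back to the human in the loop.
    return ask_human(x), "human"


# Toy stand-ins for two non-robust micro-models and one guesswork rule.
def micro_model_a(x: float) -> Optional[str]:
    return "pos" if x > 0.5 else "neg"


def micro_model_b(x: float) -> Optional[str]:
    return "pos" if x > 0.4 else "neg"


def guess_rule(x: float) -> Optional[str]:
    if x > 0.8:
        return "pos"
    if x < 0.2:
        return "neg"
    return None  # abstain in the ambiguous middle


LABELLERS = [micro_model_a, micro_model_b, guess_rule]
```

With these toy labellers, an easy item like `0.9` is labelled automatically, while `0.45` splits the vote and gets routed to the human. None of this is available to a single model deployed downstream, which is the gap described above.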

The point of a model is not to label the data, it's to generate some value in some out-of-sample task, which is quite different from strategies that you can run in a sandboxed environment with your training data.

1 comments

Imnimo 2018 days ago

Sure, but you need to demonstrate that the auto-labeled training data is valuable by showing that a model trained on it performs as well (or close to as well) as the same model trained on human-labeled data. Without that, we're just eyeballing the auto-labels and saying "looks good I guess!"

Obviously we should expect that the auto-labeler fails on the test set, because we assume we're exploiting some convenience that won't be available at test time. But we should still try - it might reveal that our task is too easy to need the model we were planning to train, or it might reveal that our test set is not actually representative.
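The check described above can be sketched as follows: train the same model once on human labels and once on auto-generated labels, then score both on a held-out, human-labelled test set. Everything here is a toy assumption (the one-dimensional data, the nearest-centroid "model", the single mislabelled training item); it only shows the shape of the comparison.

```python
from typing import Callable, Dict, Sequence


def fit_centroids(xs: Sequence[float], ys: Sequence[str]) -> Dict[str, float]:
    """Toy nearest-centroid model: the mean feature value per label."""
    centroids = {}
    for label in set(ys):
        members = [x for x, y in zip(xs, ys) if y == label]
        centroids[label] = sum(members) / len(members)
    return centroids


def predict(centroids: Dict[str, float], x: float) -> str:
    # Pick the label whose centroid is closest to x.
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def accuracy(predict_fn: Callable[[float], str],
             xs: Sequence[float], ys: Sequence[str]) -> float:
    return sum(predict_fn(x) == y for x, y in zip(xs, ys)) / len(xs)


# Hypothetical training data: same inputs, two label sources.
train_x = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
human_y = ["neg", "neg", "neg", "pos", "pos", "pos"]
auto_y = ["neg", "neg", "pos", "pos", "pos", "pos"]  # one auto-label is wrong

# Held-out test set with human labels, never seen by the auto labeller.
test_x = [0.15, 0.35, 0.45, 0.65, 0.85]
test_y = ["neg", "neg", "neg", "pos", "pos"]

human_model = fit_centroids(train_x, human_y)
auto_model = fit_centroids(train_x, auto_y)

human_acc = accuracy(lambda x: predict(human_model, x), test_x, test_y)
auto_acc = accuracy(lambda x: predict(auto_model, x), test_x, test_y)

# Direct agreement between the two label sources on the training set.
label_agreement = sum(a == h for a, h in zip(auto_y, human_y)) / len(human_y)

print(f"human-labelled training: {human_acc:.2f}")
print(f"auto-labelled training:  {auto_acc:.2f}")
print(f"label agreement:         {label_agreement:.2f}")
```

The gap between the two test accuracies is the number that says whether the auto-labels are good enough. Passing the auto labeller itself as `predict_fn` to `accuracy` on the test set would cover the second point: if it scores well there, either the task is easier than assumed or the test set is not representative.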

elandau25 2018 days ago

Yeah, so that's more of a comment on the accuracy of the auto-generated labels, because this approach doesn't assume a different representative set of data than with human-labelled data, just that less of the data is human-labelled.

So it comes down to how good the auto-generated labels are (from a human perspective), which is a fair point that I didn't address much in the article, but in general it comes down to a good QA process (which is applied to both human labels and machine labels equally, because humans also make mistakes in this stuff).

In the article the dataset was small enough and the labels simple enough that I could run a very quick visual inspection over the results, but for more complicated tasks we have a more rigorous human review process for evaluating label accuracy (again, applied to both human- and algorithm-produced labels). The auto-generated labels may not be more efficient overall if they require a lot of correction after review, but for this case, and a lot of other ones, they just empirically are.
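The efficiency trade-off in that last sentence is simple arithmetic, sketched below. The cost model (a fixed review time per item plus a correction time for each wrong auto-label) and every number in the usage example are hypothetical, not measurements from the article.

```python
from typing import Tuple


def labelling_cost(
    n_items: int,
    seconds_per_manual_label: float,
    seconds_per_review: float,
    seconds_per_correction: float,
    auto_error_rate: float,
) -> Tuple[float, float]:
    """Return (manual_seconds, auto_seconds) under the assumed cost model."""
    manual = n_items * seconds_per_manual_label
    # Every auto-label is reviewed; only the wrong ones also need correcting.
    auto = n_items * (seconds_per_review + auto_error_rate * seconds_per_correction)
    return manual, auto


def break_even_error_rate(
    seconds_per_manual_label: float,
    seconds_per_review: float,
    seconds_per_correction: float,
) -> float:
    """Auto-label error rate above which manual labelling becomes cheaper."""
    return (seconds_per_manual_label - seconds_per_review) / seconds_per_correction


# Hypothetical numbers: 10 s to label by hand, 2 s to review, 12 s to correct.
manual_s, auto_s = labelling_cost(1000, 10, 2, 12, auto_error_rate=0.1)
threshold = break_even_error_rate(10, 2, 12)
print(f"manual: {manual_s:.0f} s, auto + review: {auto_s:.0f} s")
print(f"auto-labelling wins while error rate < {threshold:.2f}")
```

Under this model the auto-labels stop paying off once the fraction needing correction exceeds the break-even rate, which is why the review step matters for the efficiency claim as well as for label quality.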