by elandau25
1971 days ago
Hi Imnimo, I wrote the article and definitely understand your concerns. The point is not that the specific steps I took will work for most datasets in general, but rather the overall idea of taking a more data science-y approach to labelling instead of blindly throwing your data at a workforce. A more varied dataset will require additional strategies. We have done this kind of thing with various datasets, and what normally works is a combination of some vertical models, heuristics specific to the dataset, classical computer vision techniques, and some human label seeding/correction.
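To make that combination concrete, here is a minimal Python sketch, assuming each source (a vertical model, a dataset-specific heuristic, a classical CV routine) is wrapped as a function that either proposes a label with a confidence score or abstains. The most confident proposal wins, and anything below a threshold goes to human annotators for seeding/correction. The names `Proposal` and `auto_label`, the 0.9 threshold, and the aspect-ratio heuristic are all hypothetical illustrations, not the pipeline from the article.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Proposal:
    label: str
    confidence: float  # assumed to lie in [0, 1]


# Hypothetical interface: a labeler proposes a label for an item or returns None.
Labeler = Callable[[dict], Optional[Proposal]]


def auto_label(
    items: List[dict], labelers: List[Labeler], threshold: float = 0.9
) -> Tuple[list, list]:
    """Split items into auto-accepted labels and a human review queue."""
    accepted, review = [], []
    for item in items:
        # Collect every non-abstaining proposal for this item.
        proposals = [p for p in (f(item) for f in labelers) if p is not None]
        best = max(proposals, key=lambda p: p.confidence, default=None)
        if best is not None and best.confidence >= threshold:
            accepted.append((item, best.label))
        else:
            # A human corrects the best guess, or labels from scratch if None.
            review.append((item, best))
    return accepted, review


def aspect_ratio_heuristic(item: dict) -> Optional[Proposal]:
    # Hypothetical dataset-specific heuristic: very wide boxes are probably buses.
    if item["width"] / item["height"] > 2.5:
        return Proposal("bus", 0.95)
    return None


items = [{"width": 300, "height": 100}, {"width": 100, "height": 100}]
accepted, review = auto_label(items, [aspect_ratio_heuristic])
```

Here the wide box is labeled automatically and the square one lands in the review queue with no proposal attached. In practice you would plug in several labelers and tune the threshold against a human-checked sample.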
- The outputs of the auto-labeler. If this is strong, you've learned that you didn't need the training set after all - you managed to solve the problem without it!
- The outputs of a model trained on auto-labeled data. If this is strong but the above test was not, then this pipeline makes sense.
- The outputs of a model trained on human-labeled data. If this is strong but the above tests were not, we're in trouble.

If none of the three are strong, then the training data was lacking (assuming we've done our best on tuning the model we're trying to train), and so no real value was gained by annotating it.
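The decision procedure in these three tests can be written as a short chain of checks, where each case only applies if the earlier ones failed. This is a minimal sketch: `diagnose` is a hypothetical helper, and it assumes each flag means "this system's outputs are strong on a held-out, human-verified test set", with the threshold for "strong" left to the caller.

```python
def diagnose(
    auto_labeler_strong: bool,
    model_on_auto_labels_strong: bool,
    model_on_human_labels_strong: bool,
) -> str:
    """Interpret the three evaluations in order (hypothetical helper).

    Assumes the model being trained has already been tuned as well as possible,
    so a weak result reflects the data rather than the model.
    """
    if auto_labeler_strong:
        return "auto-labeler alone solves the task; no training set needed"
    if model_on_auto_labels_strong:
        return "model trained on auto-labels is strong; the pipeline makes sense"
    if model_on_human_labels_strong:
        return "only human labels work; the auto-labeling pipeline is in trouble"
    return "training data was lacking; annotating it added no real value"
```

The ordering matters: a strong auto-labeler short-circuits everything else, which matches the point that in that case the training set was never needed.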