Hacker News
by littlekey 291 days ago
Doesn't that mean going back to manually labeling examples? That can be a big hurdle compared to just zero- or few-shotting some examples into the LLM prompt. Unless there's something I'm misunderstanding about your approach. Or maybe you could run an unsupervised clustering step on the vectors to get labeled categories that you then pass to the supervised classification model. Though I guess that would depend on how strictly the target categories are defined for the use case in question.
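A rough sketch of the cluster-then-classify idea, under some loud assumptions: the vectors are already-computed embeddings (plain tuples here), the clustering step is ordinary k-means (Lloyd's algorithm), and the "supervised classification model" is a simple k-nearest-neighbors vote. The names `kmeans` and `knn_predict` are hypothetical helpers written for illustration, not anyone's actual pipeline. Note that the cluster ids are arbitrary integers; someone still has to look at each cluster and decide what category it represents.

```python
import math
import random


def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd's k-means. Returns one integer cluster id per input vector."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels


def knn_predict(train_vecs, train_labels, query, k=3):
    """Supervised step: majority vote among the k nearest pseudo-labeled vectors."""
    nearest = sorted(
        range(len(train_vecs)), key=lambda i: math.dist(train_vecs[i], query)
    )[:k]
    votes = [train_labels[i] for i in nearest]
    return max(set(votes), key=votes.count)


# Toy "embeddings": two well-separated groups.
vecs = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
pseudo_labels = kmeans(vecs, k=2)
new_label = knn_predict(vecs, pseudo_labels, (4.8, 5.2))
```

On real data the clusters rarely line up this cleanly with the categories you actually care about, which is the "how strictly defined" caveat above.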

To some degree, manual labeling has to be done anyway, just to validate that any approach works at all; you'll always need ground truth from somewhere. What I was suggesting is that zero/few-shotting might not be good enough, depending on the problem. Labeling ~1000 samples isn't too bad; I've done it by hand a few times now. If you can source a high-quality positive signal from somewhere (e.g. user-behavioral data), even better.
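To make the "you need ground truth anyway" point concrete, here is a minimal sketch of scoring any approach (a few-shot prompt, a trained classifier, whatever) against the same hand-labeled set. The helper names, the `"spam"`/`"ham"` labels, and the 80/20 split are all hypothetical choices for illustration; the only assumption is that each approach emits one label per sample.

```python
import random


def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle hand-labeled samples and hold out a fraction for evaluation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[: len(shuffled) - n_test], shuffled[len(shuffled) - n_test :]


def evaluate(predictions, gold, positive):
    """Accuracy overall, plus precision/recall for one class of interest."""
    if len(predictions) != len(gold) or not gold:
        raise ValueError("need equal-length, non-empty predictions and gold labels")
    pairs = list(zip(predictions, gold))
    tp = sum(p == positive and g == positive for p, g in pairs)
    fp = sum(p == positive and g != positive for p, g in pairs)
    fn = sum(p != positive and g == positive for p, g in pairs)
    return {
        "accuracy": sum(p == g for p, g in pairs) / len(gold),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Same gold labels, two competing approaches.
gold = ["spam", "ham", "spam", "ham", "spam"]
few_shot_preds = ["spam", "spam", "spam", "ham", "ham"]
classifier_preds = ["spam", "ham", "spam", "ham", "ham"]
few_shot_scores = evaluate(few_shot_preds, gold, positive="spam")
classifier_scores = evaluate(classifier_preds, gold, positive="spam")
```

With ~1000 labeled samples, a split like this leaves roughly 800 for training a classifier and 200 for comparing it fairly against the prompt-based baseline.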