Hacker News
by voz_ 1100 days ago
If an LLM labels it, does that have the same value? Isn’t it just fancy regurgitation of what’s already known?
4 comments

Even humans disagree about labels. Especially humans willing to do this work.

And with the topical depth that, say, ChatGPT4 has, I would think these labels have more value, although, just as with humans, some validation and verification steps are required.
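
One common way to make that validation step concrete is to have humans re-label a random sample of the LLM-labeled items and measure chance-corrected agreement with Cohen's kappa. This is a minimal sketch in plain Python, assuming a single-label classification task. The function name and the sample labels are hypothetical placeholders, not real data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators on the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators chose the same label.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each annotator labeled at random with their own label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both annotators used one identical label for every item
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical spot-check: LLM labels vs. a human re-labeling the same six items.
llm = ["pos", "neg", "pos", "pos", "neg", "neu"]
human = ["pos", "neg", "neg", "pos", "neg", "neu"]
print(round(cohens_kappa(llm, human), 3))  # → 0.739
```

Kappa near 1 means the LLM agrees with the human about as well as two careful humans would; running the same check between two human annotators gives a baseline, which matters because, as noted above, humans disagree too. scikit-learn's `sklearn.metrics.cohen_kappa_score` computes the same statistic if you'd rather not roll your own.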

Good question. One follow-up question: value for whom? If the labels are used to train the LLM that produced them, then I agree. If they are used to train a smaller downstream model (e.g. to finetune a pretrained BERT model), then they are worth as much as labels from any human annotator, and their value is purely a function of label quality.

Why retrain that smaller model from scratch, though? Just do a little transfer learning, or get creative and see if you can prune the big model down to a smaller one algorithmically, instead of going through the whole label-and-train rigmarole from scratch on what is effectively regurgitation.

I’m not sold this has directional value.

Hmm, I'm not suggesting training a smaller model from scratch. In most cases you'd want to finetune a pretrained model (a.k.a. transfer learning) for your specific use case or problem domain.

The need for labeled data for any kind of training is a constant though :)

It has some value. If you let the AI label the data and feed it back to itself, you are reaffirming its guesses. If you independently verify that its guesses are as correct as human ones, you are teaching the AI to be more sure about the correct thing.
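
The "independently verify" step can be sketched as a simple gate: score both the LLM's labels and a human annotator's labels against a gold subset that was labeled without looking at the LLM's output, and only feed the LLM labels back if they are at least as accurate. This is a minimal sketch, assuming single-label classification and a trusted gold subset. `accept_llm_labels`, its `tolerance` parameter, and the example labels are all hypothetical.

```python
def accuracy(predicted, gold):
    """Fraction of items where the predicted label equals the gold label."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need equal-length, non-empty label lists")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def accept_llm_labels(llm_labels, human_labels, gold_labels, tolerance=0.0):
    """Hypothetical gate: accept LLM labels only if, on an independently
    verified gold subset, they are at least as accurate as a human annotator
    (minus an optional tolerance)."""
    llm_acc = accuracy(llm_labels, gold_labels)
    human_acc = accuracy(human_labels, gold_labels)
    return llm_acc + tolerance >= human_acc, llm_acc, human_acc


# Hypothetical gold subset, checked by someone who did not see the LLM's output.
gold = ["a", "b", "a", "c", "b"]
llm = ["a", "b", "a", "c", "a"]
human = ["a", "b", "b", "c", "a"]
print(accept_llm_labels(llm, human, gold))  # → (True, 0.8, 0.6)
```

The key design point is that the gold subset must not come from the LLM itself; otherwise the check is just the feedback loop described above, measuring the model against its own guesses.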

LLMs have emergent abilities [0], which could add value to any output or label they produce.

[0] https://www.jasonwei.net/blog/emergence

I'm not sold that they do.