by rambambram 783 days ago
> It is a giant pain in the ass but you have to spend the time sitting in front of the screen going through the data and removing things and tagging things and making sure that the details are right. This is really what makes the good models good and the rest mediocre.

In some other comment I read this. Sounds very much like a curation thing. And now I'm wondering: isn't this part already covered by a lot of human beings now interacting with ChatGPT and the like?

My uneducated guess is that a company can scrape the whole world wide web and also have all the low quality content that comes with it, but then strengthen/curate their data and/or model by having it interact with humans? You give this thing a prompt, it comes up with some obvious nonsense, and then you as a human correct this by 'chatting' with it?

3 comments

People typically ask LLMs about things they DON'T know about or understand. So they are not qualified to assess the validity of their answers. Which is exactly why hallucination is such a big problem.
> People typically ask LLMs about things they DON'T know about or understand. So they are not qualified to assess the validity of their answers.

Eh, you can still often (!) figure out whether what the LLM says makes sense.

Just like you can often figure out whether a human is bullshitting, by fact checking with other sources, or going over their reasoning.

"Fixing" low quality data with RLHF is a waste of time. By that point it's already poisoned the model distribution, and all you're doing is steering it away from catastrophic failure cases.

Start with the best data you can, and task-train ("RLHF") behavior, not preference.

Yeah, when you use OpenAI, you are giving them free labor for data curation.