Hacker News
by doe_eyes 721 days ago
> LLMs aren't trained for accuracy

This assertion in the article doesn't seem right at all. When LLMs weren't trained for accuracy, we had "random story generators" like GPT-2 or GPT-3. The whole breakthrough with RLHF was that we started training them for accuracy - or the appearance of it, as rated by human reviewers.

This step both made the models a lot more useful and willing to stick to instructions, and also a lot better at... well, sounding authoritative when they shouldn't.

Isn't that the issue? Getting thumbs up from an underpaid human reviewer isn't the same as accurate facts.
The one person I know who gets paid to review AI outputs makes anywhere from $25 to $40 an hour. Not sure if that's underpaid. It may be a nice option when you can do it at any time to supplement your regular income.
Reviewing AI output or helping in training an LLM itself?
This person works through an interface which is similar to Mechanical Turk. You get a list of available projects you qualified for via an assessment. For the AI projects, many of them are comparing responses from two different models, answering questions, and selecting the best response. Other projects might be attempting to get the model to do something against the guidelines, or rating the model on certain capabilities. There are no requirements other than passing the assessment. As with Mechanical Turk, you can work on your available projects at any time.

This feedback is used for training.
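
For what it's worth, here is a rough sketch of how a "pick the better of two responses" judgment is commonly turned into a training signal. It uses the Bradley-Terry style pairwise loss that shows up in the RLHF literature for reward models. I don't know what this particular platform's customers actually do with the data, so treat the function name and the scores below as made up for illustration.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise preference loss: -log(sigmoid(reward_chosen - reward_rejected)).

    reward_chosen / reward_rejected are hypothetical scalar scores that a
    reward model assigned to the response the reviewer picked and to the
    one they passed over.
    """
    margin = reward_chosen - reward_rejected
    # Numerically stable softplus(-margin), which equals -log(sigmoid(margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# The reward model agrees with the reviewer: small loss.
agrees = preference_loss(reward_chosen=2.0, reward_rejected=0.0)

# The reward model disagrees with the reviewer: large loss.
disagrees = preference_loss(reward_chosen=0.0, reward_rejected=2.0)

print(f"agrees with reviewer:    {agrees:.3f}")
print(f"disagrees with reviewer: {disagrees:.3f}")
```

Note that nothing in the loss knows whether the chosen response was factually correct. It only pushes the scores toward whatever the reviewer preferred.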

It's not the same as completely accurate facts, but it's much closer to accurate facts than LLMs we had before.