by BoorishBears 987 days ago
Reinforcement learning from human feedback is the training you're referring to; you just don't realize it.

RLHF is why today's "They're amazing for what they are" would have been "They're so hideous no one in their right mind would use them" 2 years ago, and why in 2 years even that will have shrunk to some weaker form of the argument.

There's no special knowledge needed to know "I like X over Y": RLHF allows a model to turn that into guidance at a scale that's never been possible before.
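
To make the "I like X over Y" point concrete, here's a rough sketch of how a single comparison can become a training signal. It assumes the usual pairwise (Bradley-Terry style) setup where a reward model scores both options and is penalized when the preferred one doesn't score higher. The function name and the toy scores are mine for illustration, not anything from a specific RLHF codebase.

```python
import math

def preference_loss(reward_preferred: float, reward_rejected: float) -> float:
    """Pairwise loss: -log(sigmoid(reward_preferred - reward_rejected)).

    Hypothetical helper for illustration; the two rewards would come from
    a learned reward model scoring the option a person picked (X) and the
    one they passed over (Y).
    """
    margin = reward_preferred - reward_rejected
    # Numerically stable form of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# Toy scores (made up): the model already agrees with the human here...
agrees = preference_loss(reward_preferred=2.0, reward_rejected=0.0)
# ...and disagrees here, so the loss is larger and pushes the scores apart.
disagrees = preference_loss(reward_preferred=0.0, reward_rejected=2.0)
print(agrees < disagrees)
```

Nobody labeling the pair has to explain why they prefer X; the loss only needs which one they picked, which is why this kind of feedback is cheap to collect in bulk.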