by BoorishBears
987 days ago
Reinforcement learning from human feedback (RLHF) is the training you're referring to; you just don't realize it. RLHF is why, 2 years ago, "They're amazing for what they are" would have been "They're so hideous no one in their right mind would use them", and why, 2 years from now, even today's version will have turned into some weaker form of the argument. No special knowledge is needed to say "I like X over Y": RLHF lets a model turn that kind of preference into guidance at a scale that's never been possible before.
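To make "turn 'I like X over Y' into guidance" concrete: RLHF pipelines commonly fit a reward model to pairwise human comparisons using a Bradley-Terry style loss, maximizing log sigmoid(r(preferred) - r(rejected)). The sketch below is a toy illustration under loud assumptions: the reward is a linear function over hand-made feature vectors, and the item names, features and functions (`train_reward_model`, `reward`) are hypothetical. A real setup uses a neural reward model over model outputs and then optimizes the generator against it (e.g. with PPO), which is not shown here.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def train_reward_model(features, comparisons, lr=0.1, epochs=200):
    """Fit a linear reward r(x) = w . x from (preferred, rejected) pairs.

    Bradley-Terry model: P(win beats lose) = sigmoid(r(win) - r(lose)).
    We do gradient ascent on the log-likelihood of the observed preferences.
    """
    dim = len(next(iter(features.values())))
    w = [0.0] * dim
    for _ in range(epochs):
        for win, lose in comparisons:
            diff = [a - b for a, b in zip(features[win], features[lose])]
            margin = sum(wi * di for wi, di in zip(w, diff))
            # d/dw log sigmoid(margin) = (1 - sigmoid(margin)) * diff
            g = 1.0 - sigmoid(margin)
            w = [wi + lr * g * di for wi, di in zip(w, diff)]
    return w

def reward(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

# Hypothetical data: three outputs described by made-up features
# [polish, oddness], plus raters' "I like X over Y" judgments.
features = {"a": [1.0, 0.0], "b": [0.5, 0.5], "c": [0.0, 1.0]}
comparisons = [("a", "b"), ("b", "c"), ("a", "c")]

w = train_reward_model(features, comparisons)
scores = {name: reward(w, x) for name, x in features.items()}
print(sorted(scores, key=scores.get, reverse=True))
```

No rater ever assigns an absolute score here; the ranking a > b > c is recovered purely from pairwise "this one over that one" judgments, which is the point being made about needing no special knowledge.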