|
|
|
|
|
by ben_w
211 days ago
|
|
Slight quibble, but the reinforcement learning from human feedback means they're trained (somewhat) on what the specific human asking the question is likely to consider right or wrong. This is both why they're sycophantic, and also why they're better than just median internet comments. But this is only a slight quibble, because what you say is also somewhat true, and why they have such a hard time saying "I don't know". |
|