by idle_zealot
721 days ago
> That is very different from these models that are just rewarded for mimicking regardless if it is right or wrong

That's not a totally accurate characterization. The base models are just trained to predict plausible text, but then the models are fine-tuned on instruct or chat training data that encourages a certain "attitude" and correctness. It's far from perfect, but an attempt is certainly made to train them to be right.