by Kiro
583 days ago

> OpenAI's flagship models are not even correct 50% of the time[1]

You're reading the link wrong. They specifically picked questions that one or more models failed at, so the result is not representative of how often the model is wrong in general.

From the paper:

> At least one of the four completions must be incorrect for the trainer to continue with that question; otherwise, the trainer was instructed to create a new question.