by pclmulqdq
1070 days ago
Given how many times OpenAI has trained on people's examples of "bad" prompts, I am sure they are fine-tuning on these benchmarks. It's the natural thing to do if you are trying to position yourself as the "most accurate" AI.

If it performs about as well on instances it has never seen before (a held-out test set), then it's not overfit to the benchmark.
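
A rough way to make that check concrete is to score the model on the public benchmark and on fresh items it could not have trained on, then compare the two. The sketch below is only an illustration: the function names and the 0.05 tolerance are made up here, not taken from any real evaluation harness, and it assumes answers can be compared by exact match.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that exactly match the labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("need at least one example")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def generalization_gap(seen_preds, seen_labels, unseen_preds, unseen_labels):
    """Accuracy on the public benchmark minus accuracy on unseen items."""
    return accuracy(seen_preds, seen_labels) - accuracy(unseen_preds, unseen_labels)


def looks_overfit(gap, tolerance=0.05):
    """Flag a gap above the tolerance (0.05 is an arbitrary, assumed cutoff)."""
    return gap > tolerance
```

A gap near zero is what the parent comment describes: similar performance on seen and unseen instances. A large positive gap would suggest the benchmark score is inflated. With small test sets the gap is noisy, so a single comparison is weak evidence either way.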