|
|
|
|
|
by menaerus
516 days ago
|
|
Thanks for the link. A holdout set which is yet to be used to verify the 25% claim. He also says that he doesn't believe that OpenAI would self-sabotage themselves by tricking the internal benchmarking performance since this will get easily exposed, either by the results from a holdout set or by the public repeating the benchmarks themselves. Seems reasonable to me. |
|
The public has no access to this benchmark.
In fact, everyone thought it was all locked up in a vault at Epoch AI HQ, but looks like Sam Altman has a copy on his bedside table.