by lolinder
511 days ago
A co-founder of Epoch left a note in the comments:

> We acknowledge that OpenAI does have access to a large fraction of FrontierMath problems and solutions, with the exception of a unseen-by-OpenAI hold-out set that enables us to independently verify model capabilities. However, we have a verbal agreement that these materials will not be used in model training.

Ouch. A verbal agreement. As the saying goes, those aren't worth the paper they're written on, and that's doubly true when you're dealing with someone with a reputation like Altman's.

And aside from the obvious flaw in it being a verbal agreement, there are many ways in which OpenAI could technically comply with this agreement while still gaining a massive unfair advantage on the benchmarks to the point of rendering them meaningless. For just one example, knowing the benchmark questions can help you select training data that is tailored to excelling at the benchmarks without technically including the actual question in the training data.
|
It seems to me that o3's 25% benchmark score is 100% data contamination.