by crackrook
539 days ago
No individual Kaggle solution achieved 81%; that result came from an ensemble of models: https://x.com/fchollet/status/1865865271728390515 In my (possibly flawed) interpretation, o3's scores appear to be an achievement because they were attained by a single model, but the benchmark itself needs refinement before it can claim to measure AGI as it set out to, since one can brute-force one's way to similar results.