by CamperBob2
336 days ago
> They could still have trained the model in such a way as to focus on benchmarks, e.g. training on more examples of ARC-style questions.

That's kind of the idea behind ARC-AGI. Training on available ARC benchmarks does not generalize. Unless it does... in which case, mission accomplished.