by jononor
247 days ago
I think the best way to address this potential overfitting to ARC would be to create more benchmarks that are similar in concept, focusing on fluid intelligence, but that approach it from a different angle than ARC. Of course, that is quite costly, and it also requires some "marketing" to actually get a new benchmark established.

If the goal is to test generalisation capability, then what data the evaluated model was trained on is crucial to drawing any conclusions.
Look at the construction of this synthetic dataset for example: https://arxiv.org/pdf/1711.00350