by nwienert
336 days ago
I’ve seen some of the problems before (see https://o3-failed-arc-agi.vercel.app/ for examples). It’s not hard to build datasets that contain these types of problems, and I would expect LLMs to generalize well from them. I don’t see how this is really any different from any other type of problem LLMs are good at, given a dataset to study. I get that they keep the test updated with secret problems, but I don’t see why companies can’t game this just by investing in building their own datasets, even if it means paying teams of smart people to generate them.