In many examples the test case is slightly out of distribution (OOD), so the solver has to generalize.