by criemen
5 days ago
This is partially dealt with by stage W2 of section 2.2, "Submission workflow":

> Stage W2 The five project-active models, see Table 2, attempted the question. Their answers were compared to the original answer by an LLM judge. If at most three models answered correctly, the contributor could proceed.

So "trivially contained in the training data" is excluded: in that case all models could/should easily come up with the solution, and the question would not pass this stage.
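As a rough sketch of how I read that gate (the function and variable names are my own, not from the paper, and the LLM judge is assumed to have already produced one correct/incorrect verdict per model):

```python
# Hypothetical sketch of the W2 gate as quoted above; all names are illustrative.
MAX_CORRECT = 3  # "at most three models answered correctly"


def passes_w2(judge_verdicts: list[bool], max_correct: int = MAX_CORRECT) -> bool:
    """Return True if the contributor could proceed past stage W2.

    judge_verdicts holds one boolean per project-active model:
    True if the LLM judge marked that model's answer as correct.
    """
    return sum(judge_verdicts) <= max_correct


# Three of five models correct: the question is still allowed through.
print(passes_w2([True, True, True, False, False]))

# All five models correct: the question is rejected at this stage.
print(passes_w2([True] * 5))
```

A question that every model reproduces from memory would land in the second case, which is why the trivially-in-training-data scenario gets filtered out.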