by dazzaji
954 days ago
Does anybody know if the 2008-2009 SAT is in the training set for these models? Assuming so, I’d be especially interested in head-to-head evals on this type of non-code benchmark, using problem sets not already in the training data, to see how the models perform on fresh problems.