Hacker News
by dazzaji 954 days ago
Does anybody know if the 2008-2009 SAT is in the training set for these models? If so, I’d be especially interested in head-to-head evals on this type of non-code benchmark using problem sets not already in the training data, to see how the models perform on fresh problems.