Hacker News
by andrepd 591 days ago
Of course lol. How come, e.g., o1 scores so high on these reasoning, math, and IMO benchmarks and then fails every simple question I ask it? The answer is training on the test set.