by andrepd 591 days ago
Of course lol. How come e.g. o1 scores so high on these reasoning and math and IMO benchmarks and then fails every simple question I ask of it? The answer is training on the test set.