by traject_ 727 days ago
We don't actually know if it is SOTA; the previous SOTA solution also got around the same score on the evaluation set.
1 comments

Yeah, and GPT-4o was potentially trained on this test set, and if they tried to hold it out, it was still likely trained on discussions of the problems.