Hacker News
by cma 736 days ago
Yeah, and GPT4o was potentially trained on this test set. Even if they tried to hold it out, it was still likely trained on discussions of the problems.