I've heard the same has happened with common benchmarks (they've ingested the solutions into the training data).