by andix 505 days ago
This test will be worthless in a few weeks, since it's now going into the training data. Even repeatedly posting it to LLM services (with analytics enabled) could lead to its inclusion in the training data.

Interestingly, this test has been in the public domain for the last seven years, since it is part of the set of all possible chess positions with 7 or fewer pieces, which has been solved and published. The full set is a huge file, but the five-piece dataset with FENs is less than a GB. I wonder whether it was already included in the training data earlier, or whether it will be.
I don't think such datasets go into AI training. But if this exact question keeps showing up in analytics data and forum posts, it might end up in training sets.