Ha, I gave Bing Chat a puzzle from Advent of Code and it correctly identified the source, even though Advent of Code wasn't mentioned anywhere in the prompt. It provided a solution, but since it recognized where the question came from, I don't think it was a good test. As you say, maybe changing some values would help.
Give it the Synacor Challenge, just the spec, and see if it can pull it off. There are far fewer solutions to it out there. The site went offline recently, but Aneurysm9 has preserved the problem spec, their binary, and the checksums of the codes for that binary so you can check answers against them.
The program will likely need to be amended to get the last code (or the last two codes? It's been a while), so you can see how it handles updating for new requirements.