by sceptic123 421 days ago
I don't know, but it's not a hard test: get the LLM to play a perfect game of tic-tac-toe against itself, then look at the output and see if it goes wrong.
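For what it's worth, the "look at the output" step can be automated. Here's a rough sketch, not a definitive harness. Assumptions on my part: the LLM's game has already been parsed into a list of cell indices 0-8 (row by row, X moves first), and "perfect" means no move ever lowers the minimax value of the position for the player making it. The names `check_game`, `value` and `winner` are made up for illustration.

```python
from functools import lru_cache

# The eight winning lines on a board indexed 0..8, row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def value(board, player):
    """Minimax value from X's point of view (+1 X wins, 0 draw, -1 O wins).

    `player` is the side to move on `board`, a 9-character string.
    """
    w = winner(board)
    if w is not None:
        return 1 if w == "X" else -1
    if " " not in board:
        return 0
    other = "O" if player == "X" else "X"
    results = [value(board[:i] + player + board[i + 1:], other)
               for i, cell in enumerate(board) if cell == " "]
    return max(results) if player == "X" else min(results)


def check_game(moves):
    """Check a transcript of moves (assumed format: ints 0..8, X first).

    Returns (ok, message); ok is False at the first illegal or
    value-losing move, or if the game stops before it is finished.
    """
    board, player = " " * 9, "X"
    for n, cell in enumerate(moves, 1):
        if winner(board) is not None or " " not in board:
            return False, f"move {n}: game was already over"
        if not 0 <= cell <= 8 or board[cell] != " ":
            return False, f"move {n}: illegal move {cell}"
        other = "O" if player == "X" else "X"
        before = value(board, player)
        board = board[:cell] + player + board[cell + 1:]
        if value(board, other) != before:
            return False, f"move {n}: {player} played {cell}, which is not optimal"
        player = other
    if winner(board) is None and " " in board:
        return False, "game stopped before it was finished"
    return True, "perfect game"
```

Something like `check_game([4, 0, 2, 6, 3, 5, 1, 7, 8])` should pass (it's a drawn game where every move is forced or safe), while `check_game([4, 1])` should fail at move 2, since answering a center opening with an edge loses. Turning the model's text into that move list is the fiddly part and isn't covered here.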