Hacker News
by anon84873628 3 hours ago
Right. Everyone is using this to judge the LLMs instead of questioning what situation they were actually fed and whether the move they made was in fact the best one.

More likely, the simulation was just very poor and the results are nonsense.