by dwohnitmok
121 days ago
Not anymore. This benchmark is for LLM chess ability: https://github.com/lightnesscaster/Chess-LLM-Benchmark?tab=r.... LLMs are graded according to FIDE rules, so e.g. two illegal moves in a game lead to an immediate loss. This benchmark doesn't have the latest models from the last two months, but Gemini 3 (with no tools) is already at 1750 - 1800 FIDE, which is probably around 1900 - 2000 USCF (about USCF expert level). This is enough to beat almost everyone at your local chess club.
Additionally, how do we know the model isn’t benchmaxxed to eliminate illegal moves?
For example, here is the list of games by Gemini-3-pro-preview. In 44 games it made 3 illegal moves (if I counted correctly) but won 5 because its opponent forfeited due to illegal moves.
https://chessbenchllm.onrender.com/games?page=5&model=gemini...
I suspect the ratings here may be significantly inflated due to a flaw in the methodology.
EDIT: I want to suggest a better methodology here (I am not gonna do it; I really really really don’t care about this technology). Have the LLMs play rated engines and rated humans, with the first illegal move forfeiting the game (the same rule applies to humans).
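To make the difference between the two forfeit rules concrete, here is a minimal Python sketch. The function name and the list of legality flags are made up for illustration; this is not the benchmark's code, and it assumes something else (an engine or arbiter) has already decided whether each attempted move was legal.

```python
from typing import Optional, Sequence


def forfeiting_attempt(legal_flags: Sequence[bool], max_illegal: int) -> Optional[int]:
    """Return the 0-based index of the move attempt that triggers a forfeit.

    legal_flags[i] is True if the player's i-th attempted move was legal.
    max_illegal is how many illegal attempts are allowed before losing:
    2 for the "two illegal moves lose" rule described for the benchmark,
    1 for the stricter "first illegal move forfeits" proposal.
    Returns None if the player never reaches the limit.
    """
    illegal_count = 0
    for index, is_legal in enumerate(legal_flags):
        if not is_legal:
            illegal_count += 1
            if illegal_count >= max_illegal:
                return index
    return None


# Hypothetical game: the 3rd and 5th attempts are illegal.
attempts = [True, True, False, True, False, True]

print(forfeiting_attempt(attempts, max_illegal=2))  # -> 4 (second illegal move)
print(forfeiting_attempt(attempts, max_illegal=1))  # -> 2 (first illegal move)
```

Under the stricter rule the same game ends two attempts earlier, so a model that slips up once can no longer keep playing on and win.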