|
|
|
|
|
by mtlmtlmtlmtl
856 days ago
|
|
From the paper: If Stockfish detects a mate-in-k (e.g., 3 or 5) it outputs k and not a centipawn score. We map all such outputs to the maximal value bin (i.e., a win percentage of 100%). Similarly, in a very strong position, several actions may end up in the maximum value bin. Thus, across time-steps this can lead to our agent playing somewhat randomly, rather than committing to one plan that finishes the game quickly (the agent has no knowledge of its past moves). This creates the paradoxical situation that our bot, despite being in a position of overwhelming win percentage, fails to take the (virtually) guaranteed win and might draw or even end up losing since small chances of a mistake accumulate with longer games (see Figure 4). To prevent some of these situations, we check whether the predicted scores for all top five moves lie above a win percentage of 99% and double-check this condition with Stockfish, and if so, use Stockfish’s top move (out of these) to have consistency in strategy across time-steps. So they freely admit that their thing will draw or even lose in these positions. It's not merely making the win a little cleaner. |
|
Yeah, they didn't use Stockfish for the lols.
They create a search-less engine for chess. And then used a search engine to pay a small minority of the game.