It would be nice to have a chess trainer AI that considers the human factor when evaluating a position.
It's funny that it is relatively easy to beat Stockfish when the computer has to play without the queen.
But it is quite hard to beat a pro player even with such a strong handicap.
Still, the pro player has absolutely no chance against the engine without a handicap.
Assuming that Stockfish runs on a computer that is much faster than what we have today and sees that white can always force a win, I wonder if Stockfish would immediately resign as black playing against a human, even before the very first move.
maiachess.com is "A human-like neural network chess engine" which is trained at 9 skill levels (1100-1900).
>We tested each Maia on 9 sets of 500,000 positions that arose in real human games, one for each rating level between 1100 and 1900. Every Maia made a prediction for every position, and we measured its resulting move-matching accuracy on each set.
>Each Maia captures human style at its targeted skill level. Lower Maias best predict moves played by lower-rated players, whereas higher Maias predict moves made by higher-rated players.
In sufficiently advanced CPU vs. CPU, the Black side would definitely resign on the first move.
However, assuming White has a 10% higher chance of winning in games between 2 Human players, that still leaves a decent margin of error for the White Human side to blunder during the game, so Black CPU wouldn't resign. This is assuming CPU doesn't know about that individual Human's blunder history.
Basically, CPUvCPU would definitely see an instant resign. In HUMvCPU, only the Human should definitely surrender as black. A Black CPU will keep playing in case the Human blunders.
> But it is quite hard to beat a pro player even with such a strong handicap.
Don't take this the wrong way, but how good are you, i.e. what is your rating?
imo, any reasonably seasoned player, after handicapping their opponent to be without a queen, should be able to win easily and in a relatively straightforward way by avoiding blunders and trading off pieces.
You underestimate the number of traps a strong player can set by making the position complex. Hikaru could beat a >2000-rated player at queen odds[1].
> In order to compare lines of different lengths, we take the geometric mean of the probabilities, to give the average probability of the opponent playing the next required move in sequence.
This doesn't seem right to me. A line where your opponent has to find ten 75% moves in a row to fall into it is less "probable", by any reasonable understanding of the word, than one where he has a 50% chance of going wrong immediately.
I'd multiply the probabilities move by move to get the cumulative probability, but I'd only start at the point where the trap-setter plays a suboptimal move, to account for these different lengths of lines.
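To make the difference concrete, here is a minimal Python sketch that applies both scoring rules to the example above. The function names are made up for illustration and are not taken from the project.

```python
import math

def geometric_mean(probs):
    """Average per-move probability (the scoring rule quoted above)."""
    return math.prod(probs) ** (1 / len(probs))

def cumulative(probs):
    """Probability that the opponent plays every required move in sequence."""
    return math.prod(probs)

long_line = [0.75] * 10   # ten moves, each played 75% of the time
short_line = [0.50]       # a single move played 50% of the time

for name, line in (("long", long_line), ("short", short_line)):
    print(f"{name}: geo={geometric_mean(line):.3f} cum={cumulative(line):.3f}")
```

The geometric mean ranks the long line higher (about 0.75 vs 0.50), while the cumulative product ranks it far lower (about 0.056 vs 0.50).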
The longer lines are technically less probable, like you say. I was trying to capture the notion that a trap is more impressive if, on average, the moves are likely. How would you define a 'sub-optimal' move in your proposal? It's a good idea - would be interesting to see it in action!
I can't think of a good way to do this purely based on move stats: I think you'd want to involve the engine. There's a Stockfish package for Python you could use, and it wouldn't be expensive or difficult to query the few evaluations you'd need. I would say a suboptimal move is when the evaluation changes by more than X, maybe 50 or 80 centipawns.
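A rough sketch of that rule, assuming the evaluations have already been fetched from the engine: each entry holds the evaluation before and after one of the trap-setter's moves (in centipawns, from the trap-setter's point of view) and the probability of the reply the trap needs. The function name, the data layout and all the numbers below are hypothetical.

```python
import math

def trap_probability(moves, threshold_cp=50):
    """Cumulative probability of the line, counted from the first suboptimal move.

    moves: list of (eval_before, eval_after, p_reply) tuples, one per
    trap-setter move. A move is "suboptimal" when it lowers the evaluation
    by more than threshold_cp centipawns.
    """
    start = next(
        (i for i, (before, after, _) in enumerate(moves)
         if before - after > threshold_cp),
        None,
    )
    if start is None:
        return None  # no deliberate concession, so nothing to score as a trap
    return math.prod(p for _, _, p in moves[start:])

# Made-up numbers: the second move gives up 85 centipawns, so scoring starts there.
line = [(30, 25, 0.60), (25, -60, 0.40), (-60, -70, 0.50)]
print(trap_probability(line))
```

Only the replies from the second move onward are multiplied here (0.40 and 0.50), so long stretches of ordinary theory before the trap do not drag the score down.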
I tried out the project and sent you a couple of minor pull requests. I wanted to score my pet trap:
1. e4 c6 2. Nf3 d5 3. d3 dxe4 4. Ng5 exd3 5. Bxd3
after which the most popular moves Nf6 (trap score 34%) and h6 (trap score 29%) both lose immediately to Nxf7 (if Kxf7, Bg6+ and Qxd8 wins the queen). I've seen plenty of titled players fall for this. I think the correct way to score this trap is the sum of those trap scores, because the "trap" is set after Bxd3, but there are multiple ways to fall into it. That would give it by far the highest score at 63%, but maybe changing the methodology this way would also increase the score of your other traps.
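As a small sketch of that suggestion: the losing replies are mutually exclusive choices in the same position, so their scores can simply be added. The helper name is hypothetical; the two scores are the ones quoted above.

```python
def combined_trap_score(branch_scores):
    """Sum the scores of mutually exclusive replies that all fall into the trap."""
    total = sum(branch_scores.values())
    if total > 1:
        raise ValueError("scores of alternative replies cannot exceed 100%")
    return total

# Trap scores after 5. Bxd3, as reported above.
caro_trap = {"Nf6": 0.34, "h6": 0.29}
print(f"{combined_trap_score(caro_trap):.0%}")
```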
It might also be interesting to calculate the win probabilities for the players when the trap is avoided, so you can judge what you pay for setting the trap.
The Stafford Gambit doesn't fare well there. It requires Black to play 3. ... Nc6 in the Russian defense after
1. e4 e5 2. Nf3 Nf6 3. Nxe5
which according to Stockfish loses more than a pawn's worth
(eval +2 for white after Nxc6 d3 Bc5 h3).
I absolutely would have guessed Stafford as the top with Englund near, so the resulting ranking seems to correspond strongly with my intuition at least. I'm surprised Blackmar-Diemer is so high, but maybe that becomes more vicious when you're higher rated than I am.
That's good to hear! Blackmar-Diemer is favoured by this ranking system because, although it's a longish line, the opponent's moves are all fairly probable, with the least probable being Qxd4 at still around 20%.
I have tried the Stafford gambit against real people and I have found that unless people fall for the trap it is a very weak line for black. I keep having to make dumb moves hoping for the other player to fall into the trap.
I imagine the answer here varies a lot with the Elo of the training corpus, because skill affects the probability input. In the 500s, 3. Qh5 to scholar's mate is both potent and very probable. In the 1300s it's just a mistake. It would be really interesting to see the overlap between high and low Elo traps.