Hacker News
by berkut 1348 days ago
Does engine correlation actually prove anything, though? Some of the 'statistical analysis' posted on Twitter about it in the last week has been run against hundreds of engines, so 'engine correlation' seems to mean "the move made matched at least one engine that would have made that move", I think?

1) You can look at the "strength" of individual moves. Someone who plays at 2000-level normally but magically coughs up 2600-level moves when in trouble is probably cheating (watch some of the live chess streamers--you'll regularly see this in real time). Computers are quite good at estimating the strength of a move after the fact.

2) Quite often there are certain "play lines" that computers will play that humans simply can't find over the board.

For example, a computer can take a defensive "play line" that is littered with traps with only a single non-losing path for 30+ moves and work it out really quickly (there is only one non-losing path to take so it prunes the search space mega fast) and play it perfectly. A human playing such a line is almost always cheating--humans simply can't run those kinds of lines in real time.

If you look at computers analyzing even the highest end games, you see the humans making quite a few mistakes that the computers will spot and take advantage of immediately. Someone who walks down these kinds of paths regularly is a statistical anomaly.
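Point 1 is often approximated with centipawn loss: for each move, the gap between the engine's evaluation of its own best move and its evaluation of the move actually played. A minimal sketch, assuming you already have those two evaluations per move from the mover's point of view (the numbers are made up, and this is not a claim about what any particular site's anti-cheat system computes):

```python
def centipawn_losses(evals: list[tuple[int, int]]) -> list[int]:
    """Per-move loss in centipawns: eval of the engine's best move minus eval of
    the move actually played, both from the mover's point of view.
    Clamped at 0 in case a deeper search rates the played move higher."""
    return [max(0, best - played) for best, played in evals]


def average_centipawn_loss(evals: list[tuple[int, int]]) -> float:
    """Mean loss over all moves; lower means closer to engine play."""
    losses = centipawn_losses(evals)
    return sum(losses) / len(losses)


# Hypothetical (best_eval, played_eval) pairs for one player's moves in one game.
game = [(35, 35), (40, 10), (20, 20), (150, -200), (60, 55)]
print(centipawn_losses(game))        # [0, 30, 0, 350, 5]
print(average_centipawn_loss(game))  # 77.0
```

A consistently low average over many games, relative to a player's rating, is the kind of signal point 1 describes; a single game is far too small a sample.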

That having been said, given the current crop of computer-trained chess kids, it IS possible that we'll grow a prodigy that can run those kinds of lines. However, it doesn't seem like that person exists, yet.

I'm haunted by the possibility that humans might (at least half) catch up, too. When I look at how AI beats humans, I can't help thinking that AI shows us that human narcissism holds humans back. We don't want to look stupid, or mediocre - we don't want to make moves whose value is hard to explain clearly.

In Go, we can't make ourselves spread our moves around the board as much as we should; for example, we tend not to choose a move elsewhere that might be good over a clearly powerful move where the board is already developed.

Maybe there's a pattern to the moves AI chooses that is also a pattern humans can see without running every line; we're just reluctant to choose moves that we can't clearly justify in the shorter run.

> Computers are quite good at estimating the strength of a move after the fact.

Are you sure? I’ve never heard of such a program.

He's just referring to the fact that after the game is over you can let your program stew on any given move for a weekend or more before it reports back on how strong it was or wasn't.
But standard programs like Stockfish won’t tell you how strong a move was. They’ll just tell you how much it changes the evaluation.

E.g. if you initiate a queen trade in a straightforward position, on the next move I have to recapture; any other move will show a gigantic evaluation drop in the engine. But that doesn’t mean it’s a particularly strong move: even an absolute beginner will play it. Thus it’s of no value for determining whether the person who played it is cheating.
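One way to act on that objection is to drop "only moves" before counting engine matches. A sketch under an assumed rule (hypothetical, including the 200-centipawn gap): a position is forced if the best candidate beats the second-best by more than the gap, so matching the engine there says nothing about the player.

```python
def is_forced(candidate_evals: list[int], gap_cp: int = 200) -> bool:
    """Hypothetical rule: a position is an 'only move' if the best candidate
    beats the second-best by more than gap_cp centipawns."""
    ranked = sorted(candidate_evals, reverse=True)
    if len(ranked) < 2:
        return True  # a single legal move is trivially forced
    return ranked[0] - ranked[1] > gap_cp


def match_rate(moves: list[tuple[list[int], bool]], skip_forced: bool = False) -> float:
    """Fraction of moves where the player matched the engine's top choice.
    Each entry is (evals of the candidate moves, did the player match?)."""
    counted = [matched for evals, matched in moves
               if not (skip_forced and is_forced(evals))]
    if not counted:
        raise ValueError("no moves left to score")
    return sum(counted) / len(counted)


moves = [
    ([0, -900, -950], True),   # queen recapture: every alternative drops ~9 pawns
    ([30, 25, 10], True),
    ([45, 40, 38], False),
    ([20, 15, -5], True),
    ([10, -880], True),        # another forced recapture
]
print(match_rate(moves))                    # 0.8
print(match_rate(moves, skip_forced=True))  # about 0.67
```

The raw match rate is 4/5, but only 2/3 once the two recaptures are excluded. The cutoff is a judgment call, not a standard value.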

It’s entirely possible that chess.com has access to more sophisticated software that can estimate the strength of players (they sort of allude to this with their “strength score” metric) but AFAIK it’s not publicly available and not clear how it works, or whether it can evaluate individual moves as opposed to the game as a whole.

I think it's more a question of whether, given more time to calculate, the software changes its choice of move to something else.
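That idea can be sketched as finding the depth at which the engine's choice stops changing. The depth-to-move table below is hypothetical; in practice you might fill it by analysing the position at increasing depths (for example with the python-chess library's `chess.engine` module). `settling_depth` is an illustrative name, not an existing API.

```python
def settling_depth(best_move_by_depth: dict[int, str]) -> int:
    """Shallowest depth from which the engine's best move no longer changes,
    given a hypothetical {search depth: best move} table."""
    depths = sorted(best_move_by_depth)
    final_choice = best_move_by_depth[depths[-1]]
    settled_at = depths[-1]
    # Walk back from the deepest search while the choice stays the same.
    for depth in reversed(depths):
        if best_move_by_depth[depth] != final_choice:
            break
        settled_at = depth
    return settled_at


# Hypothetical results for one position.
analysis = {8: "Nf3", 12: "Nf3", 16: "d4", 20: "d4", 24: "d4"}
print(settling_depth(analysis))  # 16
```

A played move that only becomes the engine's choice at high depth (here d4, from depth 16 on) is arguably harder to find over the board than one the engine likes from the start.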
Am not a statistician, but at least in an online analysis I saw, it seems like correlation can effectively identify players who are playing too much like a computer, because they don't just run correlations on Niemann but on all the top players, and compare them (and for certain long stretches of tournaments, Niemann is playing way, way above how anyone else has ever played).
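The comparison against other top players amounts to asking how far one player's number sits from the peer distribution. A minimal sketch using a z-score; the peer percentages are invented, and the comparison only means something if every player was scored with the same engines and settings:

```python
from statistics import mean, stdev


def z_score(value: float, baseline: list[float]) -> float:
    """Number of sample standard deviations `value` lies above the baseline mean."""
    return (value - mean(baseline)) / stdev(baseline)


# Invented engine-correlation percentages for a peer group (not real data).
peers = [70.0, 72.0, 68.0, 74.0, 71.0]
print(round(z_score(80.0, peers), 2))  # 4.02
```

With only a handful of peers the estimate of the spread is itself noisy, so a big z-score is a reason to look closer rather than a verdict.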

This video explains it pretty well, and it seems like a very compelling argument (at least to me): https://www.youtube.com/watch?v=qjtbXxA8Fcc - just know that the woman talking is a bit hard to understand because of her accent.

... oh, and to address your point about hundreds of engines, my first thought was that there are only a handful that everyone uses (Stockfish?) (and also, just guessing, but I get the impression that most top engines recommend similar moves, but again, just a guess!).

Ah, have to admit, this is a very good counterargument, calling ChessBase's methods into question. And it's surprising that ChessBase does not always do the same analysis for each game, and that its nodes aren't set up with the same set of chess engines (although, again, maybe most top chess engines suggest similar moves??). Hmm...

... also, not sure I agree with his opinion that for each move that is analyzed, LetsCheck will return 100% if any engine returns 100% (and there could be multiple computers that were used, each with different engines). The point of the analysis is to determine if a player is playing like a computer, and the user may himself have multiple chess engines open in order to confuse the cheat detection. But again, am not an expert at chess engines or statistics, so am not sure what effects checking multiple engines has...
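If the aggregation described there is accurate (a move scores as a match when any engine in the pool matched it), the pooled figure can sit well above what any single engine reports. A sketch with invented per-move match flags and placeholder engine names:

```python
def per_engine_rates(matches: dict[str, list[bool]]) -> dict[str, float]:
    """Match rate for each engine on its own."""
    return {name: sum(flags) / len(flags) for name, flags in matches.items()}


def any_engine_rate(matches: dict[str, list[bool]]) -> float:
    """Pooled rate: a move counts as a match if ANY engine in the pool matched it."""
    per_move = list(zip(*matches.values()))  # one tuple of flags per move
    hits = [any(flags) for flags in per_move]
    return sum(hits) / len(hits)


# Invented per-move flags; engine names are placeholders.
matches = {
    "engine_a": [True, False, True, False, True, False],
    "engine_b": [False, True, True, False, False, False],
    "engine_c": [False, False, True, True, False, False],
}
print(per_engine_rates(matches))  # engine_a 0.5, engine_b and engine_c about 0.33
print(any_engine_rate(matches))   # about 0.83 (5 of 6 moves)
```

Each engine alone agrees on at most half the moves, yet the pooled score is 5/6. Dropping engine_c from the pool lowers it to 4/6, so the choice of engines changes the result.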

... also, he says that "Ken Regan's scientifically valid method has exonerated Hans by saying his results do not show any statistically valid evidence of cheating." This is very confusing, because the Chess.com post basically said the opposite (maybe Ken Regan's analysis refers to a different subset of games?). I guess nothing is definitive. But at this point, I still lean towards Hans cheating (on top of this analysis, there are also a lot of circumstantial things he did that, to me, indicate he might have cheated, which is too long a topic to go into).

Chess.com explicitly states in the report that that sort of methodology does “not meet our standard” for cheating detection. If they don’t feel comfortable using it, I certainly don’t.
> Does engine correlation actually prove anything though?

It probably doesn't, for many reasons. The more engines you add, the greater the chance of falsely accusing someone (so an analysis that features hundreds of engines is probably worthless), and worse than that, you can manipulate the result of the analysis through the selection of engines.
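A rough way to see the first effect: if each engine independently agreed with a given move with probability p, the chance that at least one of k engines agrees is 1 - (1 - p)^k. Independence is a strong simplifying assumption (strong engines often agree with each other), so the real inflation is smaller, but adding an engine to an any-match pool can never lower the score.

```python
def p_match_any(p_single: float, k: int) -> float:
    """Probability that at least one of k engines agrees with a move, assuming
    each agrees independently with probability p_single (a big simplification)."""
    return 1 - (1 - p_single) ** k


# p = 0.3 is an arbitrary illustrative value.
for k in (1, 5, 20, 100):
    print(k, round(p_match_any(0.3, k), 3))
```

With p = 0.3, one engine gives 0.3, five give about 0.83, and twenty give over 0.99.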

There are a number of threads about that on /r/chess, like https://www.reddit.com/r/chess/comments/xtwzfe/fm_ingvar_joh... etc.

But anyway, this isn't the analysis that Chess.com does.

I don't think engine correlation necessarily proves anything, on its own. It's worth remembering, though, that chess.com's report a) presents more than merely raw engine correlation, and b) its correlations do not seem to match against hundreds of engines.

But even with all the evidence presented, "proof" is a tricky thing. To what standard would we be trying to prove a claim?

Does this report prove beyond all reasonable doubt that Niemann cheated? I'd say no, but others may disagree.

How about to a preponderance of evidence? Perhaps. But even that is hard to say when no one has yet presented a rigorous defense or set of counterpoints.

In any case, my post wasn't meant to say that Niemann cheated per se. I have no idea, and chess.com themselves may not be able to actually prove whether he did. But I found the report interesting, even beyond the current issue surrounding Niemann and speaking to potential cheating in high-level chess more broadly, and if you re-read my post, I tried not to state anything definitive about whether Hans actually cheated or not.

No more than elevated testosterone levels are "proof" of performance-enhancing drugs. Engine correlation is a marker, and when combined with other markers, it can be meaningful.
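One standard way to make "combined with other markers" precise is Bayes' rule in odds form: multiply the prior odds by each marker's likelihood ratio. A sketch with made-up numbers; it also assumes the markers are conditionally independent, which real cheating markers may well not be:

```python
import math


def posterior_probability(prior: float, likelihood_ratios: list[float]) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds x product of the
    markers' likelihood ratios. Only valid if the markers are conditionally independent."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * math.prod(likelihood_ratios)
    return posterior_odds / (1 + posterior_odds)


# All numbers invented for illustration.
print(round(posterior_probability(0.01, [5]), 3))     # 0.048
print(round(posterior_probability(0.01, [5, 4]), 3))  # 0.168
```

With a 1% prior, one marker with likelihood ratio 5 only gets you to about 5%; two markers (ratios 5 and 4) reach about 17%. Individually weak markers add up, but neither number is anywhere near proof.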

What it mostly shows is that Hans's move strength is unnatural.

Further evidence to support this is that he often plays bad moves, that is, moves considered blunders, with high frequency. This is either an attempt to cover up the engine moves or representative of his actual ability. For instance, the report mentions that in a post-game analysis he suggested a move that would have been an obvious blunder. When the interviewer pointed this out, Hans wasn't fully convinced until he was shown the engine analysis. So he also shows a habit of deferring to what the engine suggests.
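The pattern described (many near-perfect moves alongside frequent blunders) can be made concrete by bucketing per-move centipawn losses. The thresholds below are assumptions for illustration; sites like chess.com use their own definitions:

```python
from collections import Counter

# Assumed cutoffs in centipawns, checked from largest to smallest.
THRESHOLDS = [(300, "blunder"), (100, "mistake"), (50, "inaccuracy")]


def classify(loss_cp: int) -> str:
    """Label one move by its centipawn loss."""
    for cutoff, label in THRESHOLDS:
        if loss_cp >= cutoff:
            return label
    return "good"


def quality_profile(losses: list[int]) -> Counter:
    """Count how many moves fall into each bucket."""
    return Counter(classify(loss) for loss in losses)


# Invented per-move losses for one game.
losses = [0, 0, 5, 0, 420, 0, 10, 350, 0, 0]
print(quality_profile(losses))  # Counter({'good': 8, 'blunder': 2})
```

This sample has eight "good" moves, two blunders and nothing in between. A profile like that is consistent with either explanation offered above, so on its own it does not say which one applies.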