by dontreact 1653 days ago
The name of the game here is generality. For a really general agent, they are looking to have superhuman performance, not get state of the art on every individual task. Beating stockfish 8 convinces me that it would be superhuman at chess.

They could still be honest that it's Stockfish 8, not the Stockfish everyone has. Your product having genuine value does not excuse lying about that value.
I've observed this kind of behavior in many papers nowadays. This is extremely painful for research, because some better candidates could be overlooked, and FAANG publishes a majority of the ML papers. It's a mess.
They were? They say they use Stockfish 8 the very first time they mention it.
First time they mention it is page 10:

> one of the strongest and most widely-used programs is Stockfish [81].

Here's the citation, note the date:

> [81] The Stockfish Development Team. Stockfish: Open source chess engine, 2021. https://stockfishchess.org/.

They mention the version number only once, further down, and don't point out that it has been out of date since February 2018. The other 11 mentions don't have the version number, as in this sentence:

> In Chess, PoG(60000,10) is stronger than Stockfish using 4 threads and one second of search time.

>First time they mention it is page 10:

Yeah, so it is! I guess I ran into the same weirdness as ShamelessC, since when I first Ctrl-F:ed the PDF, hit 1/11 was on page 11. Now that I try my damnedest to reproduce it, I get 12 hits and the first is that one on page 10.

Yup, "In chess, we evaluated PoG against Stockfish 8, level 20 [81] and AlphaZero."